dpriskorn/odsc
Project that aims to sentenize all the open data of Riksdagen and other sources to create an easily linkable dataset of sentences that can be referred to from Wikidata lexemes and other resources
GNU General Public License v3.0
0 stars, 0 forks
issues
chore(deps): Bump setuptools from 69.0.2 to 70.0.0
#53
dependabot[bot]
opened
2 months ago
0
chore(deps): Bump zipp from 3.17.0 to 3.19.1
#52
dependabot[bot]
opened
2 months ago
0
chore(deps): Bump certifi from 2023.11.17 to 2024.7.4
#51
dependabot[bot]
opened
2 months ago
0
chore(deps): Bump scikit-learn from 1.3.2 to 1.5.0
#50
dependabot[bot]
opened
3 months ago
0
chore(deps): Bump urllib3 from 2.1.0 to 2.2.2
#49
dependabot[bot]
opened
3 months ago
0
chore(deps): Bump pymysql from 1.1.0 to 1.1.1
#48
dependabot[bot]
opened
3 months ago
0
chore(deps): Bump requests from 2.31.0 to 2.32.0
#47
dependabot[bot]
opened
3 months ago
0
chore(deps): Bump jinja2 from 3.1.2 to 3.1.4
#46
dependabot[bot]
opened
4 months ago
0
chore(deps): Bump tqdm from 4.66.1 to 4.66.3
#45
dependabot[bot]
closed
4 months ago
0
chore(deps): Bump idna from 3.6 to 3.7
#44
dependabot[bot]
opened
5 months ago
0
chore(deps): Bump pillow from 10.2.0 to 10.3.0
#43
dependabot[bot]
opened
5 months ago
0
chore(deps-dev): Bump black from 23.12.0 to 24.3.0
#42
dependabot[bot]
opened
6 months ago
0
chore(deps): Bump cryptography from 42.0.2 to 42.0.4
#41
dependabot[bot]
closed
6 months ago
0
chore(deps): Bump cryptography from 42.0.0 to 42.0.2
#40
dependabot[bot]
closed
7 months ago
0
chore(deps): Bump cryptography from 41.0.7 to 42.0.0
#39
dependabot[bot]
closed
7 months ago
0
chore(deps): Bump fastapi from 0.106.0 to 0.109.1
#38
dependabot[bot]
closed
7 months ago
0
chore(deps): Bump pillow from 10.1.0 to 10.2.0
#37
dependabot[bot]
closed
7 months ago
0
Use only riksdagen html and clean
#36
dpriskorn
opened
8 months ago
0
chore(deps): Bump jinja2 from 3.1.2 to 3.1.3
#35
dependabot[bot]
closed
4 months ago
1
Support all Runeberg CC0 books also
#34
dpriskorn
opened
8 months ago
1
Use Metabase to visualize the data
#33
dpriskorn
opened
8 months ago
0
Upload all sentences and raw tokens to a Wikibase to enable linking and annotation
#32
dpriskorn
opened
8 months ago
0
Store boolean on datasets whether cc0
#31
dpriskorn
opened
8 months ago
0
Fetch html instead of pdf from folketinget
#30
dpriskorn
opened
8 months ago
0
Count all accepted tokens per sentence and store it
#29
dpriskorn
opened
8 months ago
0
Support storing year per document
#28
dpriskorn
opened
8 months ago
0
Add litteraturbanken 344M token cc-by 4
#27
dpriskorn
opened
8 months ago
0
Add clarin is parlamint with 1,2G tokens cc-by 4
#26
dpriskorn
opened
8 months ago
0
Strip entities before insertion
#25
dpriskorn
opened
8 months ago
0
Guard against mismatch between token language id and sentence language id
#24
dpriskorn
opened
8 months ago
0
Document the evolvable API and help consumers
#23
dpriskorn
opened
8 months ago
0
Move Riksdagen specific code into /providers
#22
dpriskorn
opened
8 months ago
0
Support ~500k Folketinget documents custom cc-by like license
#21
dpriskorn
opened
8 months ago
0
Add evolvable /lookup API endpoint
#20
dpriskorn
closed
8 months ago
0
Add new endpoint /sv/usage_example/search/$1
#18
dpriskorn
closed
8 months ago
0
Store information about license on datasets
#17
dpriskorn
opened
8 months ago
0
Support marking datasets as user-generated content
#16
dpriskorn
opened
8 months ago
0
Create a fastapi /lookup for Luthor and other data consumers
#15
dpriskorn
closed
8 months ago
0
Enable distinction of written or oral source per dataset
#14
dpriskorn
opened
8 months ago
0
Store unique NER entities per sentence
#13
dpriskorn
closed
8 months ago
0
Support storing NER entities also
#12
dpriskorn
closed
8 months ago
0
Use the faster sentencex instead of spacy for sentenizing
#11
dpriskorn
opened
8 months ago
0
Discard tokens with unaccepted chars
#10
dpriskorn
closed
8 months ago
0
More cleaning of tokens is needed
#9
dpriskorn
closed
8 months ago
0
Markup hyphenated tokens
#8
dpriskorn
opened
8 months ago
0
Refactor database logic into CRUD classes
#7
dpriskorn
closed
8 months ago
0
Save to sqlite database
#6
dpriskorn
closed
9 months ago
0
Support entity linking for each sentence
#4
dpriskorn
opened
9 months ago
1
Switch to using swedish-spacy-pipeline
#3
dpriskorn
closed
9 months ago
1
Switch to fasttext-langdetect
#2
dpriskorn
closed
9 months ago
0