dpriskorn / odsc

Project that aims to split all the open data of Riksdagen and other sources into sentences, creating an easily linkable dataset of sentences that can be referred to from Wikidata lexemes and other resources.
GNU General Public License v3.0

Support ~500k Folketinget documents (custom CC-BY-like license) #21

Open dpriskorn opened 8 months ago

dpriskorn commented 8 months ago

WIP models for downloading the necessary files are here: https://github.com/dpriskorn/riksdagen_sentences/tree/main/models/providers

Estimated tokens: 1G (roughly 1 billion)
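
As a rough sanity check on that estimate, the two figures quoted in this issue (~500k documents, ~1G tokens) imply an average of about 2,000 tokens per document. A minimal sketch of the arithmetic, assuming both numbers are only approximate; the function and variable names are illustrative and not taken from the repository:

```python
# Back-of-envelope check. Both inputs are the rough figures quoted in this
# issue, not measured values.
DOCUMENT_COUNT = 500_000            # "~500k" Folketinget documents
ESTIMATED_TOKENS = 1_000_000_000    # "1G" estimated tokens


def average_tokens_per_document(total_tokens: int, document_count: int) -> float:
    """Return the mean number of tokens per document."""
    return total_tokens / document_count


avg_tokens = average_tokens_per_document(ESTIMATED_TOKENS, DOCUMENT_COUNT)
print(f"~{avg_tokens:.0f} tokens per document on average")
```

If the real average turns out to be far from this, either the document count or the token estimate is probably off.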