Closed (eklem closed this issue 2 years ago)
https://www.nrk.no/sapmi/samegillii/
So, I need to create a crawler to get the content from these three pages. Try clicking "vis flere" ("show more") around 2000 times
and then get the content of the page. There are 5 article stubs for each click, so 2000 clicks would load about 10,000 stubs.
Check whether Playwright is the right tool.
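A rough sketch of what the click-and-collect loop could look like with Playwright's Python sync API. This is only an illustration of the idea, not the crawler that was actually built: the function names `click_show_more` and `crawl`, the `pause_ms` delay, and whatever selector you pass for the "vis flere" button are all assumptions.

```python
from typing import Any

# From the issue text: each click on "vis flere" loads 5 article stubs.
STUBS_PER_CLICK = 5


def click_show_more(page: Any, selector: str, max_clicks: int = 2000,
                    pause_ms: int = 500) -> int:
    """Click the 'show more' button until it is gone or max_clicks is reached.

    `page` is expected to behave like a Playwright sync Page.
    Returns the number of clicks actually performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = page.locator(selector)
        if button.count() == 0:
            # No button left, so there is presumably nothing more to load.
            break
        button.first.click()
        # Assumed fixed pause so the next batch of stubs has time to render.
        page.wait_for_timeout(pause_ms)
        clicks += 1
    return clicks


def crawl(url: str, selector: str) -> str:
    """Open the page, expand it as far as possible, and return the HTML."""
    # Imported here so click_show_more can be tried without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url)
        click_show_more(page, selector)
        html = page.content()
        browser.close()
    return html
```

Something like `crawl("https://www.nrk.no/sapmi/samegillii/", "text=Vis flere")` would be the call, where the `text=Vis flere` selector is a guess at how the button could be matched and would need checking against the real page.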
From version 0.0.3 of nrk-sapmi-crawler, I can fetch JSON files with article IDs. Set it up for South Sami, Lule Sami and North Sami. Do a re-crawl every now and then. Let the data gathering begin 😄
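For the occasional re-crawl, one way to keep the ID files tidy is to merge new IDs into the existing file without duplicates. A minimal sketch, assuming each language has its own JSON file holding a flat list of article ID strings; that layout and the name `merge_article_ids` are assumptions, not what nrk-sapmi-crawler actually does.

```python
import json
from pathlib import Path


def merge_article_ids(path: Path, new_ids: list[str]) -> list[str]:
    """Append IDs from a re-crawl to the JSON file at `path`, skipping duplicates.

    Assumes the file (if it exists) contains a JSON array of ID strings.
    Returns the merged list, with previously stored IDs first.
    """
    existing = json.loads(path.read_text(encoding="utf-8")) if path.exists() else []
    seen = set(existing)
    merged = list(existing)
    for article_id in new_ids:
        if article_id not in seen:
            seen.add(article_id)
            merged.append(article_id)
    path.write_text(json.dumps(merged, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return merged
```

Running it once per language file (South Sami, Lule Sami, North Sami) after each crawl would keep the earlier IDs in place and only add the ones not seen before.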
https://se.wikipedia.org/wiki/Erenoam%C3%A1%C5%A1:Buot_siiddut?from=ADA_universitehta&to=&namespace=0 seems to contain too many stubs, so it may not be a good source.
This means we need some other text sources for our datasets.
https://www.nrk.no/sapmi could be a good one; I just need to understand which three Sami languages are represented.