eklem / stopword-sami

Sami stopword lists for natural language processing. Example uses include search engines, machine learning, and chatbots.
MIT License

Northern Sámi #1

Closed (eklem closed 2 years ago)

eklem commented 3 years ago

Here's the list of all pages.

The Wikipedia site has over 7700 articles, which could be enough to generate a good list of stopwords. A lot of the articles seem to be very short, though, so we'll see how it goes.
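
A minimal sketch of how stopword candidates could be pulled from a set of article texts: count how often each word occurs across all documents and keep the most frequent ones. This is not the method used in this repo. The function names, the tokenizer, and the sample strings (invented placeholder tokens, not real Northern Sami sentences) are all assumptions for illustration.

```python
import re
from collections import Counter

# Runs of Unicode letters only, so characters like á, š and đ stay inside words
# while digits, underscores and punctuation are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and split it into letter-only tokens."""
    return TOKEN_RE.findall(text.lower())


def stopword_candidates(documents, top_n=10):
    """Return the top_n most frequent tokens across all documents."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return [word for word, _ in counts.most_common(top_n)]


# Placeholder documents (hypothetical tokens, for illustration only).
docs = ["aa bb aa cc", "aa bb dd"]
candidates = stopword_candidates(docs, top_n=2)
```

With many very short articles, raw counts can be dominated by a few long pages, so counting in how many documents a word appears might be a useful variation. Either way, the output is only a list of candidates that still needs manual review.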

Could also be a nice start for a Northern Sami search engine?

eklem commented 3 years ago

Tool to check the meaning of Northern Sami text: http://jorgal.uit.no/index.sme.html?dir=sme-nob#translation

eklem commented 3 years ago

Compare word frequencies with the Northern Sami frequency list here: http://giellatekno.uit.no/lex.en.html
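
A rough sketch of what such a comparison could look like, assuming both the generated candidates and the reference frequency list have already been loaded as plain Python lists ordered from most to least frequent. The actual format of the reference list is not specified here, so loading it is left out. The function name and sample tokens are hypothetical.

```python
def compare_with_reference(candidates, reference, top_n=100):
    """Split candidates into words found / not found in the reference top_n.

    Both inputs are assumed to be lists ordered by descending frequency.
    """
    reference_top = set(reference[:top_n])
    shared = [word for word in candidates if word in reference_top]
    missing = [word for word in candidates if word not in reference_top]
    return shared, missing


# Placeholder data (hypothetical tokens, for illustration only).
shared, missing = compare_with_reference(
    candidates=["aa", "bb", "zz"],
    reference=["bb", "aa", "cc", "dd"],
    top_n=3,
)
```

Words that end up in `missing` are worth a closer look: they may be artifacts of the crawled source rather than genuine stopwords.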

eklem commented 2 years ago

Too many issues with using Wikipedia, so switching to crawling NRK Sápmi instead: https://www.nrk.no/sapmi/samegillii/