Open eklem opened 2 years ago
Sitemap listings for Southern Sami on ndla.no
https://ndla.no/sitemap-urn-subject-1-11c4696f-e844-4c98-8df7-49d43f59ec33.txt
https://ndla.no/sitemap-urn-subject-1-a532138d-e16a-4046-a46e-bd5bc9487b8b.txt
https://ndla.no/sitemap-urn-subject-1-a5d7da3a-8a19-4a83-9b3f-3c855621df70.txt
https://ndla.no/sitemap-urn-subject-1-20e0fdca-5237-4095-a9e5-cea7d63866c0.txt
https://ndla.no/sitemap-urn-subject-1-b8a448f0-e251-41ea-af1c-b2fd62a89828.txt
https://ndla.no/sitemap-urn-subject-1-d4511941-a1fc-4336-bc80-0a05c534a182.txt
https://ndla.no/sitemap-urn-subject-1-962dd49d-72e8-4576-9efb-69d93a95402e.txt
https://ndla.no/sitemap-urn-subject-1-f7c5f36a-198d-4c38-a330-2957cf1a8325.txt
Thank you @gunnarvelle! I'll include the content that has longer text and where I can be pretty sure it's only in Southern Sami.
Check Sami newspapers!
Depend on corpus-sma and corpus-smj
cp -r ./source/*.xml ./destination
External library: corpus-smj-sma-json. Will be a new dependency.
If I get hold of more text, or a site I can crawl, for any of the Sami languages other than North, Lule and South Sami, I'll create a stopword list for those too.
It doesn't matter if the language is not spoken in Norway.
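For anyone wondering how a stopword list could be derived from crawled text like this, here is a rough sketch in Python. It is not how the lists in this project are actually built; the function names, the `top_n` cutoff and the `min_doc_share` threshold are all made up for illustration. The idea is just to count word frequencies and keep the most frequent words that also show up in a large share of the documents.

```python
import re
from collections import Counter

# Matches runs of Unicode letters (no digits, no underscore), so Sami
# letters outside ASCII are kept as part of a word.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return TOKEN_RE.findall(text.lower())


def stopword_candidates(documents, top_n=100, min_doc_share=0.5):
    """Return up to top_n frequent words that occur in many documents.

    Hypothetical helper: documents is a list of plain-text strings,
    min_doc_share is the fraction of documents a word must appear in.
    """
    term_counts = Counter()  # total occurrences per word
    doc_counts = Counter()   # number of documents containing the word
    for doc in documents:
        tokens = tokenize(doc)
        term_counts.update(tokens)
        doc_counts.update(set(tokens))

    min_docs = min_doc_share * len(documents)
    # most_common() yields words from most to least frequent.
    frequent = [
        word
        for word, _ in term_counts.most_common()
        if doc_counts[word] >= min_docs
    ]
    return frequent[:top_n]
```

The output would only be a list of candidates. Someone who knows the language would still need to go through it by hand, since frequent content words from a narrow corpus (school subjects, for example) can end up near the top.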