giellalt / lang-smj

Finite state and Constraint Grammar based analysers and proofing tools + language resources for Lule Sámi
GNU General Public License v3.0

mobiltelefåvnåjnk ( #72

Closed albbas closed 17 years ago

albbas commented 17 years ago

This issue was created automatically with bugzilla2github

Bugzilla Bug 393

Date: 2007-05-12T14:15:13+02:00 From: Trond Trosterud <> To: Thomas Omma <> CC: per-eric.kuoljok, sjur.n.moshagen

Last updated: 2007-05-18T09:07:14+02:00

albbas commented 17 years ago

Comment 1369

Date: 2007-05-12 14:15:13 +0200 From: Trond Trosterud <>

The above form is suggested by the speller. We now have the following clitic lexicon for smj: -ge, -gen, -ga, -k. According to Sámásta, the clitic -k is added to question adverbs like aktak, goassak, gåk, gåsik, majdik. I suspect that -k cannot be added to mobiltelefåvnåjn, and if smj acts like sme in Norway, I expect the particles to be written separately, cf. sidá.’ Johannes bådij ja sån ittjij bårå, ittjij ga jugá, ja de javlli: ’Sujna l a guhti muv maŋŋela boahtá le mujsta gievrap, iv ga le árvvogis sujsta gábmagijt av skirtov, allit ga gábmagijt ja soappev. Dajna gå bargge biebmos ánssit. Åtsåd ádna mulldo. Ja asidis muldon ruvva ihtin. Valla gå biejvve badjánij, de buollin dájna vierregijn ja sijá duobbmon sjaddá, dajna gå buorádusáv Jona sárnnedattij

Going through our smj corpus, I found ONE form with final -k and no analysis without a Clt tag: "bágok báhko+N+Pl+Nom+Clt+k". Another, Grehkagielak, seems to be an error (should be A Attr). For Josefnamák I have no explanation.

In any case: the clitic lexicon should be reviewed, and the clitic -k critically revised, perhaps lexicalised, or at least banned from the suggestion list of the speller.

albbas commented 17 years ago

Comment 1370

Date: 2007-05-12 17:05:24 +0200 From: Sjur Nørstebø Moshagen <>

This is exactly the issue discussed in the last meeting (or was it the previous one?), which Thomas and Per-Eric are investigating at the moment. Please see the last meeting memo (or the previous one).

But yes, the present behaviour is clearly wrong.

One alternative analysis suggested by Duomma/Per-Eric is that the -k clitic is only possible after vowel-final word forms. In any case, the issue is already under investigation:)

albbas commented 17 years ago

Comment 1371

Date: 2007-05-12 18:52:32 +0200 From: Trond Trosterud <>

Yes, I really forgot. One thing is to discuss it, another is to see its grim consequences. So, yes, this is not the smj version to make it to the final beta.

In our Lule Sámi corpus, these forms are the ONLY ones ending in consonant + k (Ck$): Jack, Solbakk, dávk, vájk.

The following list contains all words ending in -k that also get an analysis when the k is chopped off:
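A check like the Ck$ search above could be sketched as follows. This is only an illustration, not the command that was actually run: the function name `ends_in_consonant_k` and the vowel inventory in `VOWELS` are assumptions and should be adjusted to the orthography of the corpus.

```python
import unicodedata

# Assumed set of vowel letters for Lule Sámi text (hypothetical; extend as needed).
VOWELS = set("aáeiouyåä")


def ends_in_consonant_k(word: str) -> bool:
    """Return True if the word ends in a consonant letter followed by k (Ck$)."""
    # Normalise to NFC so that accented vowels are single code points.
    w = unicodedata.normalize("NFC", word).lower()
    if len(w) < 2 or not w.endswith("k"):
        return False
    before = w[-2]
    return before.isalpha() and before not in VOWELS


if __name__ == "__main__":
    sample = ["Jack", "Solbakk", "dávk", "vájk", "gåk", "bágok"]
    print([w for w in sample if ends_in_consonant_k(w)])
```

Running the function over a word list of corpus types would give the candidates to inspect by hand.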

Aktak, Avtak, Buorremielak, Dåbdåk, Erik, Giehppismielak, Grehkagielak, Gávak, Henrik, Jalik, Josefnamák, Jábbmek, Jáhkák, Mak, Migek, Profehtak, Rievtesmielak, Simonnamák, Stuorak, Suvrrodik, Sámegielak, Vallak, Vállduk, aktak, alek, allak, allamáttok, almmuk, avtak, bahámielak, bargak, basák, biejvvek, birratjuohpadahtek, buorrek, bádek, bágok, bälostahtek, båråk, daguk, dagák, dajnak, dajvak, dasik, diedek, duohppik, duohtak, dárogielak, dávk, dåbdåk, dåk, dåssjånik, ednak, gejnak, gevgak, geŋgak, giehppismielak, goabbák, goappák, goassak, grehkagielak, gudik, guhkak, guhtik, gulldalik, gájgodik, gässtak, gåbtjåk, gåk, gåktuk, gåsik, hebreagielak, hebreak, häsok, ieredik, jalik, jaskadahtek, julevsámegielak, juorrulahtek, juorrulik, jábbmek, jávrrek, lik, mahkkak, mak, makkirak, masik, masstak, mavgak, miehttik, migek, murkástalák, mávsek, nierbak, njimmurik, njuorak, nåvtik, oattjok, profehtak, rievtesmielak, rijkak, suvrrodik, sámegielak, tjuoldek, tjuvdek, tjádjánik, uddnik, vahágahtek, vaják, vajálduhtek, vallak, vasjodahtek, vehik, vierrek, vuobddatjoallek, vuojnek, vuosteldahtek, vájbak, vájk, vállduk, ájádalák, ánssidahtek, åbbåk, åhpadak, ållik, åtsådalák.
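The chop-off test behind this list can be sketched roughly as below. It is a minimal sketch under stated assumptions: `has_analysis` is a hypothetical stand-in for a call to the real analyser, and the toy set of known forms is invented for the demonstration, not taken from the smj lexicon.

```python
from typing import Callable, Iterable, List


def analysable_without_suffix(
    words: Iterable[str],
    has_analysis: Callable[[str], bool],
    suffix: str,
) -> List[str]:
    """Return the words that end in `suffix` and whose remainder is analysable."""
    hits = []
    for word in words:
        if word.endswith(suffix) and len(word) > len(suffix):
            stem = word[: -len(suffix)]
            if has_analysis(stem):
                hits.append(word)
    return hits


if __name__ == "__main__":
    # Toy stand-in for the analyser: a set of forms we pretend it accepts.
    known_forms = {"bágo", "goassa"}
    corpus_words = ["bágok", "goassak", "Jack", "vájk"]
    print(analysable_without_suffix(corpus_words, known_forms.__contains__, "k"))
```

Because the suffix is a parameter, the same function can be run with "ga", "ge" or "gen". As the thread notes, a hit only shows that the remainder is a possible word form, not that the ending is a clitic.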

albbas commented 17 years ago

Comment 1372

Date: 2007-05-14 09:07:36 +0200 From: Thomas Omma <>

As Sjur says, my hypothesis is/was: clitic -k after a vowel, otherwise clitic -ge/-ga/-gen.

Trond's findings seem to contradict this. It seems the behaviour is much more restricted. The best thing is to comment them out from LEXICON K until we know more.

Are there any findings regarding the clitics -ge/-ga/-gen?

Grehkagielak, Josefnamak, Buorremielak etc are the derivations we have in LEXICON NAMÁK.

albbas commented 17 years ago

Comment 1373

Date: 2007-05-14 10:06:15 +0200 From: Trond Trosterud <>

These are the words in -ga that also get an analysis when the -ga is chopped off (note: there is much noise here, due to the SUFFIX -ga; it could well be that none of these are actually clitic -ga).

Atenaga, Berga, Dalága, Divtasvuonaga, Galga, Galileaga, Ganugijga, Hebreaga, Israelaga, Jábbmega, Kretaga, Lijga, Lávkkijga, Manájga, Måskega, Pelega, Stáhtadoarjjaga, Suolluga, Tjuorvojga, Tjuovvolijga, Viettjajga, Vuolgijga, Vásstedijga, allamáttoga, alodijga, alvaduvájga, badjánijga, bahájuonaga, biejajga, bihkusijga, bilkkedijga, báhtarijga, bátsijga, bådijga, bårjåstijga, dagájga, dajga, dalága, davga, dievdeduvájga, doalvojga, duodastijga, duorastaga, dåbddåjga, gajkedijga, galga, galgajga, ganugijga, gatjádijga, gehtjajga, gevga, giehtojga, giessalijga, gievrrodijga, goavgedijga, gullujga, gullájga, gulájga, guodijga, gáhtjadijga, gárvedijga, gávnajga, gåtsadijga, hiejtijga, hålajga, iejvvidijga, jaskadijga, javlajga, juoga, jábbmega, jálusmuhtijga, jåvsådijga, lijga, lulujga, luojojga, luojttádijga, lájddijga, lávkkijga, lávlojga, låssåga, majga, manájga, mavga, máhtsajga, nagájga, oattjojga, oavdduhijga, rabásmielaga, riegádijga, ruossinávlliduvájga, rádnastallagådijga, rájajga, råhkådalájga, sidájga, sirájga, sjattajga, sjavnjestijga, subtsastijga, suolluga, ságastalájga, ságastijga, sárnnedijga, tjiegadijga, tjoahkkijga, tjuorvojga, tjuorvvijga, tjuorvvogådijga, tjuottjojga, tjuovvolijga, tjáŋajga, uddniga, vattijga, viegajga, viehkalijga, viehkedijga, vierrega, viesojga, világa, vuobdijga, vuojga, vuojnijga, vuojnnegådijga, vuolgijga, vuollánijga, vuosteldijga, vádtsijga, vádtsájga, vájvástuvájga, váldijga, válljijga, vásijga, vásstedijga, Ájttega, ájtsajga, ájttega, árvvedijga, äjvvalijga, åhpadijga, ålggorijkaga, årojga,

albbas commented 17 years ago

Comment 1374

Date: 2007-05-14 10:10:58 +0200 From: Trond Trosterud <>

For -ge, there were NO words which got an analysis without -ge. For -gen, there were two: Pergen, duogen.

Again, since the North Sámi clitics are written together with their host in Finland only, I am sceptical of this whole K business. If the -ga list reveals suffix -ga only, we should probably remove the whole K.

albbas commented 17 years ago

Comment 1375

Date: 2007-05-14 10:25:00 +0200 From: Thomas Omma <>

Yes, I am with you. There are only a few pronouns here that do not get an analysis without the clitic:

davga dat+Pron+Dem+Sg+Acc+Clt+ga
dajga dat+Pron+Dem+Pl+Com+Clt+ga
dajga dat+Pron+Dem+Pl+Gen+Clt+ga
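Forms like these, where every reading carries the Clt tag, could be picked out of analyser output with a small filter. This is a sketch only: it assumes the "surface analysis" layout quoted in this thread, with tags joined by "+", and the `xyzga` lines in the demo are invented to show a form that is not clitic-only.

```python
from collections import defaultdict
from typing import Iterable, List


def clitic_only_forms(lines: Iterable[str]) -> List[str]:
    """Return surface forms whose every analysis contains the Clt tag."""
    readings = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        surface, analysis = parts
        readings[surface].append(analysis.split("+"))
    return sorted(
        surface
        for surface, tag_lists in readings.items()
        if all("Clt" in tags for tags in tag_lists)
    )


if __name__ == "__main__":
    output = [
        "davga dat+Pron+Dem+Sg+Acc+Clt+ga",
        "dajga dat+Pron+Dem+Pl+Com+Clt+ga",
        "dajga dat+Pron+Dem+Pl+Gen+Clt+ga",
        # Invented lines: a form that also has a reading without Clt.
        "xyzga xyz+N+Sg+Nom+Clt+ga",
        "xyzga xyzga+N+Sg+Nom",
    ]
    print(clitic_only_forms(output))
```

The forms returned are the candidates for lexicalisation if the clitic lexicon is removed.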

albbas commented 17 years ago

Comment 1376

Date: 2007-05-14 12:55:17 +0200 From: Trond Trosterud <>

"there are only a few pronouns here that do not get an analysis without the clitic: ..." And they should of course be lexicalised. So as a first step we may go LEXICON K


and then eventually remove the whole thing. Also, tell Tomi or Börre, in case they handle the clitic as compounding.

albbas commented 17 years ago

Comment 1389

Date: 2007-05-18 09:07:14 +0200 From: Thomas Omma <>

This seems to be solved.