Closed: albbas closed this issue 11 years ago.
Date: 2012-09-28 08:49:21 +0200
From: Linda Wiechetek <
When going through the syntactically analyzed corpus from 1.6.2012, I came across several instances of the following:
"<juoidá>"
"juoga" Pron Indef Sg Acc @<OBJ
"
Date: 2012-12-11 18:31:17 +0100
From: Sjur Nørstebø Moshagen <
This is a bug, not an enhancement. Børre, could you try to take a look at this when you have a moment?
Date: 2012-12-11 18:32:08 +0100
From: Trond Trosterud <
It is still with us (vejolaö is the corrupted form of vejolaš):
analysed$ grep vejolaö 2012-06-01/sme.txt|wc -l
667
analysed$ grep vejolaö 2012-11-30/sme.txt|wc -l
657
Date: 2012-12-11 18:47:47 +0100
From: Trond Trosterud <
... but it is restricted to the divvun server, where it is very common :-(
grep '[ aeoiu]ö' 2012-11-30/sme*ccat.txt|wc -l
1102
Strangely enough, the number of hits grows almost twelvefold (from 1102 to 12948 lines) in the dependency analysis output :-/
grep '[ aeoiu]ö' 2012-11-30/sme*.dep.txt|wc -l
12948
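For readers less familiar with the one-liners, here is a minimal Python sketch of what `grep '[ aeoiu]ö' FILES | wc -l` measures, assuming UTF-8 text. It counts lines (not individual matches) containing a space or one of the vowels a, e, o, i, u directly followed by ö. Since ö is not a letter of the standard North Sámi alphabet, such lines are likely, though not certainly, corrupted. The function name `count_suspect_lines` and the sample lines are illustrative only; the samples are fragments quoted in this thread.

```python
import re
from typing import Iterable

# Same character class as the grep command: a space or a, e, o, i, u, then "ö".
SUSPECT = re.compile(r"[ aeoiu]ö")


def count_suspect_lines(lines: Iterable[str]) -> int:
    """Count lines with at least one match, like `grep PATTERN | wc -l`."""
    return sum(1 for line in lines if SUSPECT.search(line))


sample = [
    "Suoma-Norgga-Ruoŧa-Sámi áööedovdi joavkku",  # corrupted, but "á" is not in the class
    "ekonomalaö váikkuhusaid",                    # corrupted, matched via "aö"
    "Päiviö",                                     # proper noun, matched via "iö"
    "vejolaš",                                    # clean, no "ö" at all
]
print(count_suspect_lines(sample))
```

On this sample the count is 2. The pattern catches the corrupted ekonomalaö but also a proper noun, and it misses áööedovdi because á is outside the character class. The raw counts are therefore only a rough indicator of how widespread the corruption is.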
As already mentioned, it is found on the divvun server, not outside of it:
divvun:
analysed$ grep '[aeoiu ]ö' 2012-01-02/sme*.txt|kwic-snt 'ö'
alaö sámekonvenöuvdna Suoma-Norgga-Ruoŧa-Sámi áööedovdi joavkku álgohápmi Geigej alaö sámekonvenöuvdna Suoma-Norgga-Ruoŧa-Sámi áööedovdi joavkku álgohápmi Nammad enöuvdnamearrádusaid ekonomalaö váikkuhusaid. Áööedovdijoavkku lea ofelaötán dat ijoavku eaktuda ahte konvenöuvnna álgohámi ja áööedovdijoavkku árvalussii gullev rraláganat luonddu dáfus ja leat siskkáldasat áööedovdijoavkkus leamaö dárkilis ja artihkkal 42 Boazodoallu sámi ealáhussan. Áööedovdijoavku eaktuda ahte konve de oppalaö hápmái stuorra sárgosiid dáfus, de áööedovdijoavku lea gávnnahan vejoš
freecorpus on my mac:
ccat -r admin/ | grep " Suoma-Norgga-Ruoŧa-Sámi á"
Henriksen, Scheinin, Åhrén: Sámi álbmoga iešmearrideami vuoigatvuohta, s. 346-347, i Davviriikkalaš sámekonvenšuvdna: Suoma-Norgga-Ruoŧa-Sámi áššidovdi joavkku álgohápmi, geigejuvvon golggotmánu 26. b. 2005. Oslo 2005 ¶
Davviriikkalaš sámekonvenšuvdna, s. 137. Suoma-Norgga-Ruoŧa-Sámi áššedovdi joavkku álgohápmi. Geigejuvvui golggotmánu 26. b. 2005. ¶
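Setting the two listings side by side shows the damage word by word. The following Python sketch is not part of the original thread. It pairs a clean form from the freecorpus with its corrupted counterpart from the divvun server and tallies which characters differ. The word pairs come from the output above; `substitution_pairs` is a hypothetical helper. Both strings are NFC-normalized first so that precomposed and decomposed š/ö compare consistently.

```python
import unicodedata
from collections import Counter


def substitution_pairs(clean: str, corrupted: str) -> Counter:
    """Tally (clean_char, corrupted_char) pairs where two same-length strings differ."""
    clean = unicodedata.normalize("NFC", clean)
    corrupted = unicodedata.normalize("NFC", corrupted)
    if len(clean) != len(corrupted):
        raise ValueError("position-by-position comparison needs equal lengths")
    return Counter((c, d) for c, d in zip(clean, corrupted) if c != d)


# (freecorpus form, divvun server form), both quoted in this thread.
pairs = [
    ("sámekonvenšuvdna", "sámekonvenöuvdna"),
    ("áššedovdi", "áööedovdi"),
]
total = Counter()
for clean_form, corrupted_form in pairs:
    total += substitution_pairs(clean_form, corrupted_form)
print(total)  # Counter({('š', 'ö'): 3})
```

Every differing position pairs š with ö. This is consistent with the corruption being a plain š→ö substitution, although the comparison says nothing about where in the divvun server pipeline it happens.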
The net sum of this is that we have an unreliable syntax testbed due to an error we do not understand.
Date: 2013-05-06 10:33:40 +0200
From: Børre Gaup <
In the most recent analysed directory, 2013-04-11, grep '[ aeoiu]ö' sme*.dep|wc -l gives 423 hits.
These are the genuine hits, i.e. real instances of the corruption:
sme-nob-admin.dep:"<fltnodatekonomalaööat>"
sme-nob-admin.dep: "fltnodatekonomalaööat" ? @X #11->11
sme-nob-admin.dep:"<muorjeöoaggin>"
sme-nob-admin.dep: "muorjeöoaggin" ? @X #4->4
sme-nob-admin.dep:"<Muorraöuollan>"
sme-nob-admin.dep: "Muorraöuollan" ? @X #1->1
sme-nob-admin.dep:"<guolleöoliiguin>"
sme-nob-admin.dep: "guolleöoliiguin" ? @X #9->9
The rest are either proper nouns like Päiviö or South Sámi text.
grep vejolaö *.ccat gives zero hits.
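One way to separate genuine hits from false positives like these is to keep only cohort lines (`"<wordform>"`) and drop wordforms found on a hand-maintained list of known names. The Python sketch below illustrates that idea under stated assumptions; it is not the procedure used in this thread. `suspect_wordforms` and `known_names` are hypothetical, the sample lines mimic the grep output above (the Päiviö cohort line is illustrative), and the sketch makes no attempt to recognize South Sámi text.

```python
import re
from typing import Iterable, List, Set

# Cohort lines carry the wordform as "<...>", possibly after a "file:" prefix added by grep.
COHORT = re.compile(r'"<([^>]*)>"')
# Same character class as the grep command.
SUSPECT = re.compile(r"[ aeoiu]ö")


def suspect_wordforms(lines: Iterable[str], known_names: Set[str]) -> List[str]:
    """Return wordforms from cohort lines that match the pattern and are not known names."""
    found = []
    for line in lines:
        match = COHORT.search(line)
        if match is None:
            continue  # reading lines ("lemma" plus tags) repeat the cohort, so skip them
        word = match.group(1)
        if SUSPECT.search(word) and word not in known_names:
            found.append(word)
    return found


sample = [
    'sme-nob-admin.dep:"<fltnodatekonomalaööat>"',
    'sme-nob-admin.dep:\t"fltnodatekonomalaööat" ? @X #11->11',
    'sme-nob-admin.dep:"<Muorraöuollan>"',
    '"<Päiviö>"',  # illustrative proper-noun cohort, excluded via known_names
]
print(suspect_wordforms(sample, known_names={"Päiviö"}))
```

On this sample it prints `['fltnodatekonomalaööat', 'Muorraöuollan']`. Counting cohorts rather than lines also avoids counting each wordform twice (once on its cohort line and once on its reading line).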
This issue was created automatically with bugzilla2github
Bugzilla Bug 1444
Date: 2012-09-28T08:49:21+02:00 From: Linda Wiechetek <>
To: Børre Gaup <>
CC: ciprian.gerstenberger, lene.antonsen, sjur.n.moshagen, trond.trosterud
Last updated: 2013-05-06T10:33:40+02:00