Closed albbas closed 5 years ago
Date: 2019-05-20 15:58:44 +0200
From: Børre Gaup <
sme $ echo " olbmui.  Eanas oassi." | divvun-checker -l se -n smegram
{"errs":[["Eanas oassi",10,21,"double-space-before","Leat guokte gaskka ovdal \" oassi\"",["Eanas oassi"],"Sátnegaskameattáhusat"]],"text":" olbmui.  Eanas oassi."}
If oassi is replaced with guossit, or Eanas with Eanáš, the correct suggestion is given:
sme $ echo " olbmui.  Eanas guossit." | divvun-checker -l se -n smegram
{"errs":[[".  Eanas",7,15,"double-space-before","Leat guokte gaskka ovdal \"Eanas\"",[". Eanas"],"Sátnegaskameattáhusat"]],"text":" olbmui.  Eanas guossit."}
sme $ echo " olbmui.  Eanáš oassi." | divvun-checker -l se -n smegram
{"errs":[[".  Eanáš",7,15,"double-space-before","Leat guokte gaskka ovdal \"Eanáš\"",[". Eanáš"],"Sátnegaskameattáhusat"]],"text":" olbmui.  Eanáš oassi."}
sme $ echo eanas | husmeNorm
eanas eanas+Adv 0,000000
eanas eanas+Pron+Indef+Sg+Nom 0,000000
eanas eanas+A+Attr 0,000000
sme $ echo eanáš | husmeNorm
eanáš eanášit+V+TV+Imprt+ConNeg 0,000000
eanáš eanášit+V+TV+Imprt+Sg2 0,000000
eanáš eanášit+V+TV+Ind+Prs+ConNeg 0,000000
eanáš eanáš+Adv 0,000000
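The spans in the checker output above can be checked mechanically. The following is a minimal Python sketch, not part of the Divvun tools. It assumes that the input has two spaces after "olbmui." (as the reported offsets imply) and that the offsets can be used directly as string indices, which holds for this sample. The helper name check_spans is made up for illustration.

```python
import json

# Checker output copied from the report. Assumption: the input has two spaces
# after "olbmui.", which is what the reported offsets imply.
BUGGY = r'{"errs":[["Eanas oassi",10,21,"double-space-before","Leat guokte gaskka ovdal \" oassi\"",["Eanas oassi"],"Sátnegaskameattáhusat"]],"text":" olbmui.  Eanas oassi."}'
EXPECTED = r'{"errs":[[".  Eanas",7,15,"double-space-before","Leat guokte gaskka ovdal \"Eanas\"",[". Eanas"],"Sátnegaskameattáhusat"]],"text":" olbmui.  Eanas guossit."}'


def check_spans(raw):
    """Return (error type, flagged text, has double space, suggestion changes text)."""
    result = json.loads(raw)
    text = result["text"]
    rows = []
    for _form, beg, end, err_type, _msg, suggestions, _title in result["errs"]:
        flagged = text[beg:end]  # the part of the input the error points at
        has_double_space = "  " in flagged
        changes_text = any(s != flagged for s in suggestions)
        rows.append((err_type, flagged, has_double_space, changes_text))
    return rows


for name, raw in (("buggy", BUGGY), ("expected", EXPECTED)):
    for row in check_spans(raw):
        print(name, row)
```

In the buggy case the flagged span contains no double space and the suggestion is identical to the flagged text, so applying it would change nothing. In the expected case the span covers the two spaces and the suggestion removes one of them.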
Date: 2019-08-17 20:06:08 +0200
From: Linda Wiechetek <
What exactly is the problem? I can't see a difference in the headline. Could you check if the problem still exists?
Date: 2019-08-18 15:20:03 +0200
From: Linda Wiechetek <
Now I see the problem. I sent you an email about it. The difference between Eanas oassi and Eanáš oassi is that the first one is listed as a one word compound. I'm not sure how that influences the matter.
Date: 2019-08-20 14:38:08 +0200
From: Sjur Nørstebø Moshagen <
The underlying problem is that the whitespace analyser is applied directly after the morphological analysis & tokenisation, which means that the tag
Date: 2019-08-21 09:43:51 +0200
From: Kevin Brubeck Unhammer <unhammer+apertium>
The whitespace analyser can give the tags
Date: 2019-08-29 00:08:11 +0200
From: Linda Wiechetek <
Well, right now we mess up anyway.. I tested "Dat lea eanet go 10. Dat lea eanet go 10. olbmui."
and we get:
"
"It is more than 10" should give us "." CLB.. I'll have a look at a possible rule.
Date: 2019-08-29 00:11:28 +0200
From: Linda Wiechetek <
Ahh.. line breaks... So, we manage to disambiguate in this sentence without referring to
Date: 2019-09-02 14:16:37 +0200
From: Sjur Nørstebø Moshagen <
I am moving the whitespace analyser further down the pipeline. Linda's rule no longer depends on this tag.
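Why the position of the whitespace analyser matters can be illustrated with a toy model in Python. This is only a sketch under stated assumptions, not the actual pipeline: the Token class and all function names are hypothetical, the input is assumed to have two spaces after "olbmui.", and the behaviour that a tag on a multiword token ends up on its last part after splitting is a guess chosen to mirror the spans seen in this report, not documented behaviour of cg-mwesplit.

```python
from dataclasses import dataclass, field

TAG = "double-space-before"


@dataclass
class Token:
    form: str
    start: int         # offset of the first character in the text
    blank_before: str  # whitespace between the previous token and this one
    tags: set = field(default_factory=set)

    @property
    def end(self):
        return self.start + len(self.form)


def make_tokens():
    # Toy tokenisation of " olbmui.  Eanas oassi." with "Eanas oassi" as one token.
    return [Token("olbmui", 1, " "), Token(".", 7, ""),
            Token("Eanas oassi", 10, "  "), Token(".", 21, "")]


def tag_double_space(tokens):
    # Toy whitespace analyser: tag every token preceded by two spaces.
    for tok in tokens:
        if tok.blank_before == "  ":
            tok.tags.add(TAG)
    return tokens


def split_mwe(tokens):
    # Toy multiword split. ASSUMPTION: existing tags move to the last part.
    out = []
    for tok in tokens:
        parts = tok.form.split(" ")
        pos = tok.start
        for i, part in enumerate(parts):
            is_last = i == len(parts) - 1
            out.append(Token(part, pos,
                             tok.blank_before if i == 0 else " ",
                             set(tok.tags) if is_last else set()))
            pos += len(part) + 1
    return out


def error_spans(tokens):
    # Report from the start of the previous token to the end of the tagged one.
    return [(tokens[i - 1].start, tok.end)
            for i, tok in enumerate(tokens) if i > 0 and TAG in tok.tags]


tag_then_split = error_spans(split_mwe(tag_double_space(make_tokens())))
split_then_tag = error_spans(tag_double_space(split_mwe(make_tokens())))
print(tag_then_split, split_then_tag)
```

Under these assumptions, tagging before the split yields the span 10 to 21 ("Eanas oassi"), as in the original report, while tagging after the split yields 7 to 15, the span of the correct suggestion.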
Date: 2019-09-06 20:52:34 +0200
From: Sjur Nørstebø Moshagen <
(In reply to Kevin Brubeck Unhammer from comment #4)
Alternatively, it is no problem to have two whitespace analysers running, one that adds more "informative" tags such as
(and runs before mwe-dis.cg3), and one that adds error tags such as (after cg-mwesplit).
I chose to do it this way, and now things work as they should:
$ echo " olbmui.  Eanas oassi." | divvun-checker -a se.zcheck | jq .
{
  "errs": [
    [
      ".  Eanas",
      7,
      15,
      "double-space-before",
      "Leat guokte gaskka ovdal \"Eanas\"",
      [
        ". Eanas"
      ],
      "Sátnegaskameattáhusat"
    ]
  ],
  "text": " olbmui.  Eanas oassi."
}
I am closing the bug report.
This issue was created automatically with bugzilla2github
Bugzilla Bug 2585
Date: 2019-05-20T15:58:44+02:00 From: Børre Gaup <>
To: Linda Wiechetek <>
CC: linda.wiechetek, sjur.n.moshagen, thomas.omma, trond.trosterud, unhammer+apertium
Last updated: 2019-09-06T20:52:34+02:00