divvun / divvun-gramcheck-web

Grammar checker for web word processors, targeted at minority and indigenous languages, but open for everyone.
GNU General Public License v3.0

Whitespace after wrong quote mark is swallowed #38

Closed (albbas closed 2 years ago)

albbas commented 3 years ago

don" mii is corrected to don”mii, i.e. the space after the quote mark is lost. Tested in Google Docs/Firefox/Linux.

unhammer commented 3 years ago

what pipe is this, se-smegramrelease?

snomos commented 3 years ago

It is the default one, which should be smegramrelease.

unhammer commented 3 years ago
$ echo 'don" mii' | divvun-checker -l se
{"errs":[["\" mii",3,8,"punct-aistton-left","Boasttuaisttonmearkkat",["”mii"],"Aisttonmearkkat"]],"text":"don\" mii"}

$ echo 'don" mii' |bash smegramrelease.sh 
"<don>"
        "don" Pron Sem/Hum Pers Sg2 Nom <W:0.0> <firstCohort> @HNOUN
"<">"
        """ PUNCT <W:0.0> &punct-aistton-left ID:2 R:RIGHT:3
punct-aistton-left
        """ PUNCT <W:0.0> "”mii"S &punct-aistton-left &SUGGESTWF ID:2 R:RIGHT:3
punct-aistton-left
        "”" PUNCT RIGHT Err/Orth <W:0.0> ID:2 R:RIGHT:3
: 
"<mii>"
        "mii" Pron Indef Sg Nom <W:0.0> <LastCohort> &LINK &punct-aistton-left ID:3
punct-aistton-left
        "mii" Pron Rel Sg Nom <W:0.0> <LastCohort> &LINK &punct-aistton-left ID:3
punct-aistton-left
        "mun" Pron Sem/Hum Pers Pl1 Nom <W:0.0> <LastCohort> &LINK &punct-aistton-left ID:3
punct-aistton-left
:\n

The rule that adds the suggestion

# Generate suggestions for quotation marks on the _left_ side:
COPY:punct-aistton-left-sugg KEEPORDER (VSTR:"”$1"S &SUGGESTWF) TARGET (&punct-aistton-left) IF

https://github.com/giellalt/lang-sme/blob/6840e25893ae7d14ae76e545aa2cf01ea532c85d/tools/grammarcheckers/grammarchecker-release.cg3#L9232-L9233

puts the suggestion on (the left of) mii since mii got the &punct-aistton-left tag.
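
To see why that placement eats the space, here is a minimal Python sketch that applies the suggestion from the divvun-checker output above. The field layout of each `errs` entry (form, start, end, tag, message, suggestions, title) and the reading of the offsets as character positions with an exclusive end are assumptions inferred from that one output line, not a documented API. `apply_first_suggestion` is a hypothetical helper, not part of divvun-checker.

```python
import json

# JSON copied verbatim from the divvun-checker run above.
raw = r'{"errs":[["\" mii",3,8,"punct-aistton-left","Boasttuaisttonmearkkat",["”mii"],"Aisttonmearkkat"]],"text":"don\" mii"}'


def apply_first_suggestion(result):
    """Replace every error span with its first suggestion.

    Assumes each error is [form, start, end, tag, message, suggestions, title]
    and that start/end are character offsets into result["text"].
    """
    text = result["text"]
    # Work right to left so earlier offsets stay valid after a replacement.
    for form, beg, end, _tag, _msg, suggestions, _title in sorted(
        result["errs"], key=lambda err: err[1], reverse=True
    ):
        # Sanity check for the offset assumption stated above.
        assert text[beg:end] == form
        if suggestions:
            text = text[:beg] + suggestions[0] + text[end:]
    return text


result = json.loads(raw)
corrected = apply_first_suggestion(result)
print(corrected)
```

The error span covers the quote mark, the space and mii, while the only suggestion is ”mii, so any client that applies the suggestion as-is drops the space.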

So why didn't don" get &punct-aistton-right instead? There is no such rule – the rules either assume we have wrong aisttons on both sides of the word, or on the left (only "»" can get &punct-aistton-right).

The reason is probably that CG has no information about where the space is.


This issue could be solved by:

  1. adding a tag to the PUNCT cohort with blanktagger (presumably analyser-gt-whitespace.regex or analyser-gt-errorwhitespace.regex). For example, add <spaceAfterAistton> if there's a space after a ".
  2. changing the rules so we add &punct-aistton-left to " if there is NOT <spaceAfterAistton>, while we add &punct-aistton-right to " if <spaceAfterAistton> is there.
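
The decision in the two steps above can be sketched in Python as follows. This only illustrates the proposed logic, it is not how blanktagger or the CG rules would implement it. `aistton_tag` is a hypothetical name, and treating a quote mark at the very end of the input as "no space after" is an assumption, since the proposal does not cover that case.

```python
def aistton_tag(text, i):
    """Return the error tag the proposal would give the straight quote at text[i]."""
    if text[i] != '"':
        raise ValueError("text[i] must be a straight double quote")
    # Step 1: what the hypothetical <spaceAfterAistton> tag would record.
    space_after = i + 1 < len(text) and text[i + 1].isspace()
    # Step 2: a following space means the quote closes the previous word.
    return "&punct-aistton-right" if space_after else "&punct-aistton-left"


print(aistton_tag('don" mii', 3))
print(aistton_tag('don "mii', 4))
```

With this split, the quote in don" mii would belong to don, so a suggestion would no longer need to span the space.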

snomos commented 2 years ago

This works now in both MS Word and GDocs.