snomos closed this 5 years ago.
[ ?* {"<,>"} ]:[ "<NoSpaceAfterPunctMark>"]
should only match commas?
But I see the issue. This may require a change to divvun-blanktag itself, perhaps a special symbol for end-of-stream (not just end-of-paragraph, which would give a :\n or :</p> or similar).
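To illustrate why a dedicated end-of-stream symbol would help, here is a toy model in Python. It is not how divvun-blanktag is implemented. Each cohort is a (wordform, blank after it) pair, and a NoSpaceAfterPunctMark-style rule flags punctuation whose following blank is empty. At the end of the stream nothing follows, so that blank is empty too and the final full stop is flagged. Appending a marker (the "<EOS>" string and both function names are hypothetical) makes end-of-stream distinguishable both from "no space" and from an end-of-paragraph :\n.

```python
# Toy model only: this is not how divvun-blanktag is implemented.
# Each cohort is a (wordform, blank_after) pair.

# Hypothetical end-of-stream marker; not an existing divvun-blanktag symbol.
EOS = "<EOS>"

PUNCT = {".", ","}


def mark_end_of_stream(cohorts):
    """Return a copy of cohorts with EOS appended to the blank after the last one."""
    if not cohorts:
        return []
    *body, (last_word, last_blank) = cohorts
    return body + [(last_word, last_blank + EOS)]


def no_space_after_punct(cohorts):
    """Indices of punctuation cohorts whose following blank is empty."""
    return [
        i
        for i, (word, blank) in enumerate(cohorts)
        if word in PUNCT and blank == ""
    ]


# End of stream: nothing follows the final full stop, so its blank is empty.
stream = [("s.", " "), ("159", ""), (")", ""), (".", "")]
print(no_space_after_punct(stream))                      # false positive on the final "."
print(no_space_after_punct(mark_end_of_stream(stream)))  # nothing flagged
```

A genuinely glued comma (blank "" followed by another word) is still flagged after marking, because only the blank after the last cohort changes.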
This is also true for the beginning of the stream; cf. the following:
echo 'ja (Lauvås/Handal, s. 159)' | tools/grammarcheckers/modes/smegramrelease.mode
"<ja>"
"ja" CC <W:0.0> @CVP
:
"<(>"
"(" PUNCT LEFT <W:0.0>
"<Lauvås>"
"Lauvås" N <NomGenSg> Prop Sem/Sur Sg Nom <W:0.0> @HNOUN
"</>"
"/" PUNCT <W:0.0>
"<Handal>"
"Handal" N <NomGenSg> Prop Sem/Sur Sg Nom <W:0.0> @<SPRED
"<,>"
"," CLB <W:0.0>
:
"<s.>"
"s" N <NomGenSg> Sem/Sign ABBR Gram/TAbbr Sg Nom <W:0.0> @HNOUN
:
"<159>"
"159" Num Arab Sg Nom <W:0.0> @N<
"<)>"
")" PUNCT RIGHT <W:0.0> <LastCohortOfParagraph>
:\n
Expected: the first cohort should have had the tag <firstWordOfParagraph>, cf. the following regex in analyser-gt-whitespace.regex:
[ {\n} ?* {"<} ?* {>"} ?* ]:[ "<firstWordOfParagraph>" ]
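The regex requires the text before the wordform to begin with {\n}. At the very start of the stream there is presumably no preceding newline, so the first cohort cannot match. The sketch below reproduces this with Python's re module as an analogue (not hfst semantics). It assumes, for illustration only, that the rule sees "blank before the cohort" + "<wordform>". A hypothetical beginning-of-stream sentinel (BOS, by analogy with the end-of-stream symbol suggested above) then lets the first cohort match.

```python
import re

# Hypothetical beginning-of-stream sentinel (not an existing divvun-blanktag symbol).
BOS = "\x02"

# Python analogue of  [ {\n} ?* {"<} ?* {>"} ?* ]  applied to
# "blank before the cohort" + "<wordform>": the input must start with a newline.
FIRST_WORD = re.compile(r'\n.*"<.*>".*', re.DOTALL)

# Variant that also accepts the beginning-of-stream sentinel in place of \n.
FIRST_WORD_OR_BOS = re.compile(r'[\n\x02].*"<.*>".*', re.DOTALL)


def cohort_inputs(cohorts, mark_stream_start=False):
    """Build one 'blank before + "<wordform>"' string per (blank, wordform) pair."""
    inputs = []
    for i, (blank, wordform) in enumerate(cohorts):
        if i == 0 and mark_stream_start:
            blank = BOS + blank
        inputs.append(f'{blank}"<{wordform}>"')
    return inputs


def first_word_flags(pattern, inputs):
    """True for each input the pattern matches in full."""
    return [pattern.fullmatch(s) is not None for s in inputs]


# The first cohort has no blank before it; "Next" is a made-up cohort
# starting a new paragraph.
cohorts = [("", "ja"), (" ", "("), ("", "Lauvås"), ("\n", "Next")]
print(first_word_flags(FIRST_WORD, cohort_inputs(cohorts)))
print(first_word_flags(FIRST_WORD_OR_BOS, cohort_inputs(cohorts, mark_stream_start=True)))
```

Without the sentinel only "Next" is tagged; with it, "ja" is tagged as well.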
The following command:
gives the following JSON output:
Note the error message for the final full stop. The regex that triggers this error is:
in the file sme/tools/grammarcheckers/analyser-gt-whitespace.regex. The regex works fine in all other cases. How can we prevent it from matching end-of-paragraph full stops?
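One general way to avoid such a match is to require that a non-space character actually follows the punctuation mark, rather than requiring that no space follows. The two conditions differ exactly at the end of the input. The sketch below shows this with Python's re module only as an analogue. The actual rule in analyser-gt-whitespace.regex is an hfst regex, and whether divvun-blanktag exposes what follows a cohort is not stated in this thread.

```python
import re

# Python analogue of a "no space after punctuation" check; the actual rule in
# analyser-gt-whitespace.regex is an hfst regex and works differently.

# Negative lookahead: "not followed by whitespace". It also succeeds at the
# end of the input, so a final full stop is flagged.
NOT_FOLLOWED_BY_SPACE = re.compile(r"[.,](?!\s)")

# Positive lookahead: "followed by a non-whitespace character". At the end of
# the input nothing follows, so a final full stop is not flagged.
FOLLOWED_BY_NON_SPACE = re.compile(r"[.,](?=\S)")


def flagged(pattern, text):
    """Start offsets of the punctuation marks the pattern flags."""
    return [m.start() for m in pattern.finditer(text)]


text = "Handal,s. 159."
print(flagged(NOT_FOLLOWED_BY_SPACE, text))  # [6, 13]
print(flagged(FOLLOWED_BY_NON_SPACE, text))  # [6]
```

Both patterns catch the glued comma at offset 6; only the negative lookahead also fires on the final full stop.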