Soft hyphens in text break grammar API

snomos commented 4 years ago

I am not able to get GramDivvun to work in MS Word (the local app) when using the UiT account. It can be installed, it loads, and looks the way it should in the initial screen. But when clicking "Check", it almost immediately dies with the following errors in the console:

[Error] Failed to load resource: the server responded with a status of 500 () (se, line 0)
[Error] Failed to get grammar check API response
TypeError: undefined is not an object (evaluating 'e.filter')
p — index.ts:165
(anonym funksjon) — api.ts:58
(anonym funksjon) — app.0990553d7b12bc912a5b.js:21:37456
s — app.0990553d7b12bc912a5b.js:21:36328
u — bluebird.js:5370
(anonym funksjon) — bluebird.js:3366
(anonym funksjon) — bluebird.js:3423
(anonym funksjon) — bluebird.js:3468
(anonym funksjon) — bluebird.js:3548
l — bluebird.js:145
c — bluebird.js:138
(anonym funksjon) — bluebird.js:154
(anonym funksjon) — bluebird.js:67
(anonym funksjon) — bluebird.js:4620
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:21:38425)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:21:37456)
    s (app.0990553d7b12bc912a5b.js:21:36328)
    u (app.0990553d7b12bc912a5b.js:1:200125)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:173146)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:173967)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:174655)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:176008)
    l (app.0990553d7b12bc912a5b.js:1:127337)
    c (app.0990553d7b12bc912a5b.js:1:127262)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:128376)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:127207)
    (anonym funksjon) (app.0990553d7b12bc912a5b.js:1:189711)
[Error] Unhandled Promise Rejection: TypeError: undefined is not an object (evaluating 'm.message')
    (anonym funksjon) (word-mac-16.00.js:26:310515)
    promiseReactionJob

Screenshot of the same:

[Screenshot taken 18.09.2020 at 10:27]

I have tested various setups, and most work, but not this one. The ones I have tested are:

windows, app (365), UiT account - works
windows, browser (Edge), UiT account - works
windows, browser (Edge), private account - works
mac, browser (Chrome), private account - works
mac, browser (Chrome), UiT account - works
mac, browser (Safari), private account - DOES work
mac, browser (Safari), UiT account - does NOT work
mac, app (2016), private account - DOES work
mac, app (2016), UiT account - does NOT work

The Safari+UiT problem manifests differently, so it seems to be a separate issue, and it can easily be worked around by using another browser. This bug report therefore targets only the locally installed Office 2016 app.

snomos commented 4 years ago

Turns out the problem is soft hyphens in the text sent from Word. So the categorisation above is probably invalid, and just a happy coincidence of the test data used.

To trigger the error, use the following text. It should contain two soft hyphens (U+00AD), which are invisible when rendered:

 Áigot nannet sámiid konsultašuvdnarievtti
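
Since soft hyphens are invisible and easily lost when copying, the trigger text can also be written with explicit escapes. A minimal sketch, assuming the two soft hyphens sit after "konsulta" and "šuvdna" (which is what the checker output further down in this thread suggests); the variable names are only illustrative:

```python
# Build the trigger text with explicit U+00AD (SOFT HYPHEN) escapes so the
# invisible characters survive copy and paste.
# Assumption: the soft hyphens follow "konsulta" and "šuvdna".
SOFT_HYPHEN = "\u00ad"

text = " Áigot nannet sámiid konsulta" + SOFT_HYPHEN + "šuvdna" + SOFT_HYPHEN + "rievtti"

# Count and locate the soft hyphens to confirm they are really present.
positions = [i for i, ch in enumerate(text) if ch == SOFT_HYPHEN]
print(len(positions), positions)
```

Pasting the printed text into Word, or piping it to the checker, should then exercise the soft hyphen path regardless of what the clipboard did to the original sample.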

snomos commented 3 years ago

Seems the error is in libdivvun - @unhammer could you have a look?

bbqsrc commented 3 years ago

The issue this time was that the character \x1f was found.

unhammer commented 3 years ago

> The issue this time was that the character \x1f was found.

INFORMATION SEPARATOR ONE?

bbqsrc commented 3 years ago

Everyone's favourite codepoint! The input was dutkama ja luonddu\x1fdiehtaga,

unhammer commented 3 years ago

What is libdivvun doing wrong? I get

$ echo ' Áigot nannet sámiid konsultašuvdnarievtti' | src/divvun-checker -l se 
{"errs":[["konsulta",21,29,"typo","Ii leat sátnelisttus",["konsula"],"Čállinmeattáhus"],["šuvdna",30,36,"typo","Ii leat sátnelisttus",["šuvona","govdna"],"Čállinmeattáhus"]],"text":" Áigot nannet sámiid konsultašuvdnarievtti"}

$ echo ' Áigot nannet sámiid konsultašuvdnarievtti' | src/divvun-checker -l se |hl-nonprinting 
{"errs":[["konsulta",21,29,"typo","Ii leat sátnelisttus",["konsula"],"Čállinmeattáhus"],["šuvdna",30,36,"typo","Ii leat sátnelisttus",["šuvona","govdna"],"Čállinmeattáhus"]],"text":" Áigot nannet sámiid konsulta-šuvdna-rievtti"}⁋

from the command line with newest giella-sme-speller (that hl-nonprinting is just a script to sed \xad into a dash and EOL into ⁋).
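
For anyone without such a script, something similar can be sketched in Python. This is a hypothetical stand-in, not the actual hl-nonprinting script: the function name is made up, and the caret notation for other control characters is an assumption based on the `^_` seen in the output for \x1f.

```python
def highlight_nonprinting(s: str) -> str:
    """Make invisible characters visible (hypothetical stand-in for hl-nonprinting)."""
    out = []
    for ch in s:
        if ch == "\u00ad":
            # Soft hyphen becomes a visible dash.
            out.append("-")
        elif ch == "\n":
            # End of line is marked with a pilcrow-like sign; the newline is kept.
            out.append("⁋\n")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            # Other control characters use caret notation, e.g. \x1f -> ^_
            out.append("^" + chr(ord(ch) ^ 0x40))
        else:
            out.append(ch)
    return "".join(out)


print(highlight_nonprinting("konsulta\u00adšuvdna\u00adrievtti\n"), end="")
print(highlight_nonprinting("luonddu\x1fdiehtaga\n"), end="")
```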

bbqsrc commented 3 years ago

@snomos has conflated two issues. The \x1f issue seems to be coming from libdivvun, whereas the soft hyphen issue is our problem.

unhammer commented 3 years ago

$ printf ' dutkama ja luonddu\x1fdiehtaga' | src/divvun-checker -l se
{"errs":[],"text":" dutkama ja luonddudiehtaga"}

$ printf ' dutkama ja luonddu\x1fdiehtaga' | src/divvun-checker -l se |hl-nonprinting
{"errs":[],"text":" dutkama ja luonddu^_diehtaga"}⁋

– should we be removing it from input, or are we somehow introducing \x1f's, or am I not reproducing the issue correctly here? (Some "expected this, but got that" examples would be nice ;))

bbqsrc commented 3 years ago

Sorry I am trying to go on vacation, haha.

The error was "control character (\u0000-\u001F) found while parsing a string". It looked like it was coming from libdivvun, but perhaps it isn't. January's problem now.

unhammer commented 3 years ago

There was a bug in libdivvun: that character should've been escaped according to the JSON spec. Fixed now; hopefully that helps with this issue.
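
To illustrate the rule (this is not the libdivvun fix itself, just a sketch using Python's standard `json` module): a raw U+001F inside a JSON string is rejected by a strict parser, while the escaped form `\u001f` round-trips fine. The naive string concatenation below is only an assumption about how such unescaped output can come about.

```python
import json

raw = " dutkama ja luonddu\x1fdiehtaga"

# Naive serialisation: pasting the text between quotes leaves the raw
# control character inside the JSON string, which the JSON grammar forbids.
naive = '{"errs":[],"text":"' + raw + '"}'
try:
    json.loads(naive)
    naive_ok = True
except json.JSONDecodeError:
    naive_ok = False

# Proper serialisation escapes U+0000..U+001F, here as \u001f.
escaped = json.dumps({"errs": [], "text": raw}, ensure_ascii=False)
roundtrip = json.loads(escaped)["text"]

print(naive_ok)               # False: the strict parser rejects the raw control character
print("\\u001f" in escaped)   # True: the serialiser wrote the escape sequence
print(roundtrip == raw)       # True: the original text is recovered after parsing
```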

snomos commented 2 years ago

> Everyone's favourite codepoint! The input was dutkama ja luonddu\x1fdiehtaga,

This seems to be fixed, or at least it is not causing any trouble in either Word or GDocs. Even

konsultašuvdnarievtti

(containing two soft hyphens) seems to be fixed. Closing.

divvun / divvun-gramcheck-web

Soft hyphens in text break grammar API #23