Open ilovan opened 4 years ago
This issue seems to be caused by NERVE not handling unicode characters properly.
Trying running NERVE on this document for an example https://dev-cwrc-writer.cwrc.ca/?githubPath=Salomania&githubRepo=ajmacdonald%2Fgitwriter
In the NERVE response “Viral”
gets converted into ¬タワViral¬タン
. This is what's causing the offset issue.
That's unfortunate and it seems like a newly introduced problem, though nobody has been changing NERVE recently, to my knowledge. Any recommendations about how to proceed?
We recently introduced a proxy in GitServer for NERVE but I've determined that that's not the cause, as the issue still occurs when trying directly as outlined here: https://github.com/cwrc/NERVE#testing-api-via-curl
Maybe Jeff might have some insight on possible solutions.
OK, I'll consult with him next week when he is back with CWRC. Tentatively reassigning the ticket to him.
@ilovan @ajmacdonald Is there a usecase for a charset other than UTF-8? I've what I think is a quick fix for the UTF-8 case. To support other charsets, a bigger refactor is required. See a number of Todos
@ilovan @ajmacdonald
The DockerHub dev instance is updated. Is there a way to test the image: cwrc/nerve:latest-dev
?
I've just tested on the dev server. No sign of offset in the document mentioned. I couldn't reproduce the bug.
correct - no more offset issue (tested here: https://github.com/cwrc/CWRC-WriterBase/issues/234#issuecomment-579362637, which in the past was impossible to NER because of the offset)
The offset is inherited by the entity once accepted. See for example Lisbon in: https://dev-cwrc-writer.cwrc.ca/?githubPath=TestGitWriterSchemaCSSchange&githubRepo=ilovan%2FGit-Writer-tests
The offset seems to be bigger the longer the document is: See : https://dev-cwrc-writer.cwrc.ca/?githubPath=Salomania&githubRepo=ilovan%2FNERVE-integration-tests
Discovered while testing https://github.com/cwrc/CWRC-WriterBase/issues/171