Symplectic / vivo-harvester-old-deprecated

http://www.symplectic.co.uk/

Issue with Translator: shows Invalid byte 1 of 1-byte UTF-8 sequence #1

Closed thomas-weil closed 11 years ago

thomas-weil commented 11 years ago

Hello,

We are running V.2 of the VIVO harvester code. It extracts the raw records fine, but at the translation phase we see this error:

Error on line 1 column 1623
SXXP0003: Error reported by XML parser: Invalid byte 1 of 1-byte UTF-8 sequence.
2013-02-07 14:52:54.331 ERROR [u.c.s.t.TranslationServiceImpl] Unable to perform translation
net.sf.saxon.trans.DynamicError: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1623; Invalid byte 1 of 1-byte UTF-8 sequence.

I investigated and found a thread about a similar error in the PubMed harvester: https://issues.library.cornell.edu/browse/VIVOHARV-91. Is this likely to be the issue here and, if so, can it be fixed?

Thanks, Tom

grahamtriggs commented 11 years ago

Hi Tom,

This was resolved on the 25th January:

https://github.com/Symplectic/vivo/commit/a04db500da5da49cbe708157415a63c3fddd68bb

It looks like basically the same issue as the PubMed harvester, except that I've gone the route of forcibly writing out a UTF-8 file when fetching the XML, so that the parser then reads it correctly (based on the XML declaration).
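For anyone hitting the same error, here is a minimal sketch of the underlying problem and of the "always write UTF-8" idea. It is illustrative only and is not the code from the commit above: it is written in Python rather than the harvester's Java, and the sample document and the helper names (`write_xml`, `parses_cleanly`, `demo`) are made up for the example. It writes the same XML text once as Latin-1 and once as UTF-8, then parses both files; the declaration claims UTF-8 in both cases, so only the second file matches what the parser expects.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical sample record: the declaration promises UTF-8 and the
# content contains one non-ASCII character (e with acute accent).
XML_TEXT = '<?xml version="1.0" encoding="UTF-8"?><name>Caf\u00e9</name>'


def write_xml(path, text, encoding):
    """Write the XML text to disk using an explicitly chosen encoding."""
    with open(path, "w", encoding=encoding) as handle:
        handle.write(text)


def parses_cleanly(path):
    """Return True if the file parses, False if the parser rejects it."""
    try:
        ET.parse(path)
        return True
    except ET.ParseError:
        return False


def demo():
    """Parse a Latin-1 copy and a UTF-8 copy of the same document."""
    with tempfile.TemporaryDirectory() as tmp:
        mismatched = os.path.join(tmp, "latin1.xml")
        matching = os.path.join(tmp, "utf8.xml")
        # Bytes on disk disagree with the declared encoding.
        write_xml(mismatched, XML_TEXT, "latin-1")
        # Bytes on disk agree with the declared encoding.
        write_xml(matching, XML_TEXT, "utf-8")
        return parses_cleanly(mismatched), parses_cleanly(matching)


if __name__ == "__main__":
    print(demo())
```

The point of the sketch is that the encoding is chosen explicitly at write time instead of being left to a platform default, so the bytes on disk always agree with the XML declaration.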

Regards, Graham