CottageLabs / LanternPM

Lantern meta repository for product management

Character encoding problems in JSON from API in /:job/results #94

Closed richard-jones closed 8 years ago

richard-jones commented 8 years ago

As an example, when viewing this record in a browser window, there are character encoding issues in the author affiliation strings:

https://api.cottagelabs.com/service/lantern/2kDfBS9ytoyGTCtG5/results

e.g. *Department of Surgery, University of Washington, Seattle †Department of Surgery, Swedish Medical Center, Seattle, WA ‡Surgical Care and Outcomes Assessment Program (SCOAP), Seattle, WA §Department of Surgery, Oregon Health & Science University, Portland ¶Department of Surgery, Madigan Army Medical Center, Tacoma, WA ‖Department of Surgery, Virginia Mason Medical Center, Seattle, WA.

markmacgillivray commented 8 years ago

What did you upload, and how? Via JSON to the API, or CSV through the UI?

(Sent by email in reply to Richard Jones's comment of 2 Aug 2016 17:30.)

richard-jones commented 8 years ago

Created via the API with JSON, but the character encoding issue is not in the upload; it is in the download. The author affiliations are taken from CrossRef, I think.

markmacgillivray commented 8 years ago

Having checked, those symbols actually come from the Europe PMC (eupmc) data:

http://www.ebi.ac.uk/europepmc/webservices/rest/search?query=DOI:10.1097/sla.0000000000000894&resulttype=core&format=json

The default for the API output did not declare UTF-8, so the encoding was left for the browser to guess, and in this case it was coming out as windows-1251, I think (I did not write it down before the fix).
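For anyone wanting to reproduce the symptom, here is a minimal Python sketch of what happens when UTF-8 bytes are decoded with the wrong charset. It assumes the browser's guess really was windows-1251 (`cp1251` in Python), which was not confirmed above, and it uses a fragment of the affiliation string from the report as sample data.

```python
# Fragment of the affiliation string from the report; the dagger is U+2020.
text = "†Department of Surgery, Swedish Medical Center, Seattle, WA"

# What the API puts on the wire: UTF-8 bytes (the dagger becomes three bytes).
raw = text.encode("utf-8")

# What a client shows if it guesses windows-1251 (assumed, not confirmed):
# each of the three dagger bytes is rendered as a separate character.
garbled = raw.decode("cp1251", errors="replace")

# What a client shows once it is told the bytes are UTF-8.
correct = raw.decode("utf-8")

print(repr(garbled))
print(repr(correct))
```

The underlying bytes are identical in both cases; only the client's interpretation differs, which is why the fix belongs in the response headers and not in the data.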

Anyway, I have now changed the default headers to serve JSON as UTF-8. This was just a config setting in the API lib being used. I don't know why they would not do so by default, but there you go. This probably also resolves the other recent issue where you asked for UTF-8 output, so I will go and look for any such issues and close them too.
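As a rough illustration of the kind of change involved, the sketch below builds a JSON response whose `Content-Type` header declares the charset explicitly. It is not the actual Lantern API code or the config of the library in use; `json_response` is a hypothetical helper written only for this example.

```python
import json


def json_response(payload):
    """Hypothetical helper: return (headers, body) for a JSON response in UTF-8."""
    # ensure_ascii=False keeps characters such as the dagger as real UTF-8
    # bytes instead of \uXXXX escapes, so the charset declaration matters.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    headers = {
        # Declaring the charset means the client does not have to guess.
        "Content-Type": "application/json; charset=utf-8",
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = json_response({"affiliation": "†Department of Surgery"})
print(headers["Content-Type"])
```

With the charset declared, a browser viewing the raw JSON decodes the bytes as UTF-8 and no longer falls back to a locale-dependent guess.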

The fix has not been pushed to live yet, but you can see the output properly formatted on dev:

https://dev.api.cottagelabs.com/use/europepmc/doi/10.1097/sla.0000000000000894