UW-Macrostrat / macrostrat-api

The API for SCIENCE

unicode characters in units long response #161

Open jonhusson opened 8 years ago

jonhusson commented 8 years ago

It was a simple patch on my part, but just so you are aware. Python was choking on this return:

https://dev.macrostrat.org/api/units?project_id=1&response=long&unit_id=16011

because the 'notes' string had this unicode character:

'AAPG Bulletin, v. 90, no. 11 (November 2006), pp. 1803\u20131841'

instead of a dash:

AAPG Bulletin, v. 90, no. 11 (November 2006), pp. 1803–1841
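For reference, `\u2013` is the escaped form of the en dash (U+2013), so the two strings above hold the same character and differ only in how they are displayed. A minimal Python 3 sketch of that point (the variable names are illustrative, not from the API):

```python
import unicodedata

# Escaped form, as it appeared in the 'notes' string.
escaped = 'pp. 1803\u20131841'
# Literal form, as it renders in a browser.
literal = 'pp. 1803–1841'

# Both spellings produce the same string object contents.
same = escaped == literal
# Index 8 is the character between "1803" and "1841".
dash_name = unicodedata.name(escaped[8])
print(same, dash_name)
```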

jczaplew commented 8 years ago

How were you requesting and parsing the response? When I look at that return in the browser it is properly parsed as a dash, and in the database itself it is also a dash.

jonhusson commented 8 years ago

import csv
import requests

unit_success = requests.get('https://dev.macrostrat.org/api/units?project_id=1&format=csv&response=long&lith_type=metasedimentary&lith_class=sedimentary')
units = unit_success.text
units = csv.reader(units.splitlines(), delimiter=',')

jczaplew commented 8 years ago

This seems to be an issue with the CSV parser. I'll investigate.

jonhusson commented 8 years ago

It might not be our fight - just thought you should be aware

jczaplew commented 8 years ago

Seeing as I am the maintainer of the CSV parsing package it is my fight ;-)

jonhusson commented 8 years ago

ahh. I see - I thought it was a Python problem

jczaplew commented 7 years ago

I think things should be properly encoded as UTF-8. Would explicitly opening the csv file as such fix it?

units = csv.reader(units.splitlines(), delimiter=',', encoding='utf-8')

jonhusson commented 7 years ago

Python no like:

TypeError: 'encoding' is an invalid keyword argument for this function
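
The TypeError is expected, since `csv.reader` takes no `encoding` argument. One possible workaround is to decode the bytes as UTF-8 first and hand the resulting text to the parser. This is a minimal sketch, assuming Python 3, where `csv.reader` works on already-decoded `str` lines. The `parse_csv_bytes` helper and the sample payload are hypothetical stand-ins, not the actual Macrostrat response:

```python
import csv
import io

def parse_csv_bytes(raw, encoding="utf-8"):
    """Decode raw CSV bytes, then parse them into a list of rows."""
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module handles them.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=","))

# Hypothetical payload with an en dash (U+2013) in the notes column.
sample = (
    'unit_id,notes\r\n'
    '16011,"AAPG Bulletin, v. 90, no. 11 (November 2006), pp. 1803\u20131841"\r\n'
).encode("utf-8")

rows = parse_csv_bytes(sample)
print(rows[1][1])
```

With `requests`, the raw bytes are available as `response.content`, while `response.text` is already a decoded string and can be split into lines and passed to `csv.reader` directly.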