CottageLabs / LanternPM

Lantern meta repository for product management
1 stars 0 forks source link

UTF-8 encoding for :job/original #92

Closed richard-jones closed 8 years ago

richard-jones commented 8 years ago

When you request :job/original from the API, the content comes back ISO-8859-1 encoded, but we should stick to UTF-8.

markmacgillivray commented 8 years ago

What encoding was uploaded?

On 29 Jul 2016 18:07, "Richard Jones" notifications@github.com wrote:

When you request :job/original from the API, the content comes back ISO-8859-1 encoded, but we should stick to UTF-8.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/CottageLabs/LanternPM/issues/92, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuXCG4_o11410_RBznz3QT4sHy7zIMiks5qajM7gaJpZM4JYYx- .

richard-jones commented 8 years ago

It was a UTF-8 json document direct to the API, and :job/original gives a CSV, so I guess it goes through a transformation somewhere on the server.

markmacgillivray commented 8 years ago

Did you view the original in the browser or download then check the encoding? It's possibly just the default browser encoding, in which case I can specify utf-8 instead.

On 29 Jul 2016 18:23, "Richard Jones" notifications@github.com wrote:

It was a UTF-8 json document direct to the API, and :job/original gives a CSV, so I guess it goes through a transformation somewhere on the server.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CottageLabs/LanternPM/issues/92#issuecomment-236240609, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuXCCvd1HfJximBP4x3TZKfaBut7daHks5qajcDgaJpZM4JYYx- .

richard-jones commented 8 years ago

I downloaded it using requests and checked the response encoding.

resp = requests.get("..../original") resp.encoding

Maybe this is a requests issue, specifically? Not sure if you are setting the response encoding explicitly on the server end.

markmacgillivray commented 8 years ago

I'll check

On 30 Jul 2016 08:10, "Richard Jones" notifications@github.com wrote:

I downloaded it using requests and checked the response encoding.

resp = requests.get("..../original") resp.encoding

Maybe this is a requests issue, specifically? Not sure if you are setting the response encoding explicitly on the server end.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CottageLabs/LanternPM/issues/92#issuecomment-236349704, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuXCIpjxCPKW-pPcvlXFGwo2d52T5OGks5qavjOgaJpZM4JYYx- .

markmacgillivray commented 8 years ago

utf-8 can be set on the output, the default for text/csv is not utf-8. But, is this still true, don't know:

http://answers.microsoft.com/en-us/mac/forum/macoffice2011-macexcel/mac-excel-converts-utf-8-characters-to-underlines/7c4cdaa7-bfa3-41a2-8482-554ae235227b?msgId=c8295574-a053-48a6-b419-51523ce2a247&auth=1

(says Excel for Mac does not support utf-8)

markmacgillivray commented 8 years ago

@richard-jones see last comment, just checking with you if you want it utf8 or not even though at least old version of excel for mac may not work with it. Easy change if you want it to be utf8, I can just add it to the content-type and headers set by the API endpoint that sends you the file, but worth checking first I think.

richard-jones commented 8 years ago

Ok, since this is an API, I'd suggest we do everything as properly as possible and allow for people to specify their download format. So, default to UTF-8, and then do one of the following:

I've noticed character encoding issues in a number of things people have sent me showing the output of the system, and it's probably all tied together. What does the UI do, does it call this endpoint?

markmacgillivray commented 8 years ago

This is an API, but this part only exists to replicate old functionality - the wellcome users wanted to be able to get the original spreadsheet they uploaded, but actually the new system does not use spreadsheets at all. So instead I added this endpoint to return the original as a csv, and that is all. Really, if you are using the API you should be retrieving JSON. What is it that you actually want to hit the /original endpoint for, when using the API? If you just want a listing of the JSON you uploaded without any results, that would be better served by returning you the JSON, not a csv at all.

The UI does nothing at all with encoding - it takes the raw file submitted by the user and reads the byte stream, then that is sent to the server. This ensures that whatever the user sends in is what they get back out. This is what was required for Wellcome, where they wanted their output to look the same as their input. The same should be true when users retrieve their results, and has been tested by Emanuil and Cecy and is now claimed to be the case - e.g. the users that tested it get back what they expected (this was not originally the case when we did have some early issues).

The aim of the API is not to return spreadsheets at all, although of course it does so when necessary for the UI. But otherwise, when dealing via the API, it should be JSON. If there is a requirement for an API that actually does return data as csv because some remote API usage needs it, then I think we have to list those up as separate requirements and clarify why they are needed. It seems we now have the following potentials:

If the original or results csv should be available as different encodings, rather than just returning whatever the user uploaded, that is a new requirement.

If the API interactions should be expected to deal in csv (beyond just serving them as required for the UI), then that also is a new requirement.

If it is not the case that we are returning what the user expects based on their upload, then that is an issue that needs fixing, and indicates either an error or conflict with previously confirmed success.

On Tue, Aug 2, 2016 at 4:58 PM, Richard Jones notifications@github.com wrote:

Ok, since this is an API, I'd suggest we do everything as properly as possible and allow for people to specify their download format. So, default to UTF-8, and then do one of the following:

  • respect an Accept-Encoding conneg header
  • provide an ?encoding=blah query arg. You could also include, then, encoding=Mac, Excel, whatever for best practices

I've noticed character encoding issues in a number of things people have sent me showing the output of the system, and it's probably all tied together. What does the UI do, does it call this endpoint?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CottageLabs/LanternPM/issues/92#issuecomment-236951025, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuXCN1UYjoZ1JwMm1C_bKEgRMPRSEGMks5qb2kogaJpZM4JYYx- .

richard-jones commented 8 years ago

Ok, can you determine the requirements and then let me know what the purpose of the endpoint is, then I will document it appropriately. I am just documenting the API based on what was in your document, but I wasn't given any of this information.

markmacgillivray commented 8 years ago

Oh, I don't have any requirements for it. I got the requirement from Emanuil telling me the original CSV had to be returned to the UI. I thought you meant you had a need for using it, not that your were documenting it. For the API documentation, it can be one of those that exists but is not documented - it just serves our UI requirements, so just leave it out.

On 2 Aug 2016 17:25, "Richard Jones" notifications@github.com wrote:

Ok, can you determine the requirements and then let me know what the purpose of the endpoint is, then I will document it appropriately. I am just documenting the API based on what was in your document, but I wasn't given any of this information.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/CottageLabs/LanternPM/issues/92#issuecomment-236959616, or mute the thread https://github.com/notifications/unsubscribe-auth/AAuXCAWEIpJ1ePsROekcdSLnRBJv957Jks5qb29pgaJpZM4JYYx- .

richard-jones commented 8 years ago

No, I'm just documenting it, and not being UTF-8 stood out as being odd.

I have just looked at /:job.results?format=csv and that doesn't come back as UTF-8 either, and I think that one will need to be corrected, or made to offer the user a selection of encodings. Our default position ought to be UTF-8 everywhere, though, otherwise we'll go insane.

markmacgillivray commented 8 years ago

This is probably fixed now, the other issue you created has been taken care of (default setting of API library).