OpenEye-Contrib / OEMicroservices

A collection of RESTful microservices built around OpenEye toolkits
Apache License 2.0
7 stars 3 forks source link

'utf-8' codec can't decode byte 0x80 in position 49: invalid start byte #1

Open ctapobep opened 8 years ago

ctapobep commented 8 years ago

This happens when I try to convert molecule into CDX via /v1/convert/molecule:

{
    "molecule": {
        "value": "Fc1cc(c(F)cc1F)C[C@@H](N)CC(=O)N3Cc2nnc(n2CC3)C(F)(F)F",
        "input": {
            "format": "smiles"
        },
        "output": {
            "format": "cdx"
        }
    }
}
coleb commented 8 years ago

@ctapobep, can you include what version of Python and the OpenEye toolkits you're using? Unicode issues are the core issue we see between Python 2 & 3. It's something that was also addressed heavily in the 2016.Feb toolkit release.

ctapobep commented 8 years ago

Python:

3.5.1 |Continuum Analytics, Inc.| (default, Dec 7 2015, 11:16:01) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

pip list | grep -i openeye:

OpenEye-toolkits (2016.2.1) OpenEye-toolkits-python3-ubuntu-14.04-x64 (2016.2.1)

ctapobep commented 8 years ago

Created a PR with the Docker env that installs all these dependencies along with OE Microservices: https://github.com/OpenEye-Contrib/OEMicroservices/pull/3 So you can try it out to reproduce the issue.

coleb commented 8 years ago

@ctapobep and @sajohn2, current OEMicroservices code is really only designed to handle utf-8 encodings. Binary data like CDX (and OEB for that matter) won't work in this current code.

Leads to the general question: what's the best way to encode binary data in a JSON field, @jlafon says to base64 encode it, so that's what I'm doing in the related branch here: https://github.com/OpenEye-Contrib/OEMicroservices/commit/903a0a9b1bbf32f6adb0ec6da7dff55b9f4a86cb

ctapobep commented 8 years ago

I'm not really sure that the data should be returned in JSON - why not return the data itself? When I ask the service for the conversion I know the format it's going to respond with - I asked for that format. So IMHO no need to duplicate this info in the JSON. Though if needed, this it could be added as HTTP header.

coleb commented 8 years ago

Don't want to change the output dramatically as @sajohn2 has stuff that depends on this API.

Though I just realized we're going to need a "base64" flag in the output to mark that the value has been base64 encoded, in the same way the "gz" flag encodes this.

scott-arne commented 8 years ago

We don't have anything that depends on it. And gz is b64 encoded because it is binary data. Just ask for gz and cdx + oeb will work.

On May 16, 2016, at 17:56, Brian Cole notifications@github.com wrote:

Don't want to change the output dramatically as @sajohn2 has stuff that depends on this API.

Though I just realized we're going to need a "base64" flag in the output to mark that the value has been base64 encoded, in the same way the "gz" flag encodes this.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub

scott-arne commented 8 years ago

From the docs: Note the gz parameter implies gzipping, then base64 encoding...

We have no infrastructure using the molecule conversion service -- that was just tossed in there as an example for how to do it with the toolkits for Merck IT (they seem to think there's some magic behind converting between file formats). The rendering is used quite a bit, and I'm doing code review for a massive update from Scott Harrison that includes some really awesome highlighting options.

On Mon, May 16, 2016 at 9:52 PM, Scott Johnson sajohn2@gmail.com wrote:

We don't have anything that depends on it. And gz is b64 encoded because it is binary data. Just ask for gz and cdx + oeb will work.

On May 16, 2016, at 17:56, Brian Cole notifications@github.com wrote:

Don't want to change the output dramatically as @sajohn2 https://github.com/sajohn2 has stuff that depends on this API.

Though I just realized we're going to need a "base64" flag in the output to mark that the value has been base64 encoded, in the same way the "gz" flag encodes this.

— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/OpenEye-Contrib/OEMicroservices/issues/1#issuecomment-219560856

jlafon commented 8 years ago

We have no infrastructure using the molecule conversion service

It might be worthwhile to make some changes to the response then. I would suggest setting HTTP headers for content type and encoding, rather than putting them in the response body. That would allow you to have binary responses (application/octet-stream content-type with optional gzip content-encoding).