JamesJeffryes / pyclassyfire

A python client for the ClassyFire API
https://jamesjeffryes.github.io/pyclassyfire/
MIT License

Unicode decoding problem with function client.sdf_query() #1

Closed: nuzillard closed this issue 4 years ago

nuzillard commented 4 years ago

Dear James,

Thank you for writing this Python interface to ClassyFire!

I unzipped buffer16.zip and passed the resulting file to client.sdf_query(filenameBuffer, filenameOut), which raised an exception:

1 queries submitted to ClassyFire API
0 percent complete
Traceback (most recent call last):
  File "run2.py", line 93, in <module>
    client.sdf_query(filenameBuffer, filenameOut)
  File "C:\Users\jmn\Documents\CNRS20\Publis\PNMRNP_STARTED\Biblio\pyclassyfire-master\pyclassyfire\client.py", line 223, in sdf_query
    outfile.write(get_results(query_ids[i], return_format='sdf'))
  File "C:\Users\jmn\Anaconda3\envs\rdkit3\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u03b1' in position 8847: character maps to <undefined>

It looks like the classification by ClassyFire succeeded, but a Unicode character in the result could not be written to the output .sdf file.

\u03b1 is the escape for Unicode code point U+03B1, the Greek small letter alpha, which is not uncommon in chemistry.
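The failure can be reproduced without ClassyFire. The following is a minimal sketch using only the standard library; `can_encode` is a hypothetical helper written for this illustration. It shows that cp1252, the codec named in the traceback, has no mapping for U+03B1, while UTF-8 does.

```python
alpha = "\u03b1"  # GREEK SMALL LETTER ALPHA


def can_encode(text, encoding):
    """Return True if `text` can be encoded with the named codec."""
    # Hypothetical helper for illustration; not part of pyclassyfire.
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is the codec that appears in the traceback above.
cp1252_ok = can_encode(alpha, "cp1252")

# UTF-8 can represent every Unicode code point, including alpha.
utf8_bytes = alpha.encode("utf-8")
```

Here `cp1252_ok` ends up `False`, which matches the UnicodeEncodeError above: writing to a text file opened with the cp1252 default fails as soon as the result contains an alpha.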

Could you please indicate a fix or a workaround?

Many thanks in advance. Best regards,

Jean-Marc

nuzillard commented 4 years ago

For this molecule, the ClassyFire web interface gives the class "Cyclic depsipeptides", with a list of alternative parents that starts with "Alpha amino acid esters".

JamesJeffryes commented 4 years ago

Hi Jean-Marc,

I wasn't able to reproduce your error, but then I found a Stack Overflow post indicating that this issue might appear only on Windows, and I develop on a Mac. There appears to be a cross-platform solution, but I have no easy way of testing it locally. I'll push an update; please let me know if it works for you.
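The thread does not show the change that was pushed. A common cross-platform approach to this class of error is to open the output file with an explicit encoding instead of relying on the operating system default. The sketch below assumes that approach; `write_sdf_records` is a hypothetical helper, not a pyclassyfire function, and the record is a made-up stand-in for text returned by the API.

```python
import os
import tempfile


def write_sdf_records(records, path):
    """Write text records to `path` as UTF-8, whatever the OS default is."""
    # Hypothetical helper for illustration; not part of pyclassyfire.
    # Without encoding="utf-8", open() falls back to the locale encoding,
    # which is cp1252 in the traceback above.
    with open(path, "w", encoding="utf-8") as outfile:
        for record in records:
            outfile.write(record)


# Made-up stand-in for a result string that contains a Greek alpha.
records = ["demo record containing \u03b1\n"]

path = os.path.join(tempfile.mkdtemp(), "out.sdf")
write_sdf_records(records, path)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as infile:
    round_trip = infile.read()
```

If editing the library is not an option, Python 3.7 and later also offer UTF-8 mode (setting the environment variable `PYTHONUTF8=1`, or running with `-X utf8`), which makes `open()` default to UTF-8 without code changes.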

After 3 years of neglect I'm pleasantly surprised this client works at all. Props to the Wishart Lab for maintaining a stable API!

nuzillard commented 4 years ago

The Unicode problem is solved. A batch of 1000 molecules was classified in about 1 minute, which is really nice.

Many thanks!

Jean-Marc