enricobacis / wos

📚 Web of Science python client
https://wos.readthedocs.io/
MIT License

wos can't handle Cyrillic letter #31

Closed dinosauria123 closed 5 years ago

dinosauria123 commented 5 years ago

Prerequisites

Version

Name: wos
Version: 0.1.15
Summary: Web of Science client using API v3.
Home-page: http://github.com/enricobacis/wos
Author: Enrico Bacis
Author-email: enrico.bacis@gmail.com
License: MIT
Location: /usr/local/lib/python2.7/dist-packages/wos-0.1.15-py2.7.egg
Requires: limit, suds
Required-by:

I have installed the latest wos-lite branch.

wos returns an error when the output includes Cyrillic letters.

The record below (WOS:000235944500019) contains Cyrillic letters.

<records>
    <uid>WOS:000235944500019</uid>
    <title>
        <label>Title</label>
        <value>The magnetic field effect in the presence of the electric field on fluorescence of a methylene-linked compound of pyrene and N,N-dimethylaniline doped in a polymer film</value>
    </title>
    <doctype>
        <label>Doctype</label>
        <value>Article</value>
    </doctype>
    <source>
        <label>Issue</label>
        <value>9</value>
    </source>
    <source>
        <label>Pages</label>
        <value>3938-3941</value>
    </source>
    <source>
        <label>Published.BiblioDate</label>
        <value>MAR 9</value>
    </source>
    <source>
        <label>Published.BiblioYear</label>
        <value>2006</value>
    </source>
    <source>
        <label>SourceTitle</label>
        <value>JOURNAL OF PHYSICAL CHEMISTRY B</value>
    </source>
    <source>
        <label>Volume</label>
        <value>110</value>
    </source>
    <authors>
        <label>Authors</label>
        <value>Medvedev, ES</value>
        <value>Mizoguchi, M</value>
        <value>Ohta, N</value>
    </authors>
    <other>
        <label>Contributor.ResearcherID.Names</label>
        <value>Ohta, Nobuhiro</value>
        <value>Медведев, Эмиль</value>
    </other>
    <other>
        <label>Contributor.ResearcherID.ResearcherIDs</label>
        <value>E-1238-2012</value>
        <value>A-1697-2009</value>
    </other>
    <other>
        <label>Identifier.Doi</label>
        <value>10.1021/jp054553d</value>
    </other>
    <other>
        <label>Identifier.Ids</label>
        <value>020WS</value>
    </other>
    <other>
        <label>Identifier.Issn</label>
        <value>1520-6106</value>
    </other>
    <other>
        <label>Identifier.Xref_Doi</label>
        <value>10.1021/jp054553d</value>
    </other>
    <other>
        <label>ResearcherID.Disclaimer</label>
        <value>ResearcherID data provided by Clarivate Analytics</value>
    </other>
</records>


Steps to Reproduce

/usr/local/bin/wos --lite query 'SG=RIES AND OG=Hokkaido University AND PY=2006' -c12 >RIESdata2006.xml => works!

/usr/local/bin/wos --lite query 'SG=RIES AND OG=Hokkaido University AND PY=2006' -c13 >RIESdata2006.xml => the error below is written to RIESdata2006.xml (when the output goes to the terminal instead, it is fine)

ERROR: 'ascii' codec can't encode characters in position 29264-29271: ordinal not in range(128)

enricobacis commented 5 years ago

Thanks for your report! The steps to reproduce were extremely helpful in detecting the problem.

This happens because Python checks whether stdout is attached to a device capable of displaying UTF-8 characters (something that I never knew). A terminal is detected as UTF-8 capable, whereas output redirected to a file is encoded as ASCII, which fails on the Cyrillic characters. This is described here: http://blog.mathieu-leplatre.info/python-utf-8-print-fails-when-redirecting-stdout.html
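
The behaviour can be observed without wos. The following is a minimal sketch, not code from the wos package: `describe_stdout` and `can_encode` are hypothetical helpers introduced for illustration, and the sample name is the contributor value from the record above.

```python
# -*- coding: utf-8 -*-
import sys

# Contributor name taken from record WOS:000235944500019 in this report.
NAME = u"Медведев, Эмиль"


def describe_stdout():
    """Hypothetical helper: report what Python detected for stdout."""
    return {
        "isatty": sys.stdout.isatty(),
        # Some stream replacements have no encoding attribute, hence getattr.
        "encoding": getattr(sys.stdout, "encoding", None),
    }


def can_encode(text, encoding):
    """Hypothetical helper: True if text can be encoded with the given codec."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    # Only ASCII is printed here, so the script itself never hits the bug.
    print(describe_stdout())
    print("ascii: %s" % can_encode(NAME, "ascii"))
    print("utf-8: %s" % can_encode(NAME, "utf-8"))
```

Running the script once directly in a terminal and once with stdout redirected to a file should show the difference in the reported encoding under Python 2.7; newer Python versions choose the encoding differently, so the result may vary.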

I'll always forcibly encode the output as UTF-8 to avoid any errors. I'll push a commit in a few minutes; let me know if it fixes your problem.
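
The idea of forcing UTF-8 can be sketched roughly as follows. This is not the actual commit: `write_utf8` is a hypothetical helper, and `io.BytesIO` is only an assumed stand-in for a redirected file. The point is that the text is encoded explicitly, so the result no longer depends on what the interpreter detects for stdout.

```python
# -*- coding: utf-8 -*-
import io
import sys


def write_utf8(text, stream=None):
    """Hypothetical helper: write text to a byte stream as UTF-8."""
    if stream is None:
        # Python 3 exposes the underlying byte stream as sys.stdout.buffer;
        # Python 2's sys.stdout accepts byte strings directly.
        stream = getattr(sys.stdout, "buffer", sys.stdout)
    stream.write(text.encode("utf-8"))


# Stand-in for a redirected file: a byte buffer with no terminal attached.
fake_file = io.BytesIO()
write_utf8(u"<value>Медведев, Эмиль</value>\n", fake_file)
```

As a user-side workaround that does not require a code change, setting the `PYTHONIOENCODING=utf-8` environment variable before running the command may also help, since Python reads it to override the encoding used for stdout.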

dinosauria123 commented 5 years ago

Thank you for fixing the problem.

I will check the output tomorrow.

dinosauria123 commented 5 years ago

OK, it works perfectly! Thank you for your help!