elifesciences / elife-tools

Python library for parsing eLife article XML data.
MIT License
15 stars 7 forks source link

bumps requests to 2.20.0 #292

Closed lsh-0 closed 5 years ago

lsh-0 commented 5 years ago

@gnott do you know about these failures? none of them relate to the updated requests lib

gnott commented 5 years ago

Not sure, and I just merged a successful PR a few hours ago. Maybe requests ends up installing some libraries that are helping in encoding / unicode / character conversion?

gnott commented 5 years ago

It could also be a temporary build issue.

gnott commented 5 years ago

Spending some time looking at this carefully, I've narrowed it down:

The automated tests - both in travis-ci and Alfred - install coveralls. coveralls requires requests, which ends up installing requests==2.20.0.

That version of requests requires chardet.

When chardet is present, when in the test cases it reads an XML file from the fixture folders, I think it is trying to guess at the charset of the file, and it is guessing wrong.

I think this is only related to the test scenarios and appears when it is comparing parser output to the file fixtures, and it is not affecting the actual parser output. If it is changing the parser output, hopefully we'll see test failures in other projects or integration tests like in end2end if the new version of requests (and chardet) is causing more problems when parsing XML files.

I have a potential fix I will commit here shortly, to use io.open instead of open for the file fixtures, where the encoding of the file is explicitly opening the file as encoding='utf-8'.

coveralls commented 5 years ago

Coverage Status

Coverage decreased (-0.02%) to 99.573% when pulling 56f4a76ebb2027db51c00b564d25c7c46b4ba438 on fix-requests-vuln into 3b5ed15319a351b360ca6a8a90a9076778dc6c80 on develop.

gnott commented 5 years ago

There are some python 2 vs python 3 encoding tweaks required which I think are localised to the test scenarios only. There could be some python 3 encoding problem lurking in the xmlio.py module we may see later, unrelated to the nature of this PR.