Open ctb opened 7 years ago
What does echo $LANG or locale say for you? On a machine with LANG=en_US.utf-8 it works for me.
Can't reproduce this locally.
bump @ctb can you tell us your $LANG?
% pip install https://github.com/dib-lab/khmer/archive/master.zip
...
% pip show khmer
...
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/logging/__init__.py", line 982, in emit
stream.write(msg)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe1' in position 277: ordinal not in range(128)
...
% echo $LANG
en_US.UTF-8
I'm a bit stumped by this :-/ We have non ascii characters in the list of contributors, encoding them as ascii (unsurprisingly) doesn't work ... tried to find out how python determines the default encoding to use because I thought it looked at $LANG and friends (but evidently not).
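For reference, the failure in the traceback can be reproduced without pip. This is only a minimal sketch: the contributor name is made up, and the ASCII-encoded `io.TextIOWrapper` is an assumed stand-in for a terminal whose encoding resolves to ASCII.

```python
import io

# Hypothetical contributor name containing U+00E1, the character from the traceback.
name = "J\xe1nos Example"

# A text stream that encodes to ASCII, standing in for sys.stdout on an ASCII terminal.
ascii_stream = io.TextIOWrapper(io.BytesIO(), encoding="ascii")

error = None
try:
    ascii_stream.write(name)
    ascii_stream.flush()
except UnicodeEncodeError as exc:
    # Same exception class that logging's stream.write(msg) raised above.
    error = exc

print(type(error).__name__)
print(error)
```

The `str` itself is fine; the error only appears when the stream has to turn it into ASCII bytes.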
What do these two say:
$ python -c 'import sys; print(sys.getdefaultencoding())'
$ python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'
Some guesses:
site.py (or related) to set the encoding?
setup.py so that pip interprets everything as unicode?
% python -c 'import sys; print(sys.getdefaultencoding())'
utf-8
% python -c 'import sys; print(sys.stdin.encoding, sys.stdout.encoding)'
US-ASCII US-ASCII
The value of sys.stdout.encoding isn't set by LANG but by LC_CTYPE, or at least it is also influenced by LC_CTYPE. So you can predict my question: what is it set to for you? :)
More generally, not sure how we/khmer can fix this. If a user legitimately has their terminal's encoding set to ASCII then pip cannot print the authors of khmer. Is this a bug in pip or a case of a human asking a computer to do the impossible? Don't have a better idea other than the suboptimal "mutilating people's names so they don't contain non-ASCII characters", or we leave it as won't fix.
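On the "impossible" question: an ASCII-only stream can still emit something if the writer picks a tolerant error handler. The sketch below is not something pip is known to do; `render_for_ascii` is a hypothetical helper, and the name is made up.

```python
import io


def render_for_ascii(text, errors="backslashreplace"):
    """Write text through an ASCII stream using a non-strict error handler."""
    buffer = io.BytesIO()
    stream = io.TextIOWrapper(buffer, encoding="ascii", errors=errors)
    stream.write(text)
    stream.flush()
    return buffer.getvalue().decode("ascii")


print(render_for_ascii("J\xe1nos"))             # J\xe1nos
print(render_for_ascii("J\xe1nos", "replace"))  # J?nos
```

Either handler avoids the crash, at the cost of showing an escaped or replaced character instead of the real one.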
On Tue, Apr 18, 2017 at 01:20:11AM -0700, Tim Head wrote:
> The value of sys.stdout.encoding isn't set by LANG but by LC_CTYPE, or at least it is also influenced by LC_CTYPE. So you can predict my question: what is it set to for you? :)
% echo $LC_CTYPE
C
so that's it -- and indeed when I fix that, it all works.
> More generally, not sure how we/khmer can fix this. If a user legitimately has their terminal's encoding set to ASCII then pip cannot print the authors of khmer. Is this a bug in pip or a case of a human asking a computer to do the impossible? Don't have a better idea other than the suboptimal "mutilating people's names so they don't contain non-ASCII characters", or we leave it as won't fix.
I think #wontfix is fine and will set it accordingly. But it's nice to have this in the issue tracker ;).
👍
This is now cropping up in all 'setup.py' executions -- I've had to run
export LC_CTYPE=utf-8
to build khmer v2.1.
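To check what a given LC_CTYPE does on a particular machine, something like the following might help. It is a sketch under assumptions: `child_stdout_encoding` is a hypothetical helper, and the result depends on the platform and Python version (Python 3.7 and later may coerce the C locale to UTF-8, so the ASCII result seen in this thread may not appear there).

```python
import os
import subprocess
import sys


def child_stdout_encoding(lc_ctype):
    """Report sys.stdout.encoding of a child interpreter run with the given LC_CTYPE."""
    env = dict(os.environ)
    # Drop settings that would take precedence over LC_CTYPE.
    for var in ("LC_ALL", "PYTHONIOENCODING", "PYTHONUTF8"):
        env.pop(var, None)
    env["LC_CTYPE"] = lc_ctype
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout.strip()


for value in ("C", "en_US.UTF-8"):
    print(value, "->", child_stdout_encoding(value))
```

No expected output is given because it varies by system; on the Python 3.5 install above, the C case corresponds to the US-ASCII result.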
I think this is a problem with pip. authors has the str type, which is the right type in python3, but pip can't handle it.
I think @wltrimbl is right here. I'm starting to connect some dots here.
I spent an inordinate amount of time last night troubleshooting a Docker build issue for kevlar. I could get khmer[1] and kevlar to install just fine using the Python 2.7 toolscape, but when it came to actually running kevlar it would fail, since it dropped 2.7 support at the same time khmer did. Once I changed the Docker build config to the Python 3.x toolscape (python3-dev, pip3, etc.) it would fail at the khmer installation step, citing the same ascii encoding error. Searching the interwebs for solutions brought up many people experiencing similar problems. Some problems are claimed to be fixed in the still-to-be-released pip v10, but even installing that did not work for khmer.
The (currently disabled) Docker build that runs with our CI also uses the Python 2.7 toolscape. This fails with the latest master, as would be expected since we dropped 2.x support several months ago. Updating the Docker config to the Python 3.5 toolscape raises the same ascii encoding error as before.
So:
LC_CTYPE doesn't seem to work in this case
Sad to say, the easiest solution will probably be to do a (hopefully temporary) projection of the author names onto ASCII space until the pip issues are ironed out.
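One way such a projection onto ASCII could look, as a rough sketch only: `to_ascii` is a hypothetical helper, not anything in khmer's setup.py, and the names are made up.

```python
import unicodedata


def to_ascii(name):
    """Strip accents by decomposing characters and dropping anything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("J\xe1nos"))  # Janos
print(to_ascii("S\xf8ren"))  # Sren
```

The second line shows the downside: letters without an ASCII decomposition are dropped entirely, so any real list of names would need checking by hand.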
[1] It turns out I was using an older version of khmer that still supported Python 2.7. The latest master will not install successfully using pip2.