ejwa / gitinspector

:bar_chart: The statistical analysis tool for git repositories
GNU General Public License v3.0

Unicode error #122

Closed vmpartner closed 7 years ago

vmpartner commented 8 years ago

Unicode error

image

adam-waldenberg commented 8 years ago

Hi. Welcome to the wonderful world of cryptic Python unicode error messages. This can be any number of things. My first guess would be an issue with the terminal. However, there are a few things you should check before we can pin it down:

  1. What codeset/encoding is the terminal configured to use?
  2. Do you get the same error if you redirect output to a file?
  3. Does it behave the same in Python 3?
  4. What version of gitinspector is this?

vmpartner commented 8 years ago

  1. It's Xshell 5 for Windows. Encoding: UTF-8. image

vmpartner commented 8 years ago

  3. I can't install Python 3. The current version is Python 2.7.9.
  4. Latest. Cloned from git yesterday.

vmpartner commented 8 years ago

  2. image

vmpartner commented 8 years ago

The git repo contains Russian characters encoded in CP-1251. Maybe this will be helpful.

adam-waldenberg commented 8 years ago

Hi @vmpartner. This particular error is, I think, related to the fix in issue #46. This change was added in order to handle escape characters in emails - something that can occur when a repository is imported into git from other revision control systems.

What happens if you remove that line and run it? Maybe we can just ignore it and catch the exception. This line is really only used for a particular corner case that is very uncommon, so that should be an acceptable solution.
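The "ignore it and catch the exception" idea can be sketched roughly as follows. This is only an illustration: the function name `unescape_email` and the unescaping expression inside it are assumptions made for the example, not the actual line from gitinspector.

```python
def unescape_email(raw):
    """Try to resolve escape sequences such as \\x40 in an email address.

    Hypothetical helper: the unescaping expression below is an assumed
    stand-in for the corner-case handling discussed in this thread.
    """
    try:
        # Latin-1 cannot represent Cyrillic, so this step raises
        # UnicodeEncodeError for such input.
        return raw.encode("latin-1").decode("unicode_escape")
    except UnicodeError:
        # Ignore the corner case and keep the value unchanged.
        return raw


escaped = unescape_email("a\\x40b.com")
cyrillic = unescape_email(u"имя@example.com")
```

With this approach the uncommon escaped-email case is still handled when possible, and any input that cannot go through the conversion is passed along untouched instead of aborting the run.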

vmpartner commented 8 years ago

Yes, it works now. Nice program ;)

adam-waldenberg commented 8 years ago

Great. I will implement a fix at a later point. Thank you. Keeping it open for now.

vassilevsky commented 7 years ago

I have successfully run gitinspector on OS X after applying this solution:

https://coderwall.com/p/-k_93g/mac-os-x-valueerror-unknown-locale-utf-8-in-python

I think it needs to be added to this project's README or FAQ. Which one is better?

adam-waldenberg commented 7 years ago

Hi @vassilevsky. Neither. As it doesn't really concern gitinspector (it affects all Python applications under OS X), I think we will eventually add a specific FAQ for OS/Python-specific errors. There are some errors related to Windows that might be worth mentioning as well. For the particular error you reported, there are several old bugs discussing this, for example #109, #93, #53, #32 and #9 to name a few of them.

kiwichris commented 7 years ago

I have hit a Unicode issue in gitinspector in changesoutput.py with the RTEMS repo (https://git.rtems.org/rtems.git). We have Unicode users in the repo and I have no locale set and locale.getpreferredencoding() is returning 'US-ASCII'. This is on FreeBSD 10.3 with Python 2.7.12. I have hacked around the problem by adding:

    import codecs
    import sys
    sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)

in gitinspector/gitinspector.py:main. This hack is taken from https://wiki.python.org/moin/PrintFails, except that it forces UTF-8 instead of using the preferred encoding.
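As a rough illustration of what the workaround above does, the sketch below wraps a byte stream in a UTF-8 writer. An in-memory `io.BytesIO` stands in for the terminal so the result can be inspected; that stand-in and the variable names are assumptions for the example. In Python 2 `sys.stdout` is itself a byte stream, while in Python 3 the equivalent byte stream is `sys.stdout.buffer`.

```python
import codecs
import io

# In-memory byte stream standing in for the terminal (assumption for the demo).
raw_stream = io.BytesIO()

# codecs.getwriter returns a StreamWriter class for the codec;
# calling it wraps the byte stream so that text written is encoded as UTF-8.
utf8_writer = codecs.getwriter("UTF-8")(raw_stream)

# Non-ASCII text is encoded on the way out, regardless of the locale setting.
utf8_writer.write(u"Автор: Иван\n")

encoded = raw_stream.getvalue()
```

The writer always emits UTF-8, which is why it sidesteps the error but also ignores whatever encoding the terminal actually requested.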

adam-waldenberg commented 7 years ago

@kiwichris Discussed several times previously and definitely not a bug in gitinspector. There are a few things you can do to modify behaviour:

  1. Use a terminal with unicode encoding set up.
  2. Set PYTHONIOENCODING to UTF-8.
  3. Pipe the output to a file (defaults to using UTF-8 and does your exact code for both stdin/stdout).

Instead of forcing the output to utf-8 (as you do in your hack), gitinspector will always try to re-encode/"convert" characters to the requested character encoding. However, US-ASCII lacks mappings for many unicode characters. In any case - it's the correct behavior.
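The missing-mapping problem can be reproduced without gitinspector at all. The sketch below is a minimal illustration, with a made-up author name and a hypothetical helper `can_encode`; it only shows how Python's own codecs behave when text is converted to US-ASCII versus UTF-8.

```python
# Made-up author name containing Cyrillic characters (assumption for the demo).
name = u"Иван Петров"


def can_encode(text, encoding):
    """Return True if every character in text has a mapping in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


ascii_ok = can_encode(name, "US-ASCII")  # False: no Cyrillic in US-ASCII
utf8_ok = can_encode(name, "UTF-8")      # True: UTF-8 covers all of Unicode

# A lossy conversion substitutes "?" for each unmappable character.
lossy = name.encode("US-ASCII", errors="replace")
```

This is the same failure mode as the reported traceback: when the requested encoding is US-ASCII, a strict conversion has nothing to map the characters to, so choosing a UTF-8 terminal or setting PYTHONIOENCODING changes the outcome.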

kiwichris commented 7 years ago

Sure, the solutions you highlight make sense. The Python error makes it look like a bug.

adam-waldenberg commented 7 years ago

Fixed with the above commit. Report any problems related to this.