ejwa / gitinspector

:bar_chart: The statistical analysis tool for git repositories
GNU General Public License v3.0

UnicodeEncodeError: 'charmap' codec can't encode character '\u0107' in position 865: character maps to <undefined> #183

Closed rameshrr closed 5 years ago

rameshrr commented 5 years ago

gitinspector -F html > test.html

Traceback (most recent call last):
  File "C:\Users\EMD BACKUP\AppData\Roaming\npm\node_modules\gitinspector\gitinspector.py", line 24, in <module>
    gitinspector.main()
  File "C:\Users\EMD BACKUP\AppData\Roaming\npm\node_modules\gitinspector\gitinspector\gitinspector.py", line 206, in main
    run.process(repos)
  File "C:\Users\EMD BACKUP\AppData\Roaming\npm\node_modules\gitinspector\gitinspector\gitinspector.py", line 86, in process
    outputable.output(BlameOutput(summed_changes, summed_blames))
  File "C:\Users\EMD BACKUP\AppData\Roaming\npm\node_modules\gitinspector\gitinspector\output\outputable.py", line 39, in output
    outputable.output_html()
  File "C:\Users\EMD BACKUP\AppData\Roaming\npm\node_modules\gitinspector\gitinspector\output\blameoutput.py", line 95, in output_html
    print(blame_xml)
  File "C:\Program Files (x86)\Python36-32\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u0107' in position 865: character maps to <undefined>

adam-waldenberg commented 5 years ago

Hi @rameshrr. This is not related to gitinspector, but to your terminal or destination pipe not being able to handle the characters. What is interesting, though, is that the output encoding defaults to cp1252 despite the redirection. It should default to UTF-8. That means set_stdout_encoding() is failing to set the encoding to UTF-8 for some reason.

This could actually happen if you run gitinspector under Python 3 with PYTHONIOENCODING forced/set to cp1252. It could also happen if isatty() returns true despite the redirection. There are countless issues covering this already here on the tracker.

In what terminal are you running gitinspector? You could try setting PYTHONIOENCODING to UTF-8 before running gitinspector - this would probably solve your issues.
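To check what the PYTHONIOENCODING suggestion does, here is a minimal sketch that starts a child Python interpreter with the variable set and reports the encoding the child picks for its redirected stdout. The helper name child_stdout_encoding is hypothetical and is not part of gitinspector; only standard-library calls are used.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding=None):
    """Report sys.stdout.encoding as seen by a child Python process.

    Hypothetical helper for illustration only. The child's stdout is a
    pipe, which is similar to redirecting output to a file.
    """
    env = dict(os.environ)
    # Start from a clean slate so an inherited value does not interfere.
    env.pop("PYTHONIOENCODING", None)
    if io_encoding is not None:
        env["PYTHONIOENCODING"] = io_encoding
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        stdout=subprocess.PIPE,
        env=env,
        check=True,
    )
    return result.stdout.decode("ascii").strip()


if __name__ == "__main__":
    # Without the variable, the result depends on the platform and locale.
    print("default:", child_stdout_encoding())
    print("forced: ", child_stdout_encoding("utf-8"))
```

If the "default" line shows cp1252 on your machine, that matches the traceback above, and setting the variable in the shell before running gitinspector should change what Python uses for the redirected output.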

adam-waldenberg commented 5 years ago

Just to more clearly explain what is going on - your terminal/pipe is requesting CP1252 characters from Python, so Python is trying to comply and encode its output with that encoding. Unfortunately, CP1252 only covers a small subset of Unicode, so certain characters (such as '\u0107' in your case) can't be mapped to that encoding.
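The mismatch can be reproduced outside gitinspector. This is a small sketch, with try_encode being a hypothetical helper name, that encodes the character from the traceback with both codecs:

```python
def try_encode(text, encoding):
    """Return the encoded bytes, or None if the codec cannot represent the text."""
    try:
        return text.encode(encoding)
    except UnicodeEncodeError:
        return None


char = "\u0107"  # the character reported in the traceback

print(try_encode(char, "cp1252"))  # None: cp1252 has no mapping for it
print(try_encode(char, "utf-8"))   # b'\xc4\x87'
```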

rameshrr commented 5 years ago

Understood..

FYI, I'm using the Windows command prompt (Windows 10).

It would be better if the application handled the exception and printed a useful message to the user - just my thoughts :)

adam-waldenberg commented 5 years ago

This has also been discussed in the past. The exception from Python already provides more information than any error message. There are many exceptions that can occur from invalid decoding and terminal issues in Python - swallowing these messages is generally not a good idea and would just make it more difficult to handle. There's also a lot of them. We could maybe extend them and add some additional help... But I'm uncertain how useful it would really be. Gitinspector already prints certain hints concerning PYTHONIOENCODING to stderr under certain conditions.
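As a rough illustration of the "add some additional help" idea, the sketch below writes a hint to an error stream and then re-raises, so the original exception is not swallowed. It is an assumption about how such a wrapper could look, not gitinspector's actual code; print_with_hint and the hint text are hypothetical.

```python
import io
import sys

# Hypothetical hint text, not the message gitinspector prints.
HINT = (
    "Hint: the output stream uses encoding {enc!r}, which cannot represent "
    "some characters. Setting PYTHONIOENCODING to UTF-8 may help.\n"
)


def print_with_hint(text, stream=None, err=None):
    """Write text to stream; on an encoding failure, add a hint and re-raise."""
    stream = sys.stdout if stream is None else stream
    err = sys.stderr if err is None else err
    try:
        stream.write(text + "\n")
    except UnicodeEncodeError:
        err.write(HINT.format(enc=getattr(stream, "encoding", None)))
        raise  # keep the original traceback visible


if __name__ == "__main__":
    # Simulate a cp1252 destination without touching the real terminal.
    fake_stdout = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")
    try:
        print_with_hint("\u0107", stream=fake_stdout)
    except UnicodeEncodeError:
        pass  # the hint has already been written to stderr
```

Re-raising keeps the full Python traceback, in line with the concern above about hiding information, while still pointing the user at PYTHONIOENCODING.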

Try using PowerShell (included with Windows since Windows 7) and see if that makes a difference. The normal command prompt isn't really usable for any kind of decent work.