architv / soccer-cli

:soccer: Football scores for hackers. :computer: A command line interface for all the football scores.
MIT License
1.09k stars, 222 forks

Fix unicode issue with json and csv output #46

Closed: Saturn closed 8 years ago

Saturn commented 8 years ago

With --json, non-ASCII characters were previously escaped, like this: "awayTeamName": "1. FC K\u00f6ln". But we want: "awayTeamName": "1. FC Köln".

And --csv simply failed for any team with a non-ASCII character in its name:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 11: ordinal not in range(128)

The Python 2 csv module is pretty unhelpful when it comes to unicode, at least by default.

It seemed simpler to go element by element and encode each one appropriately.

During the fix I noticed I had a familiar issue on Windows where it doesn't print a unicode character properly:

$ python -c "print 'Köln'"
K÷ln
$ python -c "print u'Köln'"
Köln

This PR fixes that issue too.
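One plausible reading of the K÷ln output, sketched in Python 3: byte 0xF6 is ö in Windows-1252 but ÷ in code page 437. Which code pages were actually involved is an assumption; the thread doesn't say.

```python
# Assumption: the byte string was Windows-1252 and the console rendered it
# with code page 437. Neither is confirmed anywhere in this thread.
def as_seen_on_console(text, source_encoding='cp1252', console_encoding='cp437'):
    raw = text.encode(source_encoding)      # u'Köln' -> b'K\xf6ln'
    return raw.decode(console_encoding)     # 0xF6 is rendered as a different glyph


mangled = as_seen_on_console(u'K\u00f6ln')  # u'K\u00f7ln', i.e. K÷ln
```

A unicode literal avoids the problem because Python encodes it for the console itself instead of passing raw bytes through.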

Before:

$ soccer --league BL --json
{
    "league_scores": [
        {
            "league": "BL",
            "homeTeamName": "FC Bayern M\u00fcnchen",
            "goalsAwayTeam": 1,
            "awayTeamName": "FC Augsburg",
            "goalsHomeTeam": 2
        },

After:

$ soccer --league BL --json
{
    "league_scores": [
        {
            "league": "BL",
            "homeTeamName": "FC Bayern München",
            "goalsAwayTeam": 1,
            "awayTeamName": "FC Augsburg",
            "goalsHomeTeam": 2
        },
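
The difference between the two outputs above matches what the standard json module does with and without ensure_ascii. A minimal sketch, assuming the data looks like the output shown (whether the PR uses exactly this flag isn't visible in the thread):

```python
import json

# Sample data shaped like the output above; illustrative only.
scores = {
    "league_scores": [
        {
            "league": "BL",
            "homeTeamName": u"FC Bayern M\u00fcnchen",
            "goalsAwayTeam": 1,
            "awayTeamName": "FC Augsburg",
            "goalsHomeTeam": 2,
        }
    ]
}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes.
escaped = json.dumps(scores, indent=4)

# With ensure_ascii=False the characters are kept as-is.
readable = json.dumps(scores, indent=4, ensure_ascii=False)
```

With ensure_ascii=False the result contains real non-ASCII characters, so it has to be encoded (for example as UTF-8) before being written to a byte stream.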

Saturn commented 8 years ago

io.open has an encoding option, which made things easier. It is the same function as the built-in open in Python 3.
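
A rough sketch of what the encoding option buys you. The helper names and the scores.csv file name are made up for illustration and are not taken from the PR:

```python
import io
import os
import shutil
import tempfile


def write_utf8(path, text):
    # io.open accepts an encoding argument on Python 2.6+ and on Python 3.
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


def read_bytes(path):
    with io.open(path, 'rb') as handle:
        return handle.read()


def demo():
    # Hypothetical file name, used only for this illustration.
    workdir = tempfile.mkdtemp()
    try:
        path = os.path.join(workdir, 'scores.csv')
        write_utf8(path, u'BL,FC Bayern M\u00fcnchen,2,1,FC Augsburg')
        return read_bytes(path)
    finally:
        shutil.rmtree(workdir)
```

The text goes in as unicode and lands on disk as UTF-8 bytes, with no manual encode call at the write site.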

The reason to convert to unicode and then encode is that some of the elements are ints. We convert back to str because of the limitations of the csv module in Python 2, so it will not choke on a 'special unicode character'. I noted there are other methods of dealing with it, but I thought this was pretty simple. (It is just one line :smile:)

# We get some unicode from the API 
['BL', u'FC Bayern M\xfcnchen', 2, 1, u'FC Augsburg']
# After encoding
['BL', 'FC Bayern M\xc3\xbcnchen', '2', '1', 'FC Augsburg']
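
The same transformation as a Python 3 sketch. encode_row is a hypothetical name, and str() stands in for Python 2's unicode(); the actual line in the PR may differ:

```python
def encode_row(row, encoding='utf-8'):
    # Turn every element into text first (this is what handles the ints),
    # then encode it to bytes in the target encoding.
    return [str(item).encode(encoding) for item in row]


row = ['BL', u'FC Bayern M\xfcnchen', 2, 1, u'FC Augsburg']
encoded = encode_row(row)
# [b'BL', b'FC Bayern M\xc3\xbcnchen', b'2', b'1', b'FC Augsburg']
```

In Python 2 the resulting byte strings are plain str, which is what the csv module there expects. In Python 3 the csv module takes text directly, so this step would not be needed.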

carlosvargas commented 8 years ago

I found some similar unicode issues in the Stdio class when I was trying to use '{0}'.format(u'FC Bayern M\u00fcnchen') to make alignment easier. I avoided doing any of the encoding stuff because I wasn't sure whether we were unofficially supporting Python 3. This comment made it seem like support should be relatively easy, and this commit removed that, so I thought that we were unofficially supporting Python 3.

Should we also support Python 3? Or just stick to Python 2.7?

Saturn commented 8 years ago

I changed to the % formatting method because everything just sort of worked without much effort.

When passing a unicode string to the format method, I think the format string also needs to be unicode:

u'{0}'.format(u'FC Bayern M\u00fcnchen')
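
For the alignment case, both styles can be sketched like this. The width of 25 and the trailing | are arbitrary choices for the example:

```python
name = u'FC Bayern M\u00fcnchen'

# str.format with a unicode format string: left-align in 25 columns.
with_format = u'{0:<25}|'.format(name)

# The % operator with the equivalent spec.
with_percent = u'%-25s|' % name

# In Python 2, a byte format string behaves differently in the two styles:
#   '{0}'.format(name)  raises UnicodeEncodeError (ASCII encode of the argument)
#   '%s' % name         returns a unicode string
# In Python 3 every str is unicode, so both calls work either way.
```

That difference in Python 2 is likely why % formatting "just worked" while format needed the u prefix.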

Python 3 support? I have absolutely no idea where that stands.