Thriftpy / thriftpy

Thriftpy has been deprecated, please migrate to https://github.com/Thriftpy/thriftpy2
MIT License

UTF-8-encoded Unicode in thrift definition comments causes failure of thriftpy.load in Python 3 #309

Open aawilson opened 6 years ago

aawilson commented 6 years ago

To reproduce, save the following as a .thrift file, using an editor that preserves the curly quotation marks as they are rather than converting them to ASCII equivalents. For reference, my test file was saved as UTF-8:

service PingPong {
    /* Ping to the pong with “funky quotes” y'all */
    string ping(),
}

In a Python 3 environment, run the following:

from thriftpy import load
load(path_to_thrift)  # path_to_thrift: path to the .thrift file saved above

Observe something like this:

UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 66: character maps to <undefined>

(This was on Windows. Other platforms might list a different codec, or might not hit this problem at all.)
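The failure can be reproduced without thriftpy. In the minimal sketch below, the definition is encoded as UTF-8 and then decoded as cp1252, which I am assuming is the Windows default codec behind the 'charmap' message. CRLF line endings are also an assumption, and `try_decode` is a hypothetical helper written for this illustration. The closing curly quote (U+201D) encodes to the bytes E2 80 9D, and cp1252 has no mapping for 0x9D.

```python
# The thrift definition from the report. CRLF line endings are assumed.
SOURCE = (
    u"service PingPong {\r\n"
    u"    /* Ping to the pong with \u201cfunky quotes\u201d y'all */\r\n"
    u"    string ping(),\r\n"
    u"}\r\n"
)

# What ends up on disk when the file is saved as UTF-8.
raw = SOURCE.encode("utf-8")


def try_decode(data, encoding):
    """Hypothetical helper: return (text, None) on success, (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Decoding as cp1252 fails on the last byte of the closing curly quote.
text, error = try_decode(raw, "cp1252")

# Decoding as UTF-8 round-trips the original text.
utf8_text, utf8_error = try_decode(raw, "utf-8")
```

Under these assumptions, `error.start` points at the same byte offset as the traceback above, which suggests the file was read with the platform default codec rather than UTF-8.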

I was personally able to fix this by adding an "encoding" argument to the open call in parser.py, but the built-in open in Python 2.7 and lower does not accept that argument, so it is not a version-agnostic fix. It could also be wrong if the thrift file were saved in some other encoding, since I doubt the spec actually specifies one. My guess is that the file handling will have to be rewritten to open files as binary and decode them explicitly rather than just passing them to the lexer. Simply passing mode='rb' wasn't sufficient, so there is more to do.
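One possible version-agnostic approach is `io.open`, which accepts an `encoding` argument on Python 2.6+ and is the same function as the built-in `open` on Python 3. The sketch below is not thriftpy's code: `read_thrift_source` is a hypothetical helper, and defaulting to UTF-8 is an assumption, since the file could have been saved in another encoding.

```python
import io
import os
import tempfile


def read_thrift_source(path, encoding="utf-8"):
    """Hypothetical helper: read a .thrift file as text with an explicit encoding.

    io.open takes an encoding argument on both Python 2.6+ and Python 3,
    so the result does not depend on the platform default codec.
    """
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


# Demonstration: write a definition containing curly quotes as UTF-8 bytes,
# then read it back through the helper.
definition = (
    u"service PingPong {\n"
    u"    /* Ping to the pong with \u201cfunky quotes\u201d y'all */\n"
    u"    string ping(),\n"
    u"}\n"
)

fd, demo_path = tempfile.mkstemp(suffix=".thrift")
os.close(fd)
try:
    with io.open(demo_path, "wb") as out:
        out.write(definition.encode("utf-8"))
    loaded = read_thrift_source(demo_path)
finally:
    os.remove(demo_path)
```

The text returned by the helper would still have to be handed to the lexer as a string instead of a file path, which is the part of the file handling that would need rewriting.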