To reproduce, save the following as a .thrift file in an editor that preserves the curly quotes as they are (rather than converting them to something ASCII-friendly). For reference, my test file was saved as UTF-8:
service PingPong {
/* Ping to the pong with “funky quotes” y'all */
string ping(),
}
In a Python 3 environment, run the following:
from thriftpy import load
load(path_to_thrift)  # path_to_thrift: path to the .thrift file saved above
Observe something like this:
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 66: character maps to <undefined>
(This was on Windows; other platforms might list a different codec, or might not hit this problem at all.)
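For what it's worth, the error can be reproduced without thriftpy at all. The sketch below assumes the Windows default text encoding is cp1252 (which is what Python reports as 'charmap' in this kind of error); it encodes the comment line as UTF-8 and then decodes it the way a default text-mode open would on such a system:

```python
# The comment line from the .thrift file, with the curly quotes spelled out.
line = u"/* Ping to the pong with \u201cfunky quotes\u201d y'all */"
raw = line.encode("utf-8")  # bytes as they sit on disk in a UTF-8 file

try:
    raw.decode("cp1252")  # assumed default codec on the failing machine
    decoded_ok = True
except UnicodeDecodeError as exc:
    decoded_ok = False
    # The closing quote U+201D is E2 80 9D in UTF-8; cp1252 has no mapping for 0x9d.
    failing_byte = raw[exc.start:exc.start + 1]
    print(exc)
```

The opening quote decodes "successfully" (to the wrong characters) because all three of its bytes happen to be defined in cp1252; only the last byte of the closing quote is unmapped, which is why the traceback points at 0x9d.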
I was able to fix this locally by adding an "encoding" argument to the open call in parser.py, but the built-in open doesn't accept that argument in Python 2.7 and lower, so it is not a version-agnostic fix. It could also be wrong if the thrift file were saved in some other encoding, since I doubt the spec actually specifies one. My guess is that the file handling will have to be rewritten to open files as binary and decode them explicitly, rather than just passing the contents to the lexer. Simply passing mode='rb' wasn't sufficient, so there's more to do.
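One possible direction, as a rough sketch only: io.open exists in Python 2.6+ and is the same function as the built-in open in Python 3, so it accepts "encoding" on both. The helper name read_thrift_source and the UTF-8 default are my own assumptions, not anything in thriftpy, and the default would still be wrong for files saved in another encoding:

```python
import io


def read_thrift_source(path, encoding="utf-8"):
    """Hypothetical helper (not part of thriftpy): return the file as decoded text.

    io.open takes an 'encoding' argument on Python 2.6+ and on Python 3,
    so the result does not depend on the platform's default codec.
    """
    with io.open(path, encoding=encoding) as fh:
        return fh.read()
```

The decoded text would then be handed to the lexer instead of letting the parser open the file itself; I haven't checked how much of parser.py would need to change to accept that.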