Python3 compat

I started making this Python 3 compatible since @lcombs was on a 3.x Anaconda distribution. Right now the changes are:

- print statements -> print function lolol
- File opening is totally different in the Python 3.x series. In this PR, any open() calls use 'r' or 'w' mode instead of 'rb' or 'wb'. In Python 3.x, 'rb' and 'wb' explicitly mean reading and writing raw bytes through a binary file object, while 'r' (implicitly 'rt') and 'w' (implicitly 'wt') give you a text file object (io.TextIOWrapper), which does the str conversion for you on read. In the Python 2.x series, open() instantiated a file object whose read method handed back a str directly (a byte string in 2.x, with no decoding step). So the text-mode behavior is more similar to what we used to do, in that read() returns a str. If we want, we can read files as bytes and decode them ourselves, but given that Python 3.x is UTF-8 by default for str (although open() without an explicit encoding falls back to the locale's preferred encoding), I don't think that's necessary.

  So actually, instead, I have now converted all of the open() calls to explicitly use io.open() (which is what the built-in open() is in the Python 3.x series) so that 2.x and 3.x use the same interface. I explicitly load as binary or text at each call; most importantly, in the generate_folds script I load as binary and force a down-conversion to ASCII encoding, as we would normally do later during the clean_corpus steps.
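For anyone reviewing, here is a rough sketch of the io.open() pattern described above. It is not the code from this PR: the helper names (`read_text`, `read_as_ascii`, `demo`) are hypothetical, and the choice to decode as UTF-8 and silently drop non-ASCII characters is an assumption about what "force downconvert to ascii" means here.

```python
from __future__ import print_function, unicode_literals

import io
import os
import tempfile


def read_text(path, encoding="utf-8"):
    """Text mode: io.open decodes for us and returns unicode text on 2.x and 3.x."""
    with io.open(path, "r", encoding=encoding) as handle:
        return handle.read()


def read_as_ascii(path, source_encoding="utf-8"):
    """Binary mode: read raw bytes, then drop anything ASCII cannot represent.

    Assumption: undecodable or non-ASCII characters are ignored rather than
    replaced or transliterated.
    """
    with io.open(path, "rb") as handle:
        raw = handle.read()  # bytes on 3.x, byte str on 2.x
    text = raw.decode(source_encoding, errors="ignore")
    return text.encode("ascii", errors="ignore").decode("ascii")


def demo():
    """Write a small UTF-8 file, then read it back both ways."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        with io.open(path, "w", encoding="utf-8") as handle:
            handle.write("caf\u00e9 corpus")
        return read_text(path), read_as_ascii(path)
    finally:
        os.remove(path)
```

Calling `demo()` returns the text-mode result with the accented character intact and the binary-mode result with it stripped. Passing `encoding` explicitly in text mode avoids depending on the platform's locale.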

~~Surprisingly since 'rb' and 'wb' mode in python 2.x series probably didn't do anything useful for us given that we called the .read() method anyways, this may still be 2.x compatible.~~

ayota / ddl_nlp

Python3 compat #51