OpenDataServices / flatten-tool

Tools for generating CSV and other flat versions of the structured data
http://flatten-tool.readthedocs.io/en/latest/
MIT License

fix: Read bytes to avoid ijson deprecation #458

Closed by jpmckinney 2 months ago

jpmckinney commented 3 months ago

Otherwise:

DeprecationWarning: 
ijson works by reading bytes, but a string reader has been given instead. This
probably, but not necessarily, means a file-like object has been opened in text
mode ('t') rather than binary mode ('b').

An automatic conversion is being performed on the fly to continue, but on the
other hand this creates unnecessary encoding/decoding operations that decrease
the efficiency of the system. In the future this automatic conversion will be
removed, and users will receive errors instead of this warning. To avoid this
problem make sure file-like objects are opened in binary mode instead of text
mode.

encoding="utf-8" needs to be removed: if it's present, codecs.open returns decoded text (as with mode "rt") instead of bytes (as with mode "rb"), and ijson will read the bytes as UTF-8 anyway.

Note that ijson has no special exception class for UTF-8 errors, so one test had to change.
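
For illustration, here is a minimal standard-library sketch of the codecs.open behaviour described above. It is not the flatten-tool code; the helper name read_with_and_without_encoding and the sample file contents are made up for this example.

```python
import codecs
import os
import tempfile
import warnings


def read_with_and_without_encoding(path):
    """Return what codecs.open yields without and with an encoding argument."""
    with warnings.catch_warnings():
        # Silence any DeprecationWarning that newer Python versions may
        # emit for codecs.open itself; it is unrelated to the point here.
        warnings.simplefilter("ignore", DeprecationWarning)
        # No encoding: the underlying binary file is returned, so read() gives bytes.
        with codecs.open(path, "rb") as f:
            raw = f.read()
        # With an encoding: the file is wrapped in a decoding reader, so read() gives str.
        with codecs.open(path, "rb", encoding="utf-8") as f:
            decoded = f.read()
    return raw, decoded


# Write a small UTF-8 JSON document to a temporary file (sample data only).
fd, sample_path = tempfile.mkstemp(suffix=".json")
with os.fdopen(fd, "wb") as handle:
    handle.write('{"name": "café"}'.encode("utf-8"))

raw, decoded = read_with_and_without_encoding(sample_path)
os.remove(sample_path)

print(type(raw).__name__, type(decoded).__name__)
```

A byte-oriented parser such as ijson would want the first kind of object; the second is what triggers the warning quoted above.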

odscjames commented 2 months ago

Does this mean we can then remove the except UnicodeDecodeError as err: ... raise BadlyFormedJSONErrorUTF8(*err.args) below? Tests run ok without it.

If so, we could technically remove the BadlyFormedJSONErrorUTF8 class entirely, as that's the only place it's used. However, other people may be using it and would get breakages, so maybe it's best to leave the class but add a docstring saying it is deprecated?

(Also a note in CHANGELOG.md would be good, thanks!)
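
A rough sketch of the "leave the class but mark it deprecated" option, for illustration only. The base class shown here and the handle function are assumptions made for this example, not the actual flatten-tool definitions.

```python
class BadlyFormedJSONError(ValueError):
    """Stand-in base class (assumed for this sketch)."""


class BadlyFormedJSONErrorUTF8(BadlyFormedJSONError):
    """Deprecated: no longer raised by the library.

    Kept only so that existing imports and except clauses in
    downstream code keep working.
    """


def handle(load):
    """Hypothetical downstream caller that still names the deprecated class."""
    try:
        return load()
    except BadlyFormedJSONErrorUTF8:
        # Still importable and catchable, but never reached once the
        # library stops raising this class.
        return "utf8-error"
    except BadlyFormedJSONError:
        return "json-error"


def failing_load():
    # Simulates the library raising only the general error from now on.
    raise BadlyFormedJSONError("bad input")


print(handle(failing_load))
```

Downstream code written like handle keeps importing and running unchanged; the UTF-8 branch simply stops being taken.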

jpmckinney commented 2 months ago

Done

odscjames commented 2 months ago

Thank you!

odscjames commented 2 months ago

Merged in, thanks!