Closed BramVanroy closed 1 year ago
Thanks for bringing that up. I don't have a Windows machine to test on, but I recall that Windows has strange, sometimes non-standard default encodings, such as UTF-8 with a byte-order mark. I agree that allowing an encoding option would be helpful. I think you'd only need to target where text is read from the filesystem, as everything should be native Python (Unicode) strings internally, so look for calls to `open()`. I see the following:
There are a couple of other places where `open()` is called, but they seem less urgent (e.g., in `setup.py` or the unit tests).
If you want to submit a PR with the necessary changes, that would be great.
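On the byte-order-mark point, here is a minimal sketch using only the standard library. It shows how Python's `utf-8` and `utf-8-sig` codecs treat a leading BOM differently. The sample AMR string is made up for illustration, and nothing here is penman's actual code.

```python
import codecs

# Illustrative bytes: a UTF-8 BOM followed by a tiny, made-up AMR string.
raw = codecs.BOM_UTF8 + "(b / boy)".encode("utf-8")

# Plain "utf-8" keeps the BOM as a leading U+FEFF character.
with_bom = raw.decode("utf-8")

# "utf-8-sig" strips a leading BOM if one is present.
without_bom = raw.decode("utf-8-sig")

# "utf-8-sig" also accepts input that has no BOM at all.
no_bom_input = "(b / boy)".encode("utf-8").decode("utf-8-sig")
```

So if BOM-prefixed files turn out to matter, letting callers pass `encoding="utf-8-sig"` may be enough, with no special-casing needed.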
Thank you for the quick response! Indeed, unfortunately Windows still relies on cp1252 encoding (not UTF-8 with BOM), which always leads to small issues like this. I'll look for calls to the built-in `open` and allow for the option of a custom encoding there. I'll send in a PR soon.
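A rough sketch of the idea, assuming a hypothetical helper called `read_text` (not a penman function): an optional `encoding` argument is passed straight through to `open()`. The sample file name and AMR string are invented for illustration.

```python
import locale
import tempfile
from pathlib import Path
from typing import Optional


def read_text(path, encoding: Optional[str] = None) -> str:
    """Hypothetical helper (not part of penman): read a file as text."""
    # encoding=None lets open() fall back to the platform default.
    with open(path, encoding=encoding) as f:
        return f.read()


# Typically the encoding open() falls back to when none is given;
# on many Windows setups this reports cp1252.
default_encoding = locale.getpreferredencoding(False)

name = "Zo\u00eb"  # "Zoë", written with an escape to keep the source ASCII

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.amr"  # made-up file name
    sample.write_text(f'(p / person :name "{name}")', encoding="utf-8")

    # Reading with the encoding the file was written in round-trips cleanly.
    text = read_text(sample, encoding="utf-8")

    # Forcing cp1252 mimics a cp1252-default machine: the two UTF-8 bytes
    # of "ë" are decoded as two separate characters.
    mis = read_text(sample, encoding="cp1252")
```

Keeping `None` as the default leaves existing behaviour unchanged for callers who do not pass an encoding.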
I tend to run my dev code locally on my powerful Windows-based home set-up before pushing to the cluster. In this instance, I am trying to run spring, which relies on `penman`. It makes some calls to penman's `load` method. I am running their preprocessing to parse AMR data and found that calls to `penman._load` result in encoding issues.

I am aware I can just run on WSL and be done with it, but I'd rather see this useful tool be available cross-platform. Is there any way I can contribute? Which methods are all reliant on encoding? My approach would be to allow for an optional `encoding` argument, as is common in enc/dec methods, and pass it through the relevant IO functions like `open`.

For starters I can start with `codec` if that is okay, e.g., change the `_load` function to