23andMe / Yamale

A schema and validator for YAML.
MIT License

File encoding is always the system one and cannot be changed #189

Closed: reconman closed this issue 2 years ago

reconman commented 2 years ago

In this line, open() is used without any encoding parameter, so the system one is always chosen: https://github.com/23andMe/Yamale/blob/master/yamale/readers/yaml_reader.py#L34

This leads to users not being able to read UTF-8 encoded files on Windows.

It should be possible to set the encoding as an optional parameter during make_schema() and make_data().

I would advocate for the encoding being UTF-8 by default.
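To illustrate the problem, here is a minimal sketch using only the standard library (it does not touch Yamale). It assumes a Windows machine whose locale encoding is cp1252; since that depends on the system, the sketch simulates it by passing `encoding="cp1252"` explicitly. The file name and sample text are made up for the example.

```python
import locale
import os
import tempfile

# Sample YAML text with one non-ASCII character (hypothetical content).
text = "name: Zoë\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.yaml")

    # Store the file as UTF-8, as a cloned repository typically would.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # The encoding that open() falls back to when none is passed.
    print("locale default:", locale.getpreferredencoding(False))

    # Simulate a cp1252 system default by naming that encoding explicitly.
    with open(path, "r", encoding="cp1252") as f:
        as_cp1252 = f.read()

    # Reading with the encoding the file was actually written in.
    with open(path, "r", encoding="utf-8") as f:
        as_utf8 = f.read()

print(repr(as_cp1252))  # the "ë" is decoded as two wrong characters
print(repr(as_utf8))    # matches the original text
```

With cp1252 the read happens to succeed but returns garbled text; other byte sequences or encodings can raise `UnicodeDecodeError` instead.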

mildebrandt commented 2 years ago

Hi @reconman, thanks for your interest in Yamale!

We cannot change the default encoding to UTF-8 without releasing a new major version, since that may break existing users. We can default it to the user's locale instead. But before we look at doing that, I'd like to find a way to do this without a change to Yamale.

Have you tried to enable UTF-8 mode when running python? https://www.python.org/dev/peps/pep-0540/

You can either use python -X utf8 or set the environment variable PYTHONUTF8=1.
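As a rough check that either switch takes effect, the sketch below starts a child interpreter both ways and reads `sys.flags.utf8_mode` (available since Python 3.7). The helper name `utf8_mode_in_child` is made up for this example.

```python
import os
import subprocess
import sys

# One-liner run in the child interpreter: prints 1 if UTF-8 mode is on.
PROBE = "import sys; print(sys.flags.utf8_mode)"


def utf8_mode_in_child(extra_args=(), extra_env=None):
    """Run a child Python and return its sys.flags.utf8_mode value."""
    env = dict(os.environ)
    # Start from a clean slate so an inherited setting does not interfere.
    env.pop("PYTHONUTF8", None)
    if extra_env:
        env.update(extra_env)
    result = subprocess.run(
        [sys.executable, *extra_args, "-c", PROBE],
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    return int(result.stdout.strip())


# Equivalent to running: python -X utf8
via_flag = utf8_mode_in_child(extra_args=["-X", "utf8"])
# Equivalent to setting: PYTHONUTF8=1
via_env = utf8_mode_in_child(extra_env={"PYTHONUTF8": "1"})

print("via -X utf8:", via_flag)
print("via PYTHONUTF8=1:", via_env)
```

In UTF-8 mode, `open()` without an `encoding` argument uses UTF-8 regardless of the locale.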

Let me know if that works for your use case.

reconman commented 2 years ago

The project I'm maintaining is using Yamale as a library. Most of my users won't read the instructions and always miss the part where they have to set either of those.

I'd like to avoid that and directly set the encoding during the function call. When users clone my project, they already receive files with UTF-8 encoding and there are also some files provided by the community in UTF-8.

And it's easier to tell all users "use UTF-8 encoded YAML files" than to ask them if they're on Windows.

mildebrandt commented 2 years ago

Thanks for outlining your use case. I agree setting the encoding in yamale will work best for you.

mildebrandt commented 2 years ago

Since you're using Yamale as a library, would the following work for you?

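import yamale

# Open each file yourself with the encoding it was saved in, then hand
# the text to Yamale through content= instead of passing a path.
with open('./189.schema', 'r', encoding='utf-16') as f:
    schema = yamale.make_schema(content=f.read())

with open('./189.yaml', 'r', encoding='utf-8') as f:
    data = yamale.make_data(content=f.read())

yamale.validate(schema, data)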

I'm trying to be careful about each additional parameter we add since it does increase the use cases we need to support.
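For a project that wraps Yamale, the file reading can live in one small helper so that callers never rely on the system encoding. This is only a sketch: `read_yaml_text` is a hypothetical name, not part of Yamale, and the UTF-8 default reflects the use case described in this thread.

```python
from pathlib import Path


def read_yaml_text(path, encoding="utf-8"):
    """Return the file's text, decoded with an explicit encoding.

    Hypothetical helper: the default is UTF-8 rather than the system
    locale, so behavior is the same on every platform.
    """
    return Path(path).read_text(encoding=encoding)


if __name__ == "__main__":
    import os
    import tempfile

    # Round-trip a small UTF-8 file to show the helper in use.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.yaml")
        Path(sample).write_text("name: Zoë\n", encoding="utf-8")
        print(repr(read_yaml_text(sample)))
```

The returned string can then be passed along as `yamale.make_schema(content=read_yaml_text(schema_path))` and `yamale.make_data(content=read_yaml_text(data_path))`, where `schema_path` and `data_path` are whatever paths your project uses.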

reconman commented 2 years ago

Yes, that works. If you don't want to change Yamale, then you can close this issue.

mildebrandt commented 2 years ago

I'm glad that solution works for you. Thanks!