adrienverge / yamllint

A linter for YAML files.
GNU General Public License v3.0

Update cli.py: encoding='utf-8' #696

Open BaseMax opened 2 weeks ago

BaseMax commented 2 weeks ago

The issue happened in our project, at https://github.com/SalamLang/Salam/issues/265, in pre-commit while linting YAML files.

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 445: character maps to <undefined>
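For readers who want to see where this kind of error comes from, here is a minimal sketch. It assumes the file was decoded with a Windows code page such as cp1252, which the 'charmap' wording suggests but the report does not state. The helper `try_decode` is a hypothetical name used only for illustration.

```python
# Assumption: the platform default encoding was a single-byte code page
# such as cp1252. The letter U+0641 (FEH, used in Persian) is stored in
# UTF-8 as the two bytes D9 81, and 0x81 has no mapping in cp1252.
utf8_bytes = "\u0641".encode("utf-8")


def try_decode(data: bytes, encoding: str) -> str:
    """Return the decoded text, or the error message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"{type(exc).__name__}: {exc}"


cp1252_result = try_decode(utf8_bytes, "cp1252")
utf8_result = try_decode(utf8_bytes, "utf-8")

print(cp1252_result)       # a 'charmap' error about byte 0x81
print(ascii(utf8_result))  # the original letter, shown as an escape
```

The same bytes decode cleanly as UTF-8, so the failure comes from the encoding chosen when reading the file, not from the file content.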

coveralls commented 2 weeks ago

Coverage Status

coverage: 99.825%. remained the same when pulling 6ca01fe303698f3849167147bdd628c548f07888 on MaxFork:supports-utf8 into 95e17b33dc147c9c35742853e17bdf3fd508550b on adrienverge:master.

BaseMax commented 2 weeks ago

cc @adrienverge @jbampton

adrienverge commented 2 weeks ago

Hello and thanks for the proposal. Could you check out other pull requests related to character encoding? How does this one differ from them?

BaseMax commented 2 weeks ago

Hi @adrienverge, nice to connect with you.

There are three merge requests related to encoding in total: #630, #240, and #696 (the current merge request).

The https://github.com/adrienverge/yamllint/pull/630/files#diff-2e0288fc9fc3cda09f90a25f76bedb9ce0cea019d01147b436e575c71a3e674eR222 merge request looks fine but it doesn't have the change I applied.

My problem is that I have Persian UTF8 text in my YAML files and the problem was related to the 'cli.py' file.
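As a rough illustration of the kind of change in the title (`encoding='utf-8'`), the sketch below reads a file with an explicit encoding instead of relying on the platform default. It is not yamllint's actual `cli.py` code; `read_text_utf8` and the sample content are hypothetical.

```python
import tempfile
from pathlib import Path


def read_text_utf8(path):
    """Hypothetical helper: read a file as UTF-8 whatever the platform default is."""
    with open(path, encoding="utf-8") as f:
        return f.read()


# Sample YAML line containing a Persian word, written as raw UTF-8 bytes.
sample = "greeting: \u0633\u0644\u0627\u0645\n"

with tempfile.TemporaryDirectory() as tmp:
    yaml_path = Path(tmp) / "sample.yaml"
    yaml_path.write_bytes(sample.encode("utf-8"))
    loaded = read_text_utf8(yaml_path)

print(loaded == sample)
```

Without the `encoding` argument, `open()` uses the locale's preferred encoding, which is why the same file can load on one machine and fail on another.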

For my issue, https://github.com/adrienverge/yamllint/pull/240/files looks like a good patch, since it detects the encoding automatically and then uses it to read the file. However, I can see from your comments there that you do not want to add new dependencies. Quote: "I'm very against adding dependencies (like chardet)."

adrienverge commented 2 weeks ago

Hello Max, thanks. It looks like https://github.com/adrienverge/yamllint/pull/630 solves the same problem but is more complete and future-proof. Also, your PR doesn't fix encoding problems for other files such as configuration. What do you think?

> My problem is that I have Persian UTF8 text in my YAML files and the problem was related to the 'cli.py' file.

In the meantime, a solution is to tell Python to read files as UTF-8 by default:

export PYTHONUTF8=1
yamllint your-file.yaml

BaseMax commented 2 weeks ago

Thank you @adrienverge, I added the PYTHONUTF8 variable to our pre-commit environment configuration: https://github.com/SalamLang/Salam/commit/db7e870e233f9997e765ff00e7af91b515d6ef2a

@jbampton and I will do more testing.
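For that kind of testing, one possible check is to confirm that the variable actually reaches the Python process. The sketch below starts a child interpreter with `PYTHONUTF8=1` and reads `sys.flags.utf8_mode`, which is available in Python 3.7 and later. The helper name `utf8_mode_in_child` is hypothetical, and this is not part of yamllint or pre-commit.

```python
import os
import subprocess
import sys


def utf8_mode_in_child(extra_env):
    """Run a child interpreter with extra environment variables and
    return the value of sys.flags.utf8_mode that it prints."""
    env = {**os.environ, **extra_env}
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.flags.utf8_mode)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


enabled = utf8_mode_in_child({"PYTHONUTF8": "1"})
print(enabled)  # "1" when UTF-8 mode is active in the child process
```

If the value is not "1" inside the hook's environment, the variable is probably not being passed through to the interpreter that runs yamllint.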