commitizen-tools / commitizen

Create committing rules for projects :rocket: auto bump versions :arrow_up: and auto changelog generation :open_file_folder:
https://commitizen-tools.github.io/commitizen/
MIT License
2.39k stars 257 forks source link

commitizen_emoji cannot write to CHANGELOG #516

Open adam-grant-hendry opened 2 years ago

adam-grant-hendry commented 2 years ago

Description

This is because commitizen_emoji uses unicode emoji characters, but commitizen does not specify the file encoding in write_changelog(), so it defaults back to cp1252:

https://github.com/commitizen-tools/commitizen/blob/79165119817cbd0aef25434a85dcbffa3a13acbb/commitizen/commands/changelog.py#L92

This could be fixed by specifying the utf-8 encoding:

with open(self.file_name, "w", encoding='utf-8') as changelog_file: 
   ...

Steps to reproduce

> cz bump --changelog
bump: version 0.4.0 → 0.5.0
tag to create: 0.5.0       
increment detected: MINOR  

Traceback (most recent call last):
  File "C:\Program Files\Python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Program Files\Python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\hendra11\Code\external\cz-emoji\.venv\Scripts\cz.exe\__main__.py", line 7, in <module>
  File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\cli.py", line 382, in main
    args.func(conf, vars(args))()
  File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\bump.py", line 215, in __call__
    changelog_cmd()
  File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\changelog.py", line 172, in __call__
    self.write_changelog(changelog_out, lines, changelog_meta)
  File "C:\Users\hendra11\Code\external\cz-emoji\.venv\lib\site-packages\commitizen\commands\changelog.py", line 103, in write_changelog
    changelog_file.write(changelog_out)
  File "C:\Program Files\Python38\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f389' in position 29: character maps to <undefined>

Current behavior

See Steps to reproduce

Desired behavior

Emojis can be logged to changelog

Screenshots

No response

Environment

Lee-W commented 2 years ago

Maybe we could make the encoding configurable?

adam-grant-hendry commented 2 years ago

'utf-8' should cover all encodings, but I'm fine with that. You'd have to change BaseCommitizen to grab that config variable and everywhere you have an open used in your codebase, change it to add the encoding.

fabiopedrosa commented 2 years ago

I seen the same issue on my end, and adding the encoding='utf-8' fixed it.

Lee-W commented 2 years ago

Hi @adam-grant-hendry we've release a new version of commitizen. Could you please try whether 2.30.0 solve this issue?

Yusin0903 commented 2 years ago

After I test it, the problem still exists.

adam-grant-hendry commented 1 year ago

@Lee-W The problem still persists. I tried on 2.35.0.

You need to specify the utf-8 encoding where open() is used. That is how this can be fixed and it was additionally confirmed by @fabiopedrosa

adam-grant-hendry commented 1 year ago

@Lee-W I remember now why I never submitted a PR: I am on Windows and your scripts/test and scripts/format expects /bin/sh, so I can't pass the pre-commit step linter and test.

This can be solved by using Git Bash: Git for Windows ships with git bash (i.e. a sh.exe and bash.exe).

This is another issue and requires another PR to fix.

adam-grant-hendry commented 1 year ago

This can be solved by using Git Bash: Git for Windows ships with git bash (i.e. a sh.exe and bash.exe).

Actually, even easier than that: Just switch to using a portable shebang:

#!/usr/bin/env sh
Lee-W commented 1 year ago

Hi @adam-grant-hendry , I notice you create another issue regarding to it. If pre-commit does not work, could you please use git commit ... --no-verify and git push ... --no-verify as a temporary solution before we can sort the /bin/sh issue out?

Lee-W commented 1 year ago

It would be even more helpful if you could document this on the doc for windows users. Thanks!

adam-grant-hendry commented 1 year ago

@woile @Lee-W I have a working implementation with passing unit tests for Unicode characters (that is, emojis). Once PR #605 is merged, I'll push it up.

adam-grant-hendry commented 1 year ago

It would be even more helpful if you could document this on the doc for windows users. Thanks!

Yes, great idea. I'll make another PR for docs. I can probably also resolve Issue #515 there as well. I spewed a bunch of comments there and have had some time to learn and practice, so I might clean that up a little bit and clarify things.