domdfcoding / domdf_python_tools

Helpful functions for Python 🐍 🛠️
https://domdf-python-tools.readthedocs.io/en/latest
MIT License

Unicode space characters in project metadata description #119

Open wwuck opened 5 months ago

wwuck commented 5 months ago

Description

I am attempting to install flake8-encodings in a poetry project.

My project also uses the following pre-commit hooks:

  - repo: https://github.com/python-poetry/poetry
    rev: 1.8.3
    hooks:
      - id: poetry-lock
        args:
          - --no-update
      - id: poetry-check
        args:
          - --lock
  - repo: https://github.com/sirosen/texthooks
    rev: 0.6.6
    hooks:
      - id: alphabetize-codeowners
      - id: fix-smartquotes
      - id: fix-spaces
      - id: fix-ligatures
      - id: forbid-bidi-controls

When I install flake8-encodings into my poetry project, the poetry-lock pre-commit hook adds domdf-python-tools to the poetry.lock lockfile as a dependency. The problem appears at the next hook: fix-spaces detects some Unicode "EN SPACE" (U+2002) characters in the package description and replaces them with standard ASCII spaces (U+0020).
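The substitution fix-spaces performs can be approximated in a few lines of Python. This is a minimal sketch, not texthooks' actual implementation: it assumes the hook treats every character in the Unicode category `Zs` (space separator) other than the ASCII space as a target, and the helper name `normalize_spaces` is hypothetical.

```python
import unicodedata


def normalize_spaces(text: str) -> str:
    """Replace every non-ASCII space separator (category Zs) with U+0020.

    Hypothetical helper approximating the fix-spaces hook; the exact
    set of characters texthooks rewrites may differ.
    """
    return "".join(
        " " if ch != " " and unicodedata.category(ch) == "Zs" else ch
        for ch in text
    )


# The domdf-python-tools description, written with explicit escapes.
description = "Helpful functions for Python\u2002\U0001F40D\u2002\U0001F6E0\uFE0F"
fixed = normalize_spaces(description)
print(fixed == "Helpful functions for Python \U0001F40D \U0001F6E0\uFE0F")  # True
```

Because the lockfile is regenerated from the unmodified PyPI metadata, any edit of this kind is reverted on the next poetry-lock run.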

The two hooks therefore undo each other's changes, and pre-commit never passes: on the next commit attempt, poetry-lock detects that the lockfile has changed, regenerates it from the PyPI metadata, and brings back the EN SPACE characters.

Steps to Reproduce

  1. Install domdf-python-tools into a poetry project so the package description appears in poetry.lock.
  2. Run the texthooks fix-spaces pre-commit hook (https://github.com/sirosen/texthooks?tab=readme-ov-file#fix-spaces).
  3. Run git diff and see that the hook has changed the contents of poetry.lock.
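To confirm step 3 without running the hook, the lockfile can be scanned for non-ASCII space characters directly. This sketch is an illustrative addition: `find_unicode_spaces` is a hypothetical helper, and it assumes poetry.lock sits in the current directory and that "non-ASCII space" means any Unicode `Zs` character other than U+0020.

```python
import unicodedata
from pathlib import Path


def find_unicode_spaces(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, 'U+XXXX NAME') for each non-ASCII space separator."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            # Skip the ordinary ASCII space; report every other Zs character.
            if ch != " " and unicodedata.category(ch) == "Zs":
                hits.append((lineno, col, f"U+{ord(ch):04X} {unicodedata.name(ch)}"))
    return hits


if __name__ == "__main__":
    lockfile = Path("poetry.lock")  # assumed location of the lockfile
    if lockfile.exists():
        for lineno, col, desc in find_unicode_spaces(lockfile.read_text(encoding="utf-8")):
            print(f"{lockfile}:{lineno}:{col}: {desc}")
```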

Actual result:

I copied the description string into a test script to check each character code:

>>> for c in '"Helpful functions for Python 🐍 🛠️"':
...   print(f"char '{c}' : {hex(ord(c))}")
... 
char '"' : 0x22
char 'H' : 0x48
char 'e' : 0x65
char 'l' : 0x6c
char 'p' : 0x70
char 'f' : 0x66
char 'u' : 0x75
char 'l' : 0x6c
char ' ' : 0x20
char 'f' : 0x66
char 'u' : 0x75
char 'n' : 0x6e
char 'c' : 0x63
char 't' : 0x74
char 'i' : 0x69
char 'o' : 0x6f
char 'n' : 0x6e
char 's' : 0x73
char ' ' : 0x20
char 'f' : 0x66
char 'o' : 0x6f
char 'r' : 0x72
char ' ' : 0x20
char 'P' : 0x50
char 'y' : 0x79
char 't' : 0x74
char 'h' : 0x68
char 'o' : 0x6f
char 'n' : 0x6e
char ' ' : 0x2002
char '🐍' : 0x1f40d
char ' ' : 0x2002
char '🛠' : 0x1f6e0
char '️' : 0xfe0f
char '"' : 0x22
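The output is easier to read with the standard library's `unicodedata.name`, which gives the official Unicode name of each code point. This snippet is an illustrative addition, not part of the original report; it prints only the non-ASCII characters of the description.

```python
import unicodedata

description = "Helpful functions for Python\u2002\U0001F40D\u2002\U0001F6E0\uFE0F"

# Print the code point and Unicode name of every non-ASCII character.
for ch in description:
    if ord(ch) > 0x7F:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}")

# Output:
# U+2002 EN SPACE
# U+1F40D SNAKE
# U+2002 EN SPACE
# U+1F6E0 HAMMER AND WRENCH
# U+FE0F VARIATION SELECTOR-16
```

Only the two EN SPACE characters are space separators; the emoji and the variation selector are left alone by a space-normalizing hook.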

Expected result:

Reproduces how often:

Every time the package is installed

Version

Installation source

Poetry / PyPI