abravalheri / validate-pyproject

Validation library for simple check on `pyproject.toml`
https://validate-pyproject.readthedocs.io/
Mozilla Public License 2.0
129 stars 12 forks source link

`_find_and_load_licence()` in pre-compile might be unsafe #50

Open mgorny opened 2 years ago

mgorny commented 2 years ago

Basically, what I'm thinking of is:

>>> [f for f in importlib.metadata.files("validate-pyproject") if f.stem == 'LICENSE']
[PackagePath('validate_pyproject-0.9.post1.dev3+g3b0db8c.dist-info/LICENSE.txt'),
 PackagePath('validate_pyproject/_vendor/fastjsonschema/LICENSE')]

i.e. both the package license file and the vendored fastjsonschema's LICENSE file matches this:

def _find_and_load_licence(files: Optional[Sequence[_M.PackagePath]]) -> str:
    if files is None:  # pragma: no cover
        raise ImportError("Could not find LICENSE for package")
    try:
        return next(f for f in files if f.stem.upper() == "LICENSE").read_text("UTF-8")

and I think it might be UB that the license file from dist-info is returned first.

That said, in Gentoo we remove the LICENSE* files from .dist-info since they are redundant to our license metadata, so this doesn't work correctly at all. Could you perhaps install the license file directly as part of package data, and use it similarly to how FJS's license is grabbed?

abravalheri commented 2 years ago

and I think it might be UB that the license file from dist-info is returned first.

I think when I was developing this, I noted the results in importlib.metadata.files are always given in "file system topologic order", so the license would come first... I may change this to make it more robust.

That said, in Gentoo we remove the LICENSE* files from .dist-info since they are redundant to our license metadata, so this doesn't work correctly at all.

Could we defer this decision until after PEP 639 is approved or rejected 😅? What would be the plan of Gentoo to handle that standard?

PEP 639 (if approved) will explicitly make it reliable for packages to query for its own license files. So far the text of the PEP reads like the following (emphasis are mine):

If a License-File is listed in a source or built distribution’s core metadata, that file MUST be included in the distribution at the specified path relative to the root license directory, and MUST be installed with the distribution at that same relative path.

The specified relative path MUST be consistent between project source trees, source distributions (sdists), built distributions (wheels) and installed projects. Therefore, inside the root license directory, packaging tools MUST reproduce the directory structure under which the source license files are located relative to the project root.

Root License Directory (Short: License Directory) The directory under which license files are stored in a project/distribution and the root directory that their paths, as recorded under the License-File core metadata fields, are relative to. Defined here to be the project root directory for source trees and source distributions, and a subdirectory named license_files of the directory containing the core metadata (i.e., the .dist-info/license_files directory) for built distributions and installed projects.

mgorny commented 2 years ago

Sounds like another random "change everything every half of year" standard that's pushed with zero consideration for world outside virtualenvs and that doesn't care about anyone who doesn't have time to read all the new PEPs proposed every month.

abravalheri commented 2 years ago

Sounds like another random "change everything every half of year" standard that's pushed with zero consideration for world outside virtualenvs and that doesn't care about anyone who doesn't have time to read all the new PEPs proposed every month.

Luckily this one has been proposed and is under discussion for over 2 years now, so the community had some time to engage. Please feel free to add your thoughts to https://discuss.python.org/t/pep-639-round-2-improving-license-clarity-with-better-package-metadata/12622/76.

mgorny commented 2 years ago

If only I had hours of free time to read the existing thread… unless someone summarized it?

stanislavlevin commented 1 year ago

In ALTLinux we deduplicate common LICENSE files. Recent debundle of fastjsonschema in 371ec7dfcbabe6fbe6641816e86dda9207109e50 broke us:

[builder@localhost test]$ python3 -m validate_pyproject.pre_compile -vv
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.distutils` schema
[INFO] validate_pyproject.api.load_builtin_plugin defines `tool.setuptools` schema
[ERROR] ImportError: Could not find LICENSE for package

[DEBUG] Please check the following information:
Traceback (most recent call last):
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/cli.py", line 185, in exceptions2exit
    yield
  File "/usr/lib64/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/cli.py", line 91, in run
    pre_compile(prms.output_dir, prms.main_file, cmd, prms.plugins, prms.replacements)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 58, in pre_compile
    write_notice(out, main_file, original_cmd, replacements)
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 105, in write_notice
    notice = notice.format(notice=opening, main_file=main_file, **load_licenses())
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 115, in load_licenses
    "fastjsonschema_license": _find_and_load_licence(_M.files("fastjsonschema")),
  File "/usr/src/.local/lib/python3/site-packages/validate_pyproject/pre_compile/__init__.py", line 134, in _find_and_load_licence
    raise ImportError("Could not find LICENSE for package")
ImportError: Could not find LICENSE for package
hroncok commented 1 year ago

In Fedora RPM packages, we keep the LICENSE files, but we drop RECORDs, which is allowed by PEP 627 and the current PyPA specification as well. The code here breaks because of that.

To reproduce, run:

(venv) [tmp]$ pip install validate_pyproject
...
Successfully installed fastjsonschema-2.16.3 validate_pyproject-0.12.1

(venv) [tmp]$ python -c 'import validate_pyproject.pre_compile; validate_pyproject.pre_compile.load_licenses()'

(venv) [tmp]$ rm venv/lib/python3.*/site-packages/*.dist-info/RECORD 

(venv) [tmp]$ python -c 'import validate_pyproject.pre_compile; validate_pyproject.pre_compile.load_licenses()'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File ".../venv/lib64/python3.11/site-packages/validate_pyproject/pre_compile/__init__.py", line 115, in load_licenses
    "fastjsonschema_license": _find_and_load_licence(_M.files("fastjsonschema")),
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../venv/lib64/python3.11/site-packages/validate_pyproject/pre_compile/__init__.py", line 134, in _find_and_load_licence
    raise ImportError("Could not find LICENSE for package")
ImportError: Could not find LICENSE for package

importlib.metadtata.files docs mention this:

In the case where the metadata file listing files (RECORD or SOURCES.txt) is missing, files() will return None.