aboSamoor / polyglot

Multilingual text (NLP) processing toolkit
http://polyglot-nlp.com
Other
2.28k stars 337 forks source link

Polyglot install fails on pip 24.1 and above. #279

Open bnsh opened 1 week ago

bnsh commented 1 week ago

Polyglot seems not to build with pip version 24.1 and above. It works with version 24.0...

I narrowed down the problem to this nested function within pip: untar_file/pip_filter. I haven't quite figured out the intent behind pip_filter. But it seems to have been introduced in 24.1. Basically, it causes tarfile to fail at line TarFile::_find_link_target. I'm not sure of the intent of either of these two functions, so I'm not sure how to fix it. (I'm not even sure if this is Polyglot's problem or Pip's problem, unfortunately, so I'm posting an issue on both. The issue on pip is here.)

  1. take all the files at this gist.
  2. run make 24.0
  3. Observe that it succeeds.
  4. run make 24.1
  5. Observe that it fails.
  6. run make 24.1-venv
  7. Observe that it also fails. (venv just builds a virtual env first)
binesh@p1gen2:/tmp/replicate/issue$ make 24.0
docker build --no-cache -f Dockerfile-24.0 -t binesh/polyglot_pip_problem24.0 .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  5.632kB
Step 1/4 : FROM python:3.12
 ---> a3aef63c6c10
Step 2/4 : RUN python3 -m pip install -U pip==24.0
 ---> Running in 4f892e92f4de
Requirement already satisfied: pip==24.0 in /usr/local/lib/python3.12/site-packages (24.0)
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
Removing intermediate container 4f892e92f4de
 ---> adc91782dedb
Step 3/4 : RUN python3 -m pip install -U polyglot
 ---> Running in 0b5d0b673953
Collecting polyglot
  Downloading polyglot-16.7.4.tar.gz (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.3/126.3 kB 3.2 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: polyglot
  Building wheel for polyglot (setup.py): started
  Building wheel for polyglot (setup.py): finished with status 'done'
  Created wheel for polyglot: filename=polyglot-16.7.4-py2.py3-none-any.whl size=52558 sha256=9495afdff0ec436d3c3d0e0f4f5987edad8f591fc00ba3f6f150194f27d0afd4
  Stored in directory: /root/.cache/pip/wheels/c7/5e/28/47349211ec1f91379f41ed10bc2520f7071ecfb6cbe182f6fe
Successfully built polyglot
Installing collected packages: polyglot
Successfully installed polyglot-16.7.4
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.0 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
Removing intermediate container 0b5d0b673953
 ---> a53e790527c4
Step 4/4 : RUN python3 -m pip --version
 ---> Running in 45c5292dfb33
pip 24.0 from /usr/local/lib/python3.12/site-packages/pip (python 3.12)
Removing intermediate container 45c5292dfb33
 ---> 7f80b34cf713
Successfully built 7f80b34cf713
Successfully tagged binesh/polyglot_pip_problem24.0:latest
binesh@p1gen2:/tmp/replicate/issue$ echo "WOOT Success."
WOOT Success.
binesh@p1gen2:/tmp/replicate/issue$ make 24.1
docker build --no-cache -f Dockerfile-24.1 -t binesh/polyglot_pip_problem24.1 .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  5.632kB
Step 1/4 : FROM python:3.12
 ---> a3aef63c6c10
Step 2/4 : RUN python3 -m pip install -U pip==24.1
 ---> Running in 60133d3d667a
Collecting pip==24.1
  Downloading pip-24.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-24.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 8.2 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
Successfully installed pip-24.1
WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv

[notice] A new release of pip is available: 24.1 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
Removing intermediate container 60133d3d667a
 ---> ce8d15e02b08
Step 3/4 : RUN python3 -m pip install -U polyglot
 ---> Running in 9b62b4f7460e
Collecting polyglot
  Downloading polyglot-16.7.4.tar.gz (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.3/126.3 kB 5.1 MB/s eta 0:00:00
ERROR: Exception:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/usr/local/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 174, in __bool__
    return any(self)
           ^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 162, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 53, in _iter_built
    candidate = func()
                ^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 185, in _make_candidate_from_link
    base: Optional[BaseCandidate] = self._make_base_candidate_from_link(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 231, in _make_base_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 303, in __init__
    super().__init__(
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 235, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 314, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 527, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 598, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 180, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 316, in unpack_file
    untar_file(filename, location)
  File "/usr/local/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 235, in untar_file
    tar.extractall(location, filter=pip_filter)
  File "/usr/local/lib/python3.12/tarfile.py", line 2269, in extractall
    self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/usr/local/lib/python3.12/tarfile.py", line 2332, in _extract_one
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/usr/local/lib/python3.12/tarfile.py", line 2423, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/usr/local/lib/python3.12/tarfile.py", line 2521, in makelink
    self._extract_member(self._find_link_target(tarinfo),
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/tarfile.py", line 2725, in _find_link_target
    raise KeyError("linkname %r not found" % linkname)
KeyError: "linkname 'polyglot-16.7.4/docs/README.rst' not found"

[notice] A new release of pip is available: 24.1 -> 24.1.1
[notice] To update, run: pip install --upgrade pip
The command '/bin/sh -c python3 -m pip install -U polyglot' returned a non-zero code: 2
make: *** [Makefile:7: 24.1] Error 2
binesh@p1gen2:/tmp/replicate/issue$ echo "Frowny face."
Frowny face.
binesh@p1gen2:/tmp/replicate/issue$ make 24.1-venv
docker build --no-cache -f Dockerfile-24.1-venv -t binesh/polyglot_pip_problem24.1 .
DEPRECATED: The legacy builder is deprecated and will be removed in a future release.
            Install the buildx component to build images with BuildKit:
            https://docs.docker.com/go/buildx/

Sending build context to Docker daemon  6.656kB
Step 1/5 : FROM python:3.12
 ---> a3aef63c6c10
Step 2/5 : RUN python3 -m venv /tmp/venv
 ---> Running in 293518f611b3
Removing intermediate container 293518f611b3
 ---> e682a6d8fe6f
Step 3/5 : RUN /tmp/venv/bin/python3 -m pip install -U pip==24.1
 ---> Running in 1db8604cbb97
Collecting pip==24.1
  Downloading pip-24.1-py3-none-any.whl.metadata (3.6 kB)
Downloading pip-24.1-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 8.6 MB/s eta 0:00:00
Installing collected packages: pip
  Attempting uninstall: pip
    Found existing installation: pip 24.0
    Uninstalling pip-24.0:
      Successfully uninstalled pip-24.0
Successfully installed pip-24.1

[notice] A new release of pip is available: 24.1 -> 24.1.1
[notice] To update, run: python3 -m pip install --upgrade pip
Removing intermediate container 1db8604cbb97
 ---> 72de3b8f3f78
Step 4/5 : RUN /tmp/venv/bin/python3 -m pip install -U polyglot
 ---> Running in ebc4ccc0d7e1
Collecting polyglot
  Downloading polyglot-16.7.4.tar.gz (126 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 126.3/126.3 kB 5.9 MB/s eta 0:00:00
ERROR: Exception:
Traceback (most recent call last):
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/cli/base_command.py", line 179, in exc_logging_wrapper
    status = run_func(*args)
             ^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/cli/req_command.py", line 67, in wrapper
    return func(self, options, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
                      ^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
    result = self._result = resolver.resolve(
                            ^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
    self._add_to_criteria(self.state.criteria, r, parent=None)
  File "/tmp/venv/lib/python3.12/site-packages/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
    if not criterion.candidates:
           ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
    return bool(self._sequence)
           ^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 174, in __bool__
    return any(self)
           ^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 162, in <genexpr>
    return (c for c in iterator if id(c) not in self._incompatible_ids)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/found_candidates.py", line 53, in _iter_built
    candidate = func()
                ^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 185, in _make_candidate_from_link
    base: Optional[BaseCandidate] = self._make_base_candidate_from_link(
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/factory.py", line 231, in _make_base_candidate_from_link
    self._link_candidate_cache[link] = LinkCandidate(
                                       ^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 303, in __init__
    super().__init__(
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 158, in __init__
    self.dist = self._prepare()
                ^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 235, in _prepare
    dist = self._prepare_distribution()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/resolution/resolvelib/candidates.py", line 314, in _prepare_distribution
    return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 527, in prepare_linked_requirement
    return self._prepare_linked_requirement(req, parallel_builds)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 598, in _prepare_linked_requirement
    local_file = unpack_url(
                 ^^^^^^^^^^^
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/operations/prepare.py", line 180, in unpack_url
    unpack_file(file.path, location, file.content_type)
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 316, in unpack_file
    untar_file(filename, location)
  File "/tmp/venv/lib/python3.12/site-packages/pip/_internal/utils/unpacking.py", line 235, in untar_file
    tar.extractall(location, filter=pip_filter)
  File "/usr/local/lib/python3.12/tarfile.py", line 2269, in extractall
    self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/usr/local/lib/python3.12/tarfile.py", line 2332, in _extract_one
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/usr/local/lib/python3.12/tarfile.py", line 2423, in _extract_member
    self.makelink(tarinfo, targetpath)
  File "/usr/local/lib/python3.12/tarfile.py", line 2521, in makelink
    self._extract_member(self._find_link_target(tarinfo),
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/tarfile.py", line 2725, in _find_link_target
    raise KeyError("linkname %r not found" % linkname)
KeyError: "linkname 'polyglot-16.7.4/docs/README.rst' not found"

[notice] A new release of pip is available: 24.1 -> 24.1.1
[notice] To update, run: python3 -m pip install --upgrade pip
The command '/bin/sh -c /tmp/venv/bin/python3 -m pip install -U polyglot' returned a non-zero code: 2
make: *** [Makefile:10: 24.1-venv] Error 2
binesh@p1gen2:/tmp/replicate/issue$ echo "heavy sigh"
heavy sigh
binesh@p1gen2:/tmp/replicate/issue$