WorksApplications / SudachiDict

A lexicon for Sudachi

pip install stuck at setup.py #43

Closed novastar88 closed 1 year ago

novastar88 commented 1 year ago

pip install sudachidict_full gets stuck at setup.py

Is there any workaround?

Windows 10 x64, Python 3.10, latest pip version

eiennohito commented 1 year ago

Can you please post your log of pip -vvvv install sudachidict_full here?

I could not reproduce it

novastar88 commented 1 year ago

pip -vvvv install sudachidict_full succeeded after a long time; it just took a long time to download. However, I can't install sudachidict-core>=20211220. The download takes ages and the script just hangs (0% CPU usage, 0 Mb/s network usage).

Console output for pip -vvvv install sudachidict-core>=20211220:

C:\Users\<username>>pip -vvvv install sudachidict-core>=20211220
Running command python setup.py egg_info
Downloading the Sudachi dictionary (It may take a while) ...

I need sudachidict-core>=20211220 for spacy.
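A side note on the command above: in both cmd.exe and POSIX shells an unquoted `>` is treated as output redirection, so a specifier such as `>=20211220` may never reach pip when typed without quotes. The following is a minimal sketch, assuming pip is driven from Python; `build_pip_install` is a hypothetical helper name, not part of pip. Passing the requirement as a single list element avoids shell parsing altogether.

```python
import subprocess
import sys
from typing import List


def build_pip_install(requirement: str) -> List[str]:
    """Return an argv list for `pip install <requirement>`.

    Hypothetical helper: the requirement stays one argument, so
    characters such as '>' are never interpreted by a shell.
    """
    return [sys.executable, "-m", "pip", "install", requirement]


def run_pip_install(requirement: str) -> None:
    """Run pip without a shell; raises CalledProcessError on failure."""
    subprocess.run(build_pip_install(requirement), check=True)
```

Calling `run_pip_install("sudachidict-core>=20211220")` would then pass the full specifier to pip. On the command line, wrapping the requirement in double quotes should have the same effect.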

Nummulit commented 1 year ago

Hello! I'm having the exact same issue. The installation gets stuck for a good few minutes and then returns:

Collecting sudachidict_core
  Downloading SudachiDict-core-20230110.tar.gz (9.0 kB)
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [15 lines of output]
      Downloading the Sudachi dictionary (It may take a while) ...
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/private/var/folders/qd/6qgxd1d52v3886w9rp5spp3h0000gn/T/pip-install-n12nwgbi/sudachidict-core_fa8835758c2346188fefb04d59bf189e/setup.py", line 44, in <module>
          _, _msg = urlretrieve(ZIP_URL, ZIP_NAME)
        File ".../lib/python3.8/urllib/request.py", line 276, in urlretrieve
          block = fp.read(bs)
        File ".../lib/python3.8/http/client.py", line 459, in read
          n = self.readinto(b)
        File ".../lib/python3.8/http/client.py", line 503, in readinto
          n = self.fp.readinto(b)
        File ".../lib/python3.8/socket.py", line 669, in readinto
          return self._sock.recv_into(b)
      ConnectionResetError: [Errno 54] Connection reset by peer
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

(I've changed parts of the paths to ...)

I'm running this on an Apple Silicon CPU, so perhaps that's what's causing the problem, but I don't see how it could 🤔

Nummulit commented 1 year ago

I'm now trying to download the files manually, but the speeds are around 500 bytes per second... or they fail to download at all. Perhaps the issue is with the server hosting them? That should explain the timeouts. Could you please confirm or deny that?
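For anyone scripting the manual download, here is a minimal sketch of a retrying download with a socket timeout, so that a stalled or reset connection raises and is retried instead of hanging. It is not what setup.py does (the traceback above shows a plain `urlretrieve` call); the function name, parameters, and defaults are all illustrative assumptions.

```python
import shutil
import time
import urllib.request
from typing import Callable


def download_with_retries(
    url: str,
    dest: str,
    attempts: int = 5,
    timeout: float = 30.0,
    backoff: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> int:
    """Download url to dest, retrying on connection errors.

    Returns the number of attempts used. Names and defaults are
    illustrative assumptions, not taken from SudachiDict.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            # The timeout makes a stalled connection raise instead of hanging.
            with opener(url, timeout=timeout) as response, open(dest, "wb") as out:
                shutil.copyfileobj(response, out)
            return attempt
        except OSError as exc:  # URLError, ConnectionResetError, socket timeouts
            last_error = exc
            if attempt < attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff * attempt)
    raise RuntimeError(f"download failed after {attempts} attempts") from last_error
```

The `opener` parameter exists only so the function can be exercised without network access; in normal use the default `urllib.request.urlopen` is kept.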

eiennohito commented 1 year ago

It downloads a dictionary from an AWS S3 bucket, which should be pretty fast if you have a fast internet connection.

eiennohito commented 1 year ago

Maybe it is because the dictionary is downloaded from the Tokyo S3 region at the moment, and downloads from the Tokyo region can be slow from outside Japan. @Nummulit @novastar88 do you know the closest S3 regions to you?

Nummulit commented 1 year ago

For me that would be Frankfurt or Stockholm. I've checked ping across regions using https://www.cloudping.info/ and while Tokyo is indeed slower (just under 300ms), it shouldn't be that big of a problem.

I've managed to download the files manually, extracted them, and put them into src/main/text. I then tried running pip install ./python/ (from the SudachiDict directory) to install the package manually, but that doesn't seem to work either. I got the following error:

Processing ./python
  Preparing metadata (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [21 lines of output]
      Downloading the Sudachi dictionary (It may take a while) ...
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File ".../SudachiDict/python/setup.py", line 44, in <module>
          _, _msg = urlretrieve(ZIP_URL, ZIP_NAME)
        File ".../lib/python3.8/urllib/request.py", line 247, in urlretrieve
          with contextlib.closing(urlopen(url, data)) as fp:
        File ".../lib/python3.8/urllib/request.py", line 222, in urlopen
          return opener.open(url, data, timeout)
        File ".../lib/python3.8/urllib/request.py", line 531, in open
          response = meth(req, response)
        File ".../lib/python3.8/urllib/request.py", line 640, in http_response
          response = self.parent.error(
        File ".../lib/python3.8/urllib/request.py", line 569, in error
          return self._call_chain(*args)
        File ".../lib/python3.8/urllib/request.py", line 502, in _call_chain
          result = func(*args)
        File ".../lib/python3.8/urllib/request.py", line 649, in http_error_default
          raise HTTPError(req.full_url, code, msg, hdrs, fp)
      urllib.error.HTTPError: HTTP Error 400: Invalid URI: isHexDigit
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

Looking into setup.py, it seems like the data is expected to be in a different place (based on the RESOURCE_DIR variable). Perhaps I should put the downloaded archives elsewhere? :)

eiennohito commented 1 year ago

The python directory needs some cooking before it becomes usable; it is a template of a package, not the package contents. See https://github.com/WorksApplications/SudachiDict/blob/develop/.github/workflows/release_python.yml#L27

Anyway, I still can't reproduce the issue locally :(, with Python 3.10 (fresh venv from the Ubuntu 22.04 system Python, both with and without wheel installed)

eiennohito commented 1 year ago

Anyway, we will experiment with changing how the package is distributed for the next dictionary update.

Nummulit commented 1 year ago

I've managed to overcome this by using a VPN to connect to server in Japan.

This definitely isn't the ideal solution, but it might help others :)

eiennohito commented 1 year ago

Noted on connection problems to Tokyo region.

graftim commented 1 year ago

I have the same issue. Just takes ages to install, or eventually fails. Sadly, I can't use a VPN on my desired device.

masasasann commented 1 year ago

Hello, I had the same issue as @graftim and @novastar88, and we managed to solve it in our environment. The reason for the installation failure was that the setup.py file was using http protocols instead of https. We noticed that we had restricted outbound traffic on port 80 in our AWS development environment. After adding a security group rule to our AWS VPC, I was able to install the package successfully. I'm not sure if this method will be helpful, but I'll share it with you.
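To check whether a network blocks plain http in the way described above, one option is to probe outbound TCP on ports 80 and 443. This is a minimal sketch using only the standard library; the function names are hypothetical, and the host to probe is whichever one your setup.py downloads from.

```python
import socket
from typing import Dict


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_http_and_https(host: str) -> Dict[int, bool]:
    """Probe port 80 (http) and port 443 (https) on the given host."""
    return {port: can_connect(host, port) for port in (80, 443)}
```

If port 443 is reachable and port 80 is not, an outbound rule like the one described above is a likely suspect. A successful TCP connection only shows that the port is open, not that the download itself will be fast.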

eiennohito commented 1 year ago

Interesting information on http/https protocols, thanks. We will move distribution to PyPI nevertheless, I think, sorry PyPI for additional traffic.

Manoa0404 commented 1 year ago

Thank you for your confirmation @eiennohito. I also have the same issue. I have two network environments (home and office): the install works in one, but in the other I get the same issue as @Nummulit. If you have any countermeasures, please let us know.

eiennohito commented 1 year ago

Hi everyone, can you please check if pip install sudachidict-core==20230711-rc1 works better for you than before? This release uses a CDN for distributing the dictionary files instead of S3 in the Tokyo AWS region.

Getting rid of

  DEPRECATION: sudachidict-core is being installed using the legacy 'setup.py install' method, because it does not have a 'pyproject.toml' and the 'wheel' package is not installed. pip 23.1 will enforce this behaviour change. A possible replacement is to enable the '--use-pep517' option. Discussion can be found at https://github.com/pypa/pip/issues/8559

is in progress

eiennohito commented 1 year ago

@Manoa0404 @masasasann @Nummulit

eiennohito commented 1 year ago

The 20230711 release now uses a CDN ( https://d2ej7fkh96fzlu.cloudfront.net/ ) for downloads, and the small and core dictionaries also have wheels uploaded to PyPI (the full wheel is >100 MB, so it is not uploaded to PyPI).

You can also try to install the dictionary from GitHub releases if everything else fails: https://github.com/WorksApplications/SudachiDict/releases/tag/v20230711

Closing this issue, please reopen if problems persist.