arup-group / elara

Command line utility for processing MATSim events output files.
MIT License

User warning about sequence matcher #211

Closed. mfitz closed this issue 2 years ago

mfitz commented 2 years ago

When I grabbed the latest Elara Docker image and ran the CLI from inside the container I saw the following warning:

$ docker run -it --entrypoint /bin/bash 758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara

root@92d61ece6463:/# elara --help
/usr/local/lib/python3.8/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')
Usage: elara [OPTIONS] COMMAND [ARGS]...

  Command line tool for processing a MATSim scenario events output.

Options:
  --help  Show this message and exit.

Commands:
  run              Run Elara using a config.
  event-handlers   Access event handler output group.
  plan-handlers    Access plan handler output group.
  post-processors  Access post processing output group.

Some detail on a fix can be found here. However, pip install python-Levenshtein fails inside the container, apparently due to the lack of a gcc installation:

root@92d61ece6463:/# pip install python-Levenshtein
Collecting python-Levenshtein
  Using cached python-Levenshtein-0.12.2.tar.gz (50 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: setuptools in /usr/local/lib/python3.8/site-packages (from python-Levenshtein) (57.5.0)
Building wheels for collected packages: python-Levenshtein
  Building wheel for python-Levenshtein (setup.py) ... error
  error: subprocess-exited-with-error

  × python setup.py bdist_wheel did not run successfully.
  │ exit code: 1
  ╰─> [32 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.8/Levenshtein
      running egg_info
      writing python_Levenshtein.egg-info/PKG-INFO
      writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
      writing entry points to python_Levenshtein.egg-info/entry_points.txt
      writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
      writing requirements to python_Levenshtein.egg-info/requires.txt
      writing top-level names to python_Levenshtein.egg-info/top_level.txt
      reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*pyc' found anywhere in distribution
      warning: no previously-included files matching '*so' found anywhere in distribution
      warning: no previously-included files matching '.project' found anywhere in distribution
      warning: no previously-included files matching '.pydevproject' found anywhere in distribution
      adding license file 'COPYING'
      writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
      copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.8/Levenshtein
      running build_ext
      building 'Levenshtein._levenshtein' extension
      creating build/temp.linux-x86_64-3.8
      creating build/temp.linux-x86_64-3.8/Levenshtein
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.8/Levenshtein/_levenshtein.o
      unable to execute 'gcc': No such file or directory
      error: command 'gcc' failed with exit status 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for python-Levenshtein
  Running setup.py clean for python-Levenshtein
Failed to build python-Levenshtein
Installing collected packages: python-Levenshtein
  Running setup.py install for python-Levenshtein ... error
  error: subprocess-exited-with-error

  × Running setup.py install for python-Levenshtein did not run successfully.
  │ exit code: 1
  ╰─> [32 lines of output]
      running install
      running build
      running build_py
      creating build
      creating build/lib.linux-x86_64-3.8
      creating build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/StringMatcher.py -> build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/__init__.py -> build/lib.linux-x86_64-3.8/Levenshtein
      running egg_info
      writing python_Levenshtein.egg-info/PKG-INFO
      writing dependency_links to python_Levenshtein.egg-info/dependency_links.txt
      writing entry points to python_Levenshtein.egg-info/entry_points.txt
      writing namespace_packages to python_Levenshtein.egg-info/namespace_packages.txt
      writing requirements to python_Levenshtein.egg-info/requires.txt
      writing top-level names to python_Levenshtein.egg-info/top_level.txt
      reading manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
      reading manifest template 'MANIFEST.in'
      warning: no previously-included files matching '*pyc' found anywhere in distribution
      warning: no previously-included files matching '*so' found anywhere in distribution
      warning: no previously-included files matching '.project' found anywhere in distribution
      warning: no previously-included files matching '.pydevproject' found anywhere in distribution
      adding license file 'COPYING'
      writing manifest file 'python_Levenshtein.egg-info/SOURCES.txt'
      copying Levenshtein/_levenshtein.c -> build/lib.linux-x86_64-3.8/Levenshtein
      copying Levenshtein/_levenshtein.h -> build/lib.linux-x86_64-3.8/Levenshtein
      running build_ext
      building 'Levenshtein._levenshtein' extension
      creating build/temp.linux-x86_64-3.8
      creating build/temp.linux-x86_64-3.8/Levenshtein
      gcc -pthread -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -fPIC -I/usr/local/include/python3.8 -c Levenshtein/_levenshtein.c -o build/temp.linux-x86_64-3.8/Levenshtein/_levenshtein.o
      unable to execute 'gcc': No such file or directory
      error: command 'gcc' failed with exit status 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> python-Levenshtein

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

The solution is probably to install gcc in the Docker image and then pip install python-Levenshtein there. This may also improve the performance of any fuzzy matching operations in Elara, since they would then use the faster implementations of the distance algorithms.
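For anyone wanting to confirm whether a rebuilt image actually picked up the compiled extension, here is a minimal sketch in Python. It assumes that python-Levenshtein installs a top-level module named Levenshtein (as the build log above suggests) and that the pure-Python fallback mentioned in the warning is difflib's SequenceMatcher. The function names are hypothetical helpers for illustration and are not part of Elara or fuzzywuzzy.

```python
import importlib.util


def levenshtein_backend_available(module_name: str = "Levenshtein") -> bool:
    """Return True if the named top-level module can be found on the import path.

    Assumption: python-Levenshtein exposes a top-level module called
    "Levenshtein". find_spec only locates the module; it does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


def describe_matcher_backend() -> str:
    """Describe which sequence matcher we would expect to be used (hypothetical helper)."""
    if levenshtein_backend_available():
        return "python-Levenshtein (C extension)"
    # Assumed fallback: the pure-Python matcher from the standard library.
    return "difflib.SequenceMatcher (pure Python)"


if __name__ == "__main__":
    print(f"Expected matcher backend: {describe_matcher_backend()}")
```

Running this inside the container before and after changing the Dockerfile should show whether the install step took effect, without needing to trigger the warning through the elara CLI.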

ahmednreldin commented 2 years ago

How can I download the latest image to work on the problem?

I tried the command above, but it returned an authentication error:

docker run -it --entrypoint /bin/bash 758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara

Unable to find image '758645626094.dkr.ecr.eu-west-1.amazonaws.com/elara:latest' locally
docker: Error response from daemon: Head "https://758645626094.dkr.ecr.eu-west-1.amazonaws.com/v2/elara/manifests/latest": no basic auth credentials.
See 'docker run --help'.

mfitz commented 2 years ago

Hi @ahmednreldin

Sorry for the late reply. You won't be able to pull that image because it's currently private inside our ECR repo, but you can build the image locally directly from the Dockerfile (docker build -t elara-local . should do it) once you've cloned the GitHub repo.

mfitz commented 2 years ago

The warning has been fixed by this commit