CambridgeMolecularEngineering / chemdataextractor2

ChemDataExtractor Version 2.0
Dependencies version break installation. #31

Open rccc opened 1 year ago

rccc commented 1 year ago

Hello,

Installation is broken due to a dependency resolution problem:

pip downloads hundreds of versions of boto3, botocore and flask because of this known behavior:

"pip is looking at multiple versions of flask to determine which version is compatible with other requirements. This could take a while."

The same happens for boto3 and botocore.

The best thing to do for scientists who juggle tons of dependencies is to provide a Docker image. If you had seen this problem yourselves, you would not have let users download the whole internet to get CDE2 running.

Sorry for the sarcasm.

shohnmccullough commented 1 year ago

Have you also run into the circular urllib3 dependency conflict, where urllib3 >= 1.25.10 is required by requests, but requests also requires urllib3 < 1.25 and urllib3 >= 1.21.1?
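To make the conflict concrete, here is a minimal sketch that checks a handful of version strings against the bounds quoted above. The helper names (`parse`, `satisfies`) and the candidate list are illustrative assumptions, and the tuple comparison is a simplification of real version-specifier rules, not what pip itself runs.

```python
def parse(version):
    """Turn '1.25.10' into (1, 25, 10) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version, constraints):
    """Return True if `version` meets every (operator, bound) pair."""
    checks = {
        ">=": lambda v, b: v >= b,
        "<": lambda v, b: v < b,
    }
    v = parse(version)
    return all(checks[op](v, parse(bound)) for op, bound in constraints)


# Bounds as quoted in the comment above.
first_requirement = [(">=", "1.25.10")]
second_requirement = [("<", "1.25"), (">=", "1.21.1")]
combined = first_requirement + second_requirement

# Illustrative sample of version strings, not a full release list.
candidates = ["1.21.1", "1.24.3", "1.25", "1.25.10", "1.26.0"]
compatible = [c for c in candidates if satisfies(c, combined)]
print(compatible)  # prints []
```

No version can be both below 1.25 and at or above 1.25.10, so the combined set is empty no matter how many releases the installer downloads.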

rccc commented 1 year ago

Hello,

Yes, I ran into this kind of unsolvable dependency conflict.

I did not finish the install; there were too many library versions to download. I will wait. Maybe in a few months the problem will be solved, or maybe they will share a pre-built Docker image.

shohnmccullough commented 1 year ago

I made this Dockerfile and it has consistently worked for me:

Make sure you have at least 35 GB of disk space for the Docker image (the pip install is VERY inefficient). Also note that the pip install alone takes 5500+ seconds.

FROM --platform=linux/amd64 ubuntu:latest

WORKDIR /app/tmp

RUN apt-get update && \
    apt-get install autoconf build-essential curl gcc libncurses5-dev libncursesw5-dev \
    libreadline-dev libffi-dev libsqlite3-dev libssl-dev libtool libbz2-dev \
    libxml2 llvm make nano openssl xz-utils wget zlib1g-dev -y

# prerequisites for Python3.8 install
RUN apt-get install software-properties-common -y && \
    add-apt-repository ppa:deadsnakes/ppa

# install base python3.8, python3.8-dev, and python3.8-distutils separately because they don't seem to be included properly in -full
RUN DEBIAN_FRONTEND=noninteractive apt-get install python3.8 -y --no-install-recommends && \
    DEBIAN_FRONTEND=noninteractive apt-get install python3.8-dev -y --no-install-recommends && \
    DEBIAN_FRONTEND=noninteractive apt-get install python3.8-distutils -y --no-install-recommends

# install pip for symlinked 3.8
RUN wget https://bootstrap.pypa.io/get-pip.py -O - | python3.8

RUN pip install chemdataextractor2 && \
    pip install numpy==1.20.3

# ensure models are downloaded
RUN cde

CMD ["/bin/bash", "-c", "bash"]

rccc commented 1 year ago

@shohnmccullough Thanks for sharing this!

35 GB of disk space is huge. I wonder if this is due to https://github.com/pypa/pip/issues/8713 (see also this comment: https://github.com/pypa/pip/issues/9284#issuecomment-800843707)

shohnmccullough commented 1 year ago

I do believe that is exactly the issue. It looks like the boto* packages are the main offenders: pip starts at the most recent version it can see and downloads older versions until it finds one that matches.
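The newest-first search described above can be sketched as a toy model. This is not pip's actual resolver, which is considerably more involved; the function name `resolve_newest_first` and the release numbers are hypothetical and only show why disk usage and install time grow with the number of incompatible releases.

```python
def resolve_newest_first(candidates, is_compatible):
    """Try candidates from newest to oldest; return (chosen, downloaded).

    Every candidate that is inspected counts as downloaded, on the
    assumption that its requirements are only known after fetching it.
    """
    downloaded = []
    for version in sorted(candidates, reverse=True):
        downloaded.append(version)
        if is_compatible(version):
            return version, downloaded
    return None, downloaded


# Hypothetical data: 200 releases, of which only 1..20 are compatible.
candidates = range(1, 201)
chosen, downloaded = resolve_newest_first(candidates, lambda v: v <= 20)
print(chosen, len(downloaded))  # prints: 20 181
```

In this model a single package with many recent incompatible releases costs 181 downloads before the first match, which is consistent with the long install times reported in this thread.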

OBrink commented 1 year ago

@rccc @shohnmccullough

I have found a way to generate a smaller image (7.45 GB). Without the --no-cache-dir flag, it's 15.5 GB. I'll open a pull request to try to add this to the repository. Building the image takes approximately one hour.

FROM python:3.8-buster

RUN pip install --no-cache-dir chemdataextractor2
RUN pip install --no-cache-dir "numpy<1.24.0"

RUN cde

CMD ["/bin/bash", "-c", "bash"]

I have uploaded the image along with some documentation here.

Dingyun-Huang commented 10 months ago

The dependency issue is fixed in the newest version, 2.2.1. I tested it in a conda virtual environment on Ubuntu 22.04. The environment takes around 8 GB of disk space and the ML models take another 1.1 GB. Please try installing in a fresh environment to avoid version conflicts.
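After installing into a fresh environment, one way to confirm which release ended up there is to query the package metadata. This is a small sketch using only the standard library (Python 3.8+); it assumes the distribution is named `chemdataextractor2`, as in the pip commands in this thread, and the helper names are illustrative.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def numeric_parts(version):
    """Leading numeric components only, e.g. '2.2.1.post1' -> (2, 2, 1)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(version, minimum):
    """Simplified comparison of dotted numeric versions."""
    return numeric_parts(version) >= numeric_parts(minimum)


# Assumption: distribution name taken from the pip commands in this thread.
found = installed_version("chemdataextractor2")
if found is None:
    print("chemdataextractor2 is not installed in this environment")
elif at_least(found, "2.2.1"):
    print(found, "is 2.2.1 or newer")
else:
    print(found, "is older than 2.2.1")
```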

rccc commented 7 months ago

@OBrink Thanks a lot!

AhmetTasdemir commented 7 months ago

image

@OBrink I get an error like the one shown above: the folder appears empty every time I try. Thank you for your effort. I would be very grateful if you could help.

sukiluvcode commented 5 months ago

@AhmetTasdemir Hello, have you bind-mounted your cde2 directory into the container? First change your working directory to cde2. If you are on macOS or Linux, run

docker run --mount type=bind,source=$(pwd)/chemdataextractor2,target=/home/chemdataextractor2 -it -p 8888:8888 --entrypoint bash obrink/chemdataextractor:2.1.2

If you are on Windows, open Command Prompt and run

docker run --mount "type=bind,source=%cd%/chemdataextractor2,target=/home/chemdataextractor2" -it -p 8888:8888 --entrypoint bash obrink/chemdataextractor:2.1.2