WING-NUS / SciAssist


Cannot install sciassist #32

Open qolina opened 1 year ago

qolina commented 1 year ago

Commands used

conda create --name assist python=3.8
conda activate assist
pip install sciassist

Error message

DEPRECATION: pytorch-lightning 1.7.7 has a non-standard dependency specifier torch>=1.9.*. pip 23.3 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of pytorch-lightning or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063

Installing collected packages: chardet, six, PyYAML, pyparsing, pyasn1, packaging, oauthlib, multiprocess, jinja2, idna, click, certifi, attrs, async-timeout, sentry-sdk, pyasn1-modules, responses

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
nbconvert 7.8.0 requires traitlets>=5.1, which is not installed.

Successfully installed PyYAML-6.0.1 async-timeout-4.0.3 attrs-23.1.0 certifi-2023.7.22 chardet-3.0.4 click-8.1.7 idna-2.8 jinja2-3.1.2 multiprocess-0.70.12.2 oauthlib-3.2.2 packaging-23.2 pyasn1-0.5.0 pyasn1-modules-0.3.0 pyparsing-3.1.1 responses-0.10.15 sentry-sdk-1.9.0 six-1.16.0

sciassist is not installed!

dyxohjl666 commented 1 year ago

I tested on mce, ecp, and NSCC:

1. mce: got the same warnings as in your record, but SciAssist itself seems to work; you can import SciAssist in a Python console (a minimal check is sketched after the Todo list below). The latest PyTorch is incompatible with mce's GPU, so pin it to 1.12.0.

2. ecp: no problem except the "DEPRECATION" warnings from pytorch-lightning. When importing SciAssist, there are some "ImportError: No module named xx" messages. The default python seems to be 2.x, and all of them come from Linxiao's from transformers import *. I'm not sure whether it's related to the server's setup, but python3 -m pip install SciAssist works well.

3. NSCC: same as 2.

Todo:

  1. pytorch-lightning 1.7 still works well in our toolkit. I don't recommend updating it now, because we are not yet sure of the impact.

  2. I think there are some problems with the servers themselves: many of the failing files are under "/usr/share", and without a root account it is hard to track down the cause.
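
For reference, "works well" above just means a clean import; a minimal sketch of that check (run inside the conda env):

import SciAssist                                              # a clean import is the success criterion
from SciAssist import Summarization, ReferenceStringParsing   # the pipeline classes should also import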

qolina commented 11 months ago

With sciassist==0.1.1

The mce server:

~$ lsb_release -a

Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04

~$ nvidia-smi

NVIDIA-SMI 470.199.02 Driver Version: 470.199.02 CUDA Version: 11.4

Installation:

conda create --name assist python=3.8
conda activate assist
pip install sciassist

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
requests-oauthlib 1.3.1 requires oauthlib>=3.0.0, which is not installed.

Successfully installed PyYAML-6.0.1 async-timeout-4.0.3 attrs-23.1.0 beautifulsoup4-4.9.3 certifi-2023.11.17 chardet-3.0.4 click-8.1.7 exceptiongroup-1.2.0 idna-2.8 iniconfig-2.0.0 jinja2-3.1.2 lightning-utilities-0.10.0 multiprocess-0.70.12.2 numpy-1.24.4 packaging-23.2 pluggy-1.3.0 protobuf-3.20.3 pyparsing-3.1.1 pytest-7.4.3 python-magic-0.4.27 pytorch-lightning-2.0.9.post0 requests-2.22.0 responses-0.18.0 safetensors-0.4.1 sciassist-0.1.1 sentry-sdk-1.9.0 six-1.16.0 tomli-2.0.1 transformers-4.30.2 urllib3-1.25.11

Replying to https://github.com/WING-NUS/SciAssist/issues/32#issuecomment-1765840118:

1) No torch was installed beforehand; torch is pulled in together with pytorch-lightning, and torch.__version__ is '2.1.0+cu121'.
2) pytorch-lightning is a recent version (2.0.9).
3) The mentioned oauthlib is installed.
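
A quick way to confirm which torch build pip pulled in (a minimal sketch, run inside the env; the expected values are taken from the log above):

import torch
print(torch.__version__)          # '2.1.0+cu121' in this environment
print(torch.version.cuda)         # CUDA toolkit the wheel was built against (12.1)
print(torch.cuda.is_available())  # whether the local driver/GPU can actually be used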

Try inference

from SciAssist import Summarization

text = "..."  # placeholder: the actual input text is not shown in this report
summarizer = Summarization(device="gpu")
res = summarizer.predict(text, type="str")
print(res)

Failed to import transformers.models.llama.tokenization_llama_fast because of the following error (look up to see its traceback): tokenizers>=0.13.3 is required for a normal functioning of this module, but found tokenizers==0.12.1.
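
A small sketch to confirm the version conflict reported above (the expected numbers come from the install log and the error message):

from importlib.metadata import version
print("transformers:", version("transformers"))  # 4.30.2 per the install log
print("tokenizers:", version("tokenizers"))      # 0.12.1 here, but transformers wants >=0.13.3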

Change version

pip install pytorch-lightning==1.7.1
Inference again

RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

The torch build is wrong, so it cannot see CUDA.

conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.6 -c pytorch -c conda-forge

Do:

pip install torch==1.12.0+cu113 torchvision==0.13.0+cu113 torchaudio==0.12.0 --extra-index-url https://download.pytorch.org/whl/cu113

Inference now works correctly for reference string parsing and summarization!

In summary, as you mentioned, torch should match the local machine (1.12.0 in our case), and pytorch-lightning==1.7.1 works for SciAssist.

I agree that we should recommend users install a torch build matching their own machine before installing SciAssist.

Todo: try different lightning versions on top of the correct torch. Lightning 1.8.0, 1.9.0, 2.0.0, and 2.1.0 all succeed in inference (a smoke-test sketch follows below).

Try on macOS and Windows.
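
A possible shape for that version sweep is a small pytest smoke test (a sketch only: the sample text is a made-up placeholder, device="cpu" is assumed to be accepted just like the device="gpu" call above, and type="str" is assumed to work for both pipelines as it does for summarization):

# test_smoke.py -- hypothetical smoke test, rerun once per installed lightning version
from SciAssist import Summarization, ReferenceStringParsing

SAMPLE = "We study reference string parsing and scientific document summarization. " * 10  # placeholder input

def test_summarization_runs():
    res = Summarization(device="cpu").predict(SAMPLE, type="str")
    assert res  # only checks that inference completes and returns a result

def test_reference_string_parsing_runs():
    res = ReferenceStringParsing(device="cpu").predict(SAMPLE, type="str")
    assert res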

qolina commented 11 months ago

Installation on macOS 14.1.1

Install miniconda from https://docs.conda.io/projects/miniconda/en/latest/

Install torch

pip3 install torch torchvision torchaudio

Successfully installed MarkupSafe-2.1.3 certifi-2023.11.17 charset-normalizer-3.3.2 filelock-3.13.1 fsspec-2023.10.0 idna-3.6 jinja2-3.1.2 mpmath-1.3.0 networkx-3.1 numpy-1.24.4 pillow-10.1.0 requests-2.31.0 sympy-1.12 torch-2.1.1 torchaudio-2.1.1 torchvision-0.16.1 typing-extensions-4.8.0 urllib3-2.1.0

Install SciAssist

Successfully installed GitPython-3.1.40 PyPDF2-2.10.9 PyYAML-6.0.1 aiohttp-3.9.1 aiosignal-1.3.1 antlr4-python3-runtime-4.9.3 async-timeout-4.0.3 attrs-23.1.0 beautifulsoup4-4.9.3 cffi-1.16.0 chardet-3.0.4 click-8.1.7 commonmark-0.9.1 cryptography-41.0.7 cycler-0.12.1 datasets-2.2.2 dill-0.3.4 docker-pycreds-0.4.0 exceptiongroup-1.2.0 fonttools-4.45.1 frozenlist-1.4.0 gitdb-4.0.11 huggingface-hub-0.19.4 hydra-core-1.3.2 idna-2.8 importlib-resources-6.1.1 iniconfig-2.0.0 joblib-1.3.2 kiwisolver-1.4.5 lightning-utilities-0.10.0 lxml-4.9.3 matplotlib-3.5.3 multidict-6.0.4 multiprocess-0.70.12.2 nltk-3.8.1 omegaconf-2.2.3 packaging-23.2 pandas-1.4.4 pathtools-0.1.2 pdfminer.six-20221105 pluggy-1.3.0 promise-2.3 protobuf-3.20.3 psutil-5.9.6 pyarrow-14.0.1 pycparser-2.21 pygments-2.17.2 pyparsing-3.1.1 pytest-7.4.3 python-dateutil-2.8.2 python-magic-0.4.27 pytorch-crf-0.7.2 pytorch-lightning-2.0.9.post0 pytz-2023.3.post1 regex-2023.10.3 requests-2.22.0 responses-0.18.0 rich-12.4.4 sacremoses-0.1.1 safetensors-0.4.1 sciassist-0.1.1 scikit-learn-1.3.2 scipy-1.10.1 seaborn-0.11.2 sentry-sdk-1.9.0 seqeval-1.2.2 setproctitle-1.3.3 shortuuid-1.0.11 six-1.16.0 smmap-5.0.1 soupsieve-2.5 threadpoolctl-3.2.0 tokenizers-0.13.3 tomli-2.0.1 torchcrf-1.1.0 torchmetrics-0.11.4 tqdm-4.66.1 transformers-4.30.2 urllib3-1.25.11 wandb-0.12.21 xxhash-3.4.1 yarl-1.9.3 zipp-3.17.0

Inference

Reference string parsing and summarization test passed!

Storage/memory usage recording for base models

Miniconda cache: 1.5 GB
Model checkpoint cache: 2.7 GB
Memory: 803 MB for reference string parsing, 1.3 GB for summarization
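
For reproducibility, the memory number can be read off with psutil, which is already installed as a dependency; a rough sketch (it measures this Python process only, not the conda or checkpoint caches on disk):

import os
import psutil
from SciAssist import Summarization

proc = psutil.Process(os.getpid())
pipeline = Summarization()                                  # loads the base summarization checkpoint
print(round(proc.memory_info().rss / 1024 ** 2), "MB RSS")  # resident memory after loading the model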

JavonTeo commented 11 months ago

Installation on WSL Ubuntu 22.04.1 LTS

~$ lsb_release -a

No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 22.04.1 LTS
Release: 22.04

nvidia-smi

NVIDIA-SMI 535.112 Driver Version: 537.42 CUDA Version: 12.2

Installation (Python 3.10.12):

python3 -m venv SciAssist
source SciAssist/bin/activate
pip install SciAssist

Successfully installed GitPython-3.1.40 MarkupSafe-2.1.3 PyPDF2-2.10.9 PyYAML-6.0.1 aiohttp-3.9.1 aiosignal-1.3.1 antlr4-python3-runtime-4.9.3 async-timeout-4.0.3 attrs-23.1.0 beautifulsoup4-4.9.3 certifi-2023.11.17 cffi-1.16.0 chardet-3.0.4 charset-normalizer-3.3.2 click-8.1.7 commonmark-0.9.1 cryptography-41.0.7 cycler-0.12.1 datasets-2.2.2 dill-0.3.4 docker-pycreds-0.4.0 exceptiongroup-1.2.0 filelock-3.13.1 fonttools-4.45.1 frozenlist-1.4.0 fsspec-2023.10.0 gitdb-4.0.11 huggingface-hub-0.19.4 hydra-core-1.3.2 idna-2.8 iniconfig-2.0.0 jinja2-3.1.2 joblib-1.3.2 kiwisolver-1.4.5 lightning-utilities-0.10.0 lxml-4.9.3 matplotlib-3.5.3 mpmath-1.3.0 multidict-6.0.4 multiprocess-0.70.12.2 networkx-3.2.1 nltk-3.8.1 numpy-1.26.2 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.18.1 nvidia-nvjitlink-cu12-12.3.101 nvidia-nvtx-cu12-12.1.105 omegaconf-2.2.3 packaging-23.2 pandas-1.4.4 pathtools-0.1.2 pdfminer.six-20221105 pillow-10.1.0 pluggy-1.3.0 promise-2.3 protobuf-3.20.3 psutil-5.9.6 pyarrow-14.0.1 pycparser-2.21 pygments-2.17.2 pyparsing-3.1.1 pytest-7.4.3 python-dateutil-2.8.2 python-magic-0.4.27 pytorch-crf-0.7.2 pytorch-lightning-2.0.9.post0 pytz-2023.3.post1 regex-2023.10.3 requests-2.22.0 responses-0.18.0 rich-12.4.4 sacremoses-0.1.1 safetensors-0.4.1 sciassist-0.1.1 scikit-learn-1.3.2 scipy-1.11.4 seaborn-0.11.2 sentry-sdk-1.9.0 seqeval-1.2.2 setproctitle-1.3.3 setuptools-69.0.2 shortuuid-1.0.11 six-1.16.0 smmap-5.0.1 soupsieve-2.5 sympy-1.12 threadpoolctl-3.2.0 tokenizers-0.13.3 tomli-2.0.1 torch-2.1.1 torchcrf-1.1.0 torchmetrics-0.11.4 tqdm-4.66.1 transformers-4.30.2 triton-2.1.0 typing-extensions-4.8.0 urllib3-1.25.11 wandb-0.12.21 xxhash-3.4.1 yarl-1.9.3

setup_grobid

BUILD SUCCESSFUL in 54s
30 actionable tasks: 25 executed, 5 up-to-date
Grobid is installed.

run_grobid

environments/SciAssist/lib/python3.10/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing GenerationMixin from src/transformers/generation_utils.py is deprecated and will be removed in Transformers v5. Import as from transformers import GenerationMixin instead. warnings.warn(
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers: pip install xformers.
Grobid is running now.

Try the Summarization and Reference String Parsing tests on a PDF

from SciAssist import Summarization
pipeline = Summarization()
# beam search with 4 beams, returning the 2 best candidate summaries
res = pipeline.predict('examples_H01-1042.pdf', type="pdf", num_beams=4, num_return_sequences=2)
print(res["summary"])

from SciAssist import ReferenceStringParsing
ref_parser = ReferenceStringParsing()
res = ref_parser.predict("examples_H01-1042.pdf", type="pdf")
print(res)

environments/SciAssist/lib/python3.10/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing GenerationMixin from src/transformers/generation_utils.py is deprecated and will be removed in Transformers v5. Import as from transformers import GenerationMixin instead. warnings.warn(
Xformers is not installed correctly. If you want to use memory_efficient_attention to accelerate training use the following command to install Xformers: pip install xformers.
Loading the model... ...

Summarization and RSP tests passed!

Testing summary

Even though I did not install torch or pytorch-lightning before installing SciAssist, it still runs properly. Hence I believe users can run pip install SciAssist straight away. However, note that when testing I ran into the FutureWarning above, plus a message telling me to pip install xformers.

I tried pip install xformers:

Successfully installed torch-2.1.0 xformers-0.0.22.post7

When running the test again it succeeds, but the same FutureWarning remains:

environments/SciAssist2/lib/python3.10/site-packages/transformers/generation_utils.py:24: FutureWarning: Importing GenerationMixin from src/transformers/generation_utils.py is deprecated and will be removed in Transformers v5. Import as from transformers import GenerationMixin instead. warnings.warn(
Loading the model...
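
The GenerationMixin FutureWarning is harmless. If it clutters the output, it can be silenced with the standard library warnings module before importing SciAssist (a sketch; note it hides all FutureWarnings, nothing SciAssist-specific):

import warnings
warnings.filterwarnings("ignore", category=FutureWarning)  # suppress FutureWarnings globally

from SciAssist import Summarization  # import after installing the filter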

qolina commented 11 months ago

Thanks for testing a different Ubuntu and CUDA version, and for testing grobid, which I forgot.

Description: Ubuntu 22.04.1 LTS NVIDIA-SMI 535.112 Driver Version: 537.42 CUDA Version: 12.2

Installation (Python 3.10.12):

setup_grobid

I notice your machine has a recent CUDA version (12.2), which matches the default PyTorch installed by pip install sciassist. Errors appear when you have an older CUDA version and an incompatible PyTorch. And yes, the pytorch-lightning version is not the cause of the errors. I also see these warnings, which I have ignored so far.


JavonTeo commented 11 months ago

Installation on Windows

nvidia-smi

NVIDIA-SMI 536.99 Driver Version: 536.99 CUDA Version: 12.2

Installation (Python 3.11.5)

Tried (each in a fresh venv):

python -m venv .env
.env\Scripts\activate
pip install SciAssist

python -m venv .env
.env\Scripts\activate
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

python -m venv .env
.env\Scripts\activate
pip3 install torch torchvision torchaudio

Got the same error each time:

AppData\Local\Temp\pip-build-env-bcle4ruo\overlay\Lib\site-packages\setuptools\dist.py:674: SetuptoolsDeprecationWarning: The namespace_packages parameter is deprecated. !!

          ********************************************************************************
          Please replace its usage with implicit namespaces (PEP 420).

          See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages for details.
          ********************************************************************************

  !!
    ep.load()(self, ep.name, value)

  Edit mplsetup.cfg to change the build options; suppress output with --quiet.

  BUILDING MATPLOTLIB
        python: yes [3.11.5 | packaged by Anaconda, Inc. | (main, Sep 11 2023,
                    13:26:23) [MSC v.1916 64 bit (AMD64)]]
      platform: yes [win32]
         tests: no  [skipping due to configuration]
        macosx: no  [Mac OS-X only]

running build_ext
Extracting /project/freetype/freetype2/2.6.1/freetype-2.6.1.tar.gz
Building freetype in build\freetype-2.6.1
msbuild build\freetype-2.6.1\builds\windows\vc2010\freetype.sln /t:Clean;Build /p:Configuration=Release;Platform=x64
error: command 'msbuild' failed: None
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for matplotlib
Failed to build matplotlib
ERROR: Could not build wheels for matplotlib, which is required to install pyproject.toml-based projects

I upgraded to pip 23.3.1 and setuptools 69.0.2, but still got the same error.
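
A plausible cause (an assumption, not verified on this machine): the install pulls in matplotlib 3.5.3, which ships no prebuilt wheels for Python 3.11, so pip falls back to building it from source on Windows and needs msbuild and freetype. If that is the case, a venv created with Python 3.10 or earlier should pick up a prebuilt wheel instead; a trivial pre-check:

import sys
# matplotlib 3.5.3 (the version installed on the other machines above) provides
# wheels only up to Python 3.10, so 3.11 forces a source build on Windows.
assert sys.version_info < (3, 11), "use a Python <= 3.10 venv to get a prebuilt matplotlib wheel"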