ICLRandD / Blackstone

A spaCy pipeline and model for NLP on unstructured legal text.
https://research.iclr.co.uk
Apache License 2.0

Compatibility with spaCy 2.1.9 & 2.2+ #24

Open phHartl opened 4 years ago

phHartl commented 4 years ago

Hi Blackstone team,

First, I want to thank you for your pre-trained models and your work on automatic legal text analysis. Your custom SentenceSegmenter and NER detection in particular work very well with our dataset of legal texts.

Unfortunately, this package still depends on spaCy 2.1, or more specifically on spaCy 2.1.8. That version has a major memory leak bug (https://github.com/explosion/spaCy/issues/3618), which was fixed in 2.1.9. I already modified Blackstone's dependency files so that I can install spaCy 2.1.9 instead of the required 2.1.8, and it works flawlessly on my machine. You might consider changing your dependencies accordingly.

However, it would be even better if you could update to a newer version of spaCy (e.g. 2.2+) to benefit from several performance optimizations made by Explosion. There is already a pending pull request (#22) addressing this, but without the training data you used to train the model, we have no way to retrain it ourselves. It would be greatly appreciated if you could update your model and package to spaCy 2.2. As this might take some time, you could update the package's dependencies to spaCy 2.1.9 in the meantime to avoid the memory leak present in spaCy 2.1.8.
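To check whether an installed spaCy is on the affected side of that boundary, a small version comparison is enough. This is a minimal sketch that assumes plain numeric versions of the form X.Y.Z; `parse_version` and `has_leak_fix` are hypothetical helper names, and the 2.1.9 threshold is taken from the report above, not verified independently.

```python
def parse_version(version: str) -> tuple:
    """Turn a string like '2.1.9' into (2, 1, 9).

    Only the first three dot-separated parts are used, and any
    non-numeric suffix on a part is ignored.
    """
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '2.2' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def has_leak_fix(version: str) -> bool:
    """True if the version is 2.1.9 or newer (threshold from this issue)."""
    return parse_version(version) >= (2, 1, 9)
```

With spaCy installed, the check would be called as `has_leak_fix(spacy.__version__)`.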

om-blip commented 2 years ago

How did you install spacy 2.1.9? When I try to install spacy 2.1.9 it gives me this error:

    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
    Collecting spacy==2.1.9
    Using cached spacy-2.1.9.tar.gz (30.7 MB)
    Installing build dependencies ... error
    error: subprocess-exited-with-error

    × pip subprocess to install build dependencies did not run successfully.
    │ exit code: 1
    ╰─> [39 lines of output]
    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
    Collecting setuptools
    Using cached setuptools-65.0.0-py3-none-any.whl (1.2 MB)
    Collecting wheel<0.33.0,>0.32.0
    Using cached wheel-0.32.3-py2.py3-none-any.whl (21 kB)
    Collecting Cython
    Using cached Cython-0.29.32-py2.py3-none-any.whl (986 kB)
    Collecting cymem<2.1.0,>=2.0.2
    Using cached cymem-2.0.6-cp310-cp310-win_amd64.whl (36 kB)
    Collecting preshed<2.1.0,>=2.0.1
    Using cached preshed-2.0.1.tar.gz (113 kB)
    Preparing metadata (setup.py): started
    Preparing metadata (setup.py): finished with status 'error'
    error: subprocess-exited-with-error

    python setup.py egg_info did not run successfully.
    exit code: 1

    [6 lines of output]
    Traceback (most recent call last):
      File "<string>", line 2, in <module>
      File "<pip-setuptools-caller>", line 34, in <module>
      File "C:\Users\OM\AppData\Local\Temp\pip-install-1axblb8v\preshed_554bc9bb4aa743d6b206c3b1263b5b66\setup.py", line 9, in <module>
        from distutils import ccompiler, msvccompiler
    ImportError: cannot import name 'msvccompiler' from 'distutils' (E:\react\GEICOChatBot-master\backend\venv\lib\site-packages\setuptools\_distutils\__init__.py)
    [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
  error: metadata-generation-failed

  Encountered error while generating package metadata.

  See above for output.

  note: This is an issue with the package mentioned above, not pip.
  hint: See above for details.
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
  [end of output]

    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: subprocess-exited-with-error

    × pip subprocess to install build dependencies did not run successfully.
    │ exit code: 1
    ╰─> See above for output.

    note: This error originates from a subprocess, and is likely not a problem with pip.
    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
    WARNING: Ignoring invalid distribution -ip (e:\react\geicochatbot-master\backend\venv\lib\site-packages)
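The traceback shows the build stopping in the preshed 2.0.1 setup.py at `from distutils import ccompiler, msvccompiler`, with `distutils` resolved to the copy bundled inside setuptools (`setuptools\_distutils`). A small probe like the one below can show whether a given environment would hit the same import failure. It is only a sketch: `msvccompiler_importable` is a hypothetical helper name, and a passing probe does not guarantee that the rest of the build succeeds.

```python
import importlib
import sys
import warnings


def msvccompiler_importable() -> bool:
    """Return True if `distutils.msvccompiler` can be imported here."""
    with warnings.catch_warnings():
        # distutils is deprecated on newer Pythons; silence that for the probe.
        warnings.simplefilter("ignore")
        try:
            importlib.import_module("distutils.msvccompiler")
        except ImportError:
            # Covers both a missing distutils package and a missing submodule.
            return False
    return True


if __name__ == "__main__":
    print("Python:", sys.version.split()[0])
    print("distutils.msvccompiler importable:", msvccompiler_importable())
```

Running it inside the same virtual environment that pip uses gives the most relevant answer, since the result depends on which `distutils` that environment resolves.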

frankie336 commented 1 year ago

Step by step guide to fix

Fork a copy of the Blackstone repository on GitHub.

  1. Clone your forked repository onto your local machine using the command git clone <url>, where <url> is the URL of your forked repository on GitHub.
  2. Navigate to the root directory of your local Blackstone repository on your machine.
  3. Update the dependencies for Blackstone in the setup.py file by changing the required version of spaCy to 2.1.9 or higher.
  4. Save the changes to the setup.py file.
  5. Create a new conda environment using the command conda create --name blackstone python=3.10 in your terminal or Anaconda Prompt.
  6. Activate the new conda environment using the command conda activate blackstone.
  7. Install your local instance of Blackstone into the conda environment using the command pip install -e . while still in the root directory of your local Blackstone repository.
  8. Verify that the installation was successful by importing Blackstone in a Python script or notebook using the command import blackstone.
  9. Test the package to ensure that it is working correctly.

I hope that helps! Let me know if you have any further questions.
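The dependency edit in step 3 can also be scripted. The sketch below rewrites a spaCy version pin inside the text of a dependency file. It rests on several assumptions: that the pin appears as a plain requirement string such as "spacy==2.1.8", that `repin_spacy` (a hypothetical helper) is applied to the file contents you read yourself, and that the default upper bound `<2.2.0` is wanted because the first post says the model would need retraining for spaCy 2.2.

```python
import re

# Assumption: the pin looks like "spacy==2.1.8" or "spacy>=2.1.8,<2.2.0".
# Names such as "spacy-lookups-data" are not matched, because a version
# operator must follow the package name directly.
SPACY_PIN = re.compile(
    r"\bspacy\s*[=<>!~]=?\s*[\w.*]+(?:\s*,\s*[=<>!~]=?\s*[\w.*]+)*"
)


def repin_spacy(text: str, new_spec: str = "spacy>=2.1.9,<2.2.0") -> str:
    """Replace every spaCy version pin in `text` with `new_spec`."""
    # A function replacement avoids any escape handling in new_spec.
    return SPACY_PIN.sub(lambda _match: new_spec, text)
```

For example, `repin_spacy(Path("setup.py").read_text())` with `pathlib.Path` would return the updated file contents, which can be reviewed before being written back. Note that the log earlier in this thread comes from a cp310 environment failing while building preshed 2.0.1, so the Python version in step 5 may need to be lower; which version works is not confirmed here.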