allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0
1.72k stars, 229 forks

SciSpacy Python 3.11 Installation Fails / Broken nmslib dependency #504

Closed: ksaadDE closed this issue 2 months ago

ksaadDE commented 9 months ago

python3 -m pip install scispacy

bin/python (venv) --> Python 3.11.6

bin/pip --version (venv) --> 3.11.6

Building wheel for nmslib (pyproject.toml)
include -I/usr/include/python3.11 -c nmslib.cc -o build/temp.linux-x86_64-cpython-311/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO=\"2.1.1\" -std=c++14 -fvisibility=hidden
python3.11/site-packages/pybind11/include/pybind11/attr.h:310:20: Err: »const struct pybind11::detail::function_record
error: command '/usr/bin/gcc' failed with exit code 
Failed to build nmslib
ERROR: Could not build wheels for nmslib, which is required to install pyproject.toml-based projects
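
If your own code has to keep running when this build fails, one option is to check up front whether nmslib can be found before touching any nmslib-backed feature. Below is a minimal stdlib-only sketch. has_module is a hypothetical helper, not part of scispacy or nmslib.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    if has_module("nmslib"):
        print("nmslib found; nmslib-backed scispacy features may be usable")
    else:
        print("nmslib not found; skip or replace the features that need it")
```

find_spec only confirms that the module can be located. A compiled extension that was built against the wrong ABI can still fail when it is actually imported, so wrap the real import in try/except ImportError if you need certainty.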

ksaadDE commented 9 months ago

Luckily, nobody needs to install the entire scispacy library just to get the abbreviation extraction utility :) It lives in a single file: https://github.com/allenai/scispacy/blob/main/scispacy/abbreviation.py

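Just in case someone else needs it: copy that file into your project (filename below is a placeholder for whatever module name you save it under), then include and use it like this: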

from filename import AbbreviationDetector  # noqa: F401 - importing registers the "abbreviation_detector" factory
loaded_nlp_model.add_pipe('abbreviation_detector')
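
For context, abbreviation.py implements the abbreviation-definition algorithm from Schwartz & Hearst (2003). Below is a simplified, stdlib-only sketch of its core step: matching a short form such as "SO" against the text in front of the parentheses. find_long_form is a hypothetical name. This is an illustration on plain strings, not scispacy's actual code, which works on spaCy Span objects.

```python
from typing import Optional


def find_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Return the suffix of `candidate` that `short_form` abbreviates,
    or None if the short form's characters cannot be matched in order.

    Simplified string version of the Schwartz & Hearst (2003) matching step.
    """
    s_idx = len(short_form) - 1
    l_idx = len(candidate) - 1
    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            # Punctuation in the short form (e.g. the "-" in "IL-2") is skipped
            s_idx -= 1
            continue
        # Walk left until the character matches; the first character of the
        # short form must additionally sit at the start of a word
        while (l_idx >= 0 and candidate[l_idx].lower() != ch) or (
            s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum()
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend the match back to the beginning of the word it starts in
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]
```

In the paper, the candidate text before the parentheses is limited to at most min(|A| + 5, 2|A|) words, where |A| is the number of characters in the short form. This sketch leaves that windowing to the caller.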

Example code, partially borrowed from StackOverflow:

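import spacy
from filename import AbbreviationDetector  # noqa: F401 - registers the "abbreviation_detector" factory

def filter_abbrv(loaded_nlp_model, txtData):
    # Add the detector only once: add_pipe raises a ValueError if the name is already in the pipeline
    if "abbreviation_detector" not in loaded_nlp_model.pipe_names:
        loaded_nlp_model.add_pipe("abbreviation_detector")
    doc = loaded_nlp_model(txtData)
    # Keep each token's trailing whitespace so the text can be rebuilt with its original spacing
    altered_tok = [tok.text_with_ws for tok in doc]
    print("abbrv:", doc._.abbreviations)
    for abrv in doc._.abbreviations:
        # Put the long form where the short form starts, keeping the short form's trailing whitespace
        altered_tok[abrv.start] = abrv._.long_form.text + doc[abrv.end - 1].whitespace_
        # Drop the remaining tokens of a multi-token short form
        for i in range(abrv.start + 1, abrv.end):
            altered_tok[i] = ""
    return "".join(altered_tok)

loaded_nlp_model = spacy.load("en_core_web_lg")  # or whatever model you have installed
print(filter_abbrv(loaded_nlp_model, "StackOverflow (SO) and Github are pretty cool"))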
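
Rebuilding the text from tokens is one option. Another, sketched here under the assumption that you only need the expanded string, is to splice the long forms in by character offset. expand_by_offsets is a hypothetical helper that uses only the standard library. With spaCy, you would build its input as (a.start_char, a.end_char, a._.long_form.text) for each a in doc._.abbreviations.

```python
from typing import Iterable, Tuple


def expand_by_offsets(text: str, spans: Iterable[Tuple[int, int, str]]) -> str:
    """Replace each (start_char, end_char, long_form) span of `text`.

    Spans are applied right to left, so replacing one never shifts the
    character offsets of the spans still to be processed. Overlapping
    spans are not supported.
    """
    result = text
    for start, end, long_form in sorted(spans, key=lambda s: s[0], reverse=True):
        result = result[:start] + long_form + result[end:]
    return result


print(expand_by_offsets("StackOverflow (SO) and Github are pretty cool",
                        [(15, 17, "StackOverflow")]))
# prints "StackOverflow (StackOverflow) and Github are pretty cool"
```

Working right to left avoids recomputing offsets after each replacement, which is the usual pitfall when substitutions change the string length.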

adding_abbreviation_detection_to_your_spacy_nlp_project.md

mp-lunartree-bio commented 6 months ago

Has anyone found a workaround for the functionality that actually requires scispacy, or nmslib itself?

dakinggg commented 5 months ago

Hi, you may have some luck with this workaround here: https://github.com/allenai/scispacy/issues/473#issuecomment-1590443024

ddofer commented 5 months ago

My workaround was to install everything under Python 3.9 with Anaconda. Annoying, but it works.

ulc0 commented 3 months ago

@dakinggg Do you have a workaround for Databricks ML? I've run out of tricks; I can't get nmslib to install on Python 3.11 or 3.10.

dakinggg commented 3 months ago

Based on https://github.com/allenai/scispacy/issues/520#issue-2438749767, I was able to get it working on both Windows and WSL with Python 3.11 by installing with mamba. Could others on this thread try that and let me know whether it works? If so, I will update the installation instructions.

dakinggg commented 2 months ago

I've added a matrix of known nmslib support to the README, so I'm going to go ahead and close this issue.