allenai / scispacy

A full spaCy pipeline and models for scientific/biomedical documents.
https://allenai.github.io/scispacy/
Apache License 2.0

SciSpacy Python 3.11 Installation Fails / Broken nmslib dependency #504

Open · ksaadDE opened 5 months ago

ksaadDE commented 5 months ago

python3 -m pip install scispacy

bin/python (venv) --> Python 3.11.6

bin/pip --version (venv) --> 3.11.6

Building wheel for nmslib (pyproject.toml)
include -I/usr/include/python3.11 -c nmslib.cc -o build/temp.linux-x86_64-cpython-311/nmslib.o -O3 -march=native -fopenmp -DVERSION_INFO=\"2.1.1\" -std=c++14 -fvisibility=hidden
python3.11/site-packages/pybind11/include/pybind11/attr.h:310:20: Err: »const struct pybind11::detail::function_record
error: command '/usr/bin/gcc' failed with exit code 
Failed to build nmslib
ERROR: Could not build wheels for nmslib, which is required to install pyproject.toml-based projects

ksaadDE commented 5 months ago

Luckily, you don't need to install the entire scispacy library just to get the abbreviation extraction utility :) It lives in a single file: https://github.com/allenai/scispacy/blob/main/scispacy/abbreviation.py

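In case anyone else needs it: save that file in your project ("filename" below is a placeholder for whatever module name you give it), then import it and add the pipe:

from filename import AbbreviationDetector  # importing the module registers the "abbreviation_detector" factory
loaded_nlp_model.add_pipe('abbreviation_detector')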
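For context on what the copied file does: scispacy's documentation says the AbbreviationDetector implements the algorithm from Schwartz & Hearst (2003), "A simple algorithm for identifying abbreviation definitions in biomedical text". Below is a stdlib-only sketch of that algorithm's core step, which matches a short form against the text just before its parenthesis. It is an illustration, not the code in abbreviation.py; find_long_form is a hypothetical name, and the real detector additionally handles tokenization, the size of the candidate window and the spaCy plumbing.

```python
from typing import Optional


def find_long_form(short_form: str, candidate: str) -> Optional[str]:
    """Sketch of the Schwartz & Hearst long-form search (not scispacy's code).

    Walks the short form right-to-left, matching each alphanumeric character
    against the candidate text, also right-to-left. The first character of the
    short form must additionally sit at the start of a word.
    Returns the matched long form, or None if there is no match.
    """
    s_idx = len(short_form) - 1
    l_idx = len(candidate) - 1
    while s_idx >= 0:
        ch = short_form[s_idx].lower()
        if not ch.isalnum():
            # Punctuation inside the short form is skipped.
            s_idx -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require a word boundary just before the match.
        while l_idx >= 0 and (
            candidate[l_idx].lower() != ch
            or (s_idx == 0 and l_idx > 0 and candidate[l_idx - 1].isalnum())
        ):
            l_idx -= 1
        if l_idx < 0:
            return None
        l_idx -= 1
        s_idx -= 1
    # Extend back to the start of the word that contains the first match.
    start = candidate.rfind(" ", 0, l_idx + 1) + 1
    return candidate[start:]


print(find_long_form("HMM", "a hidden Markov model"))  # prints: hidden Markov model
```

The right-to-left walk is what lets "SO" match "StackOverflow" without needing one capital letter per word.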

Example code, partially borrowed from StackOverflow:

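import spacy
from filename import AbbreviationDetector  # noqa: F401 (the import registers the "abbreviation_detector" factory)

def filter_abbrv(nlp, text):
    """Replace every detected short form in `text` with its long form."""
    # add_pipe raises an error if a component with this name is already in the
    # pipeline, so only add the detector on the first call.
    if "abbreviation_detector" not in nlp.pipe_names:
        nlp.add_pipe("abbreviation_detector")
    doc = nlp(text)
    # text_with_ws keeps each token's trailing whitespace, so spacing survives the round trip.
    altered_tok = [tok.text_with_ws for tok in doc]
    print("abbrv:", doc._.abbreviations)
    for abrv in doc._.abbreviations:
        # Put the long form on the short form's first token and blank out any others.
        altered_tok[abrv.start] = str(abrv._.long_form) + abrv[-1].whitespace_
        for i in range(abrv.start + 1, abrv.end):
            altered_tok[i] = ""
    return "".join(altered_tok)

loaded_nlp_model = spacy.load("en_core_web_lg")  # or whatever
print(filter_abbrv(loaded_nlp_model, "StackOverflow (SO) and Github are pretty cool"))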

adding_abbreviation_detection_to_your_spacy_nlp_project.md
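The replacement step itself does not depend on spaCy. A minimal sketch, assuming you already have (start_char, end_char, long_form) triples for each short form: with spaCy these could be built from the detector's spans via Span.start_char and Span.end_char, but expand_abbreviations is a hypothetical helper, not part of scispacy. Working on character offsets keeps the original spacing and punctuation untouched.

```python
from typing import Iterable, Tuple


def expand_abbreviations(text: str, spans: Iterable[Tuple[int, int, str]]) -> str:
    """Replace each (start_char, end_char, long_form) span in `text`.

    Spans are applied right-to-left so that earlier character offsets stay
    valid after a later replacement changes the string's length.
    Assumes the spans do not overlap.
    """
    for start, end, long_form in sorted(spans, reverse=True):
        text = text[:start] + long_form + text[end:]
    return text


# With spaCy, the triples could come from the detector, e.g.:
#   spans = [(a.start_char, a.end_char, str(a._.long_form)) for a in doc._.abbreviations]
text = "StackOverflow (SO) and Github are pretty cool"
print(expand_abbreviations(text, [(15, 17, "StackOverflow")]))
# prints: StackOverflow (StackOverflow) and Github are pretty cool
```

Sorting in reverse is the design choice that matters here: replacing from the end of the string backwards means no offset ever needs to be recomputed.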

mp-lunartree-bio commented 2 months ago

Has anyone found a workaround for the functionality that actually requires scispacy, or nmslib itself?

dakinggg commented 1 month ago

Hi, you may have some luck with this workaround here: https://github.com/allenai/scispacy/issues/473#issuecomment-1590443024

ddofer commented 1 month ago

My workaround was to install everything under Python 3.9 (Anaconda). Annoying, but it works.
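On the question above about functionality that genuinely needs nmslib: per the scispacy README, that is mainly the entity linker, which compares character 3-grams of a mention against knowledge-base concepts using an approximate nearest-neighbour search. If downgrading Python is not an option and your list of names is small, an exact brute-force search can stand in. This is a swapped-in technique, not scispacy's implementation: it uses plain 3-gram counts rather than TF-IDF, scans every name on every query (so it only suits small lists), and char_trigrams, cosine and nearest_names are hypothetical helpers. The HAVE_NMSLIB flag only records whether nmslib imports, so you can branch on it.

```python
import math
from collections import Counter
from typing import List, Tuple

try:
    import nmslib  # noqa: F401 (importable only if the nmslib wheel built)
    HAVE_NMSLIB = True
except ImportError:
    HAVE_NMSLIB = False


def char_trigrams(s: str) -> Counter:
    """Count the character 3-grams of a lower-cased, space-padded string."""
    padded = f" {s.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors (missing keys count as 0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def nearest_names(query: str, names: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Exact (brute-force) k-nearest-neighbour search over character 3-grams."""
    q = char_trigrams(query)
    scored = [(name, cosine(q, char_trigrams(name))) for name in names]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


names = ["myocardial infarction", "cardiac arrest", "myocarditis"]
for name, score in nearest_names("myocardial infarction", names, k=2):
    print(f"{name}: {score:.2f}")
```

The trade-off is speed for exactness: an approximate index answers queries without scanning everything, while this version is linear in the number of names but needs no compiled dependency.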