SapienzaNLP / extend

Entity Disambiguation as text extraction (ACL 2022)

File not found Error: While adding extend to spacy nlp pipeline #3

Closed Vasistareddy closed 2 years ago

Vasistareddy commented 2 years ago

I'm using the same classy version mentioned in requirements.txt.

Traceback (most recent call last):
  File "spacy_extend.py", line 13, in <module>
    nlp.add_pipe("extend", after="ner", config=extend_config)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/spacy/language.py", line 792, in add_pipe
    pipe_component = self.create_pipe(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/spacy/language.py", line 674, in create_pipe
    resolved = registry.resolve(cfg, validate=validate)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 746, in resolve
    resolved, _ = cls._make(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 795, in _make
    filled, _, resolved = cls._fill(
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/thinc/config.py", line 867, in _fill
    getter_result = getter(*args, **kwargs)
  File "/home/vasista/extend/extend/spacy_component.py", line 86, in __init__
    self.model = load_checkpoint(checkpoint_path, device)
  File "/home/vasista/extend/extend/spacy_component.py", line 22, in load_checkpoint
    model = load_classy_module_from_checkpoint(checkpoint_path)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/classy/utils/lightning.py", line 57, in load_classy_module_from_checkpoint
    conf = load_training_conf_from_checkpoint(checkpoint_path)
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/classy/utils/lightning.py", line 23, in load_training_conf_from_checkpoint
    conf = OmegaConf.load(f"{experiment_folder}/.hydra/{conf_file}")
  File "/home/vasista/miniconda3/envs/extendtest/lib/python3.8/site-packages/omegaconf/omegaconf.py", line 183, in load
    with io.open(os.path.abspath(file_), "r", encoding="utf-8") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/vasista/extend/.hydra/config.yaml'

poccio commented 2 years ago

Hi, can you provide us with more details? In particular, can you add your spacy_extend.py script?

Vasistareddy commented 2 years ago
import spacy
from extend import spacy_component

nlp = spacy.load("en_core_web_sm")

extend_config = dict(
    checkpoint_path="/home/vasista/extend/experiments/extend-longformer-large/",
    mentions_inventory_path="/home/vasista/extend/data/inventories/le-and-titov-2018-inventory.min-count-2.sqlite3",
    device=0,
    tokens_per_batch=4000,
)

nlp.add_pipe("extend", after="ner", config=extend_config)

input_sentence = "Japan began the defence of their title " \
                 "with a lucky 2-1 win against Syria " \
                 "in a championship match on Friday."

doc = nlp(input_sentence)

# [(Japan, Japan National Football Team), (Syria, Syria National Football Team)]
disambiguated_entities = [(ent.text, ent._.disambiguated_entity) for ent in doc.ents]
print(disambiguated_entities)

@poccio that's the spacy_extend.py script. I was just trying to check the installation with the sample code given in the documentation.

poccio commented 2 years ago

Hi @Vasistareddy, I'm sorry for this late reply.

The problem is with your checkpoint_path string. It should point directly to the .ckpt file you want to use, inside its classy experiment folder. For instance, in our spacy_test script, checkpoint_path points to experiments/extend-longformer-large/2021-10-22/09-11-39/checkpoints/best.ckpt (rather than experiments/extend-longformer-large/ as in your case).
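
In your config that would look roughly like the snippet below (the 2021-10-22/09-11-39 part is only illustrative, since the date/time subfolder depends on your local checkpoint download):

extend_config = dict(
    # point checkpoint_path at the .ckpt file itself, not at the experiment root;
    # the date/time subfolder shown here is illustrative and will differ locally
    checkpoint_path="/home/vasista/extend/experiments/extend-longformer-large/2021-10-22/09-11-39/checkpoints/best.ckpt",
    mentions_inventory_path="/home/vasista/extend/data/inventories/le-and-titov-2018-inventory.min-count-2.sqlite3",
    device=0,
    tokens_per_batch=4000,
)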

Vasistareddy commented 2 years ago

Thanks. That solved the issue. How much RAM does this project need?