HazyResearch / bootleg

Self-Supervision for Named Entity Disambiguation at the Tail
http://hazyresearch.stanford.edu/bootleg
Apache License 2.0

Annotations using entity_emb_file parameter are fast but not matching the accuracy level #110

Closed by coolcoder001 2 years ago

coolcoder001 commented 2 years ago

Hi, without the entity_emb_file parameter, the model takes too long on text articles of about 1000 words.

With the entity_emb_file parameter, inference time drops significantly, but accuracy takes a big hit: the model does not identify the entities as well. I am using this code:

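from bootleg.end2end.bootleg_annotator import BootlegAnnotator

annotator_object = BootlegAnnotator(
    model_name="bootleg_uncased", device=-1, cache_dir="./cache",
    entity_emb_file="./cache/entity_embeddings.npy",
    extract_method="custom",  # custom NER from the custom_module_extractor branch
)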

One point to mention: I am using a custom NER (flair), which is merged in the custom_module_extractor branch.

Is there something I am missing here?

lorr1 commented 2 years ago

Hmm. You seem to have the right idea? What embeddings did you use? Did you extract them yourself or download them? Do you have a test sentence where you see the performance drop? I have a few ideas but am not sure.
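
One way to find such a test sentence is to run the same texts through both configurations and list the mentions where the predicted entity differs. The sketch below is only an illustration: `find_disagreements` and the `Labeler` type are hypothetical names, not part of Bootleg, and each labeler is assumed to be a small wrapper you write yourself that calls one annotator and returns (mention, entity_id) pairs.

```python
from typing import Callable, List, Tuple

# Hypothetical type: a function mapping text to (mention, entity_id) pairs.
# Each labeler is assumed to wrap one annotator configuration.
Labeler = Callable[[str], List[Tuple[str, str]]]


def find_disagreements(sentences, baseline: Labeler, fast: Labeler):
    """Return, per sentence, the mentions whose predicted entity differs."""
    report = []
    for text in sentences:
        # dict() keeps only the last prediction for a repeated mention string.
        base = dict(baseline(text))
        quick = dict(fast(text))
        diffs = {
            mention: (base.get(mention), quick.get(mention))
            for mention in sorted(base.keys() | quick.keys())
            if base.get(mention) != quick.get(mention)
        }
        if diffs:
            report.append((text, diffs))
    return report
```

A mention found by only one configuration shows up with `None` on the other side, which separates "entity missed entirely" from "entity linked to a different ID". Passing one wrapper built without entity_emb_file as `baseline` and one built with it as `fast` should narrow the accuracy drop down to specific sentences.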