egerber / spaCy-entity-linker

spaCy module for linking text to Wikidata items
MIT License

HTML code in output <EntityElement: (#28)

Open · AkimfromParis opened this issue 1 year ago

AkimfromParis commented 1 year ago

Hello,

Thank you for this great alternative. I am starting a new project to build a domain-specific knowledge base for NER. I have tested all the methods of EntityElement, and everything works perfectly. There is only one strange thing, when I run:

```python
for sent in doc.sents:
    sent._.linkedEntities.pretty_print()
```

In both VS Code and Jupyter, the output contains what looks like HTML tags:

```
<EntityElement: https://www.wikidata.org/wiki/Q194318 Pirates of the Caribbean Series of fantasy adventure films >
<EntityElement: https://www.wikidata.org/wiki/Q12525597 Silvester the day celebrated on 31 December (Roman Catholic Church) or 2 January (Eastern Orthodox Churches)>
```

Any advice?

Best,
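Why the output looks like markup: Python objects conventionally show a debug representation (their `repr`) wrapped in angle brackets, as in the default `<__main__.Foo object at 0x...>`. The sketch below uses a hypothetical stand-in class, `EntityElementSketch` (not the library's real `EntityElement`), and assumes, as the printed lines suggest, that `pretty_print` simply prints such a representation. Nothing in it is interpreted as HTML; the brackets are literal characters.

```python
class EntityElementSketch:
    """Hypothetical stand-in for spaCy-entity-linker's EntityElement (not the real class)."""

    def __init__(self, url: str, label: str, description: str) -> None:
        self.url = url
        self.label = label
        self.description = description

    def __repr__(self) -> str:
        # Python convention: a debug-oriented summary wrapped in angle brackets
        return f"<EntityElement: {self.url} {self.label} {self.description}>"

    def pretty_print(self) -> None:
        # printing the repr shows the angle brackets verbatim; no HTML is involved
        print(repr(self))


element = EntityElementSketch(
    "https://www.wikidata.org/wiki/Q194318",
    "Pirates of the Caribbean",
    "Series of fantasy adventure films",
)
element.pretty_print()
# prints: <EntityElement: https://www.wikidata.org/wiki/Q194318 Pirates of the Caribbean Series of fantasy adventure films>
```

To show individual fields instead of this summary, read them one by one, which is what the getter-based loops in the reply below do.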

MartinoMensio commented 1 year ago

Hi @AkimfromParis,

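That's the expected behaviour of `pretty_print`: it prints a text summary of each entity wrapped in `<EntityElement: ...>`, and those angle brackets are just Python's usual repr style, not HTML. You probably want to display specific attributes of the entities instead: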

```python
# iterate per sentence and see the entities
for sent in doc.sents:
    for e in sent._.linkedEntities:
        print(f'ID: {e.get_id()}, LABEL: {e.get_label()}, SPAN: {e.get_span()}, URL: {e.get_url()}, DESCRIPTION: {e.get_description()}')

# or all entities in the doc
for e in doc._.linkedEntities:
    print(f'ID: {e.get_id()}, LABEL: {e.get_label()}, SPAN: {e.get_span()}, URL: {e.get_url()}, DESCRIPTION: {e.get_description()}')

# or to build a table see: https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook

# or get them visualised in the displaCy visualiser (most beautiful solution)
# note: the "span" style of displaCy requires spaCy v3.3 or later
from spacy import displacy
from spacy.tokens import Span
# convert to spans annotated by super entities (usually the type of entity)
# Spans are created with Span(doc, start, end, label)
# entities without any super entity are skipped, otherwise [0] would raise an IndexError
spans = [
    Span(doc, e.get_span().start, e.get_span().end, e.get_super_entities()[0].label)
    for e in doc._.linkedEntities
    if e.get_super_entities()
]
# if you want to inspect the labels
super_entities_all = [s.label_ for s in spans]
# now visualise them ("sc" is the default span group that is visualised by displacy https://spacy.io/usage/visualizers#span )
doc.spans["sc"] = spans
displacy.render(doc, style="span", jupyter=True)
```
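The table option mentioned in the comments above can also be done in plain Python. This is a minimal sketch that assumes only the getter methods already used in the loops (`get_id`, `get_label`, `get_span`, `get_url`, `get_description`). The names `LinkedEntity`, `entities_to_rows`, `format_table`, and `DemoEntity` are hypothetical, introduced here for illustration, and are not part of spaCy-entity-linker. `DemoEntity` reuses the Pirates of the Caribbean values from the output in the question so the helpers can be tried without loading a model.

```python
from dataclasses import dataclass
from typing import Iterable, Protocol


class LinkedEntity(Protocol):
    """Structural type for the getters used in the loops above (hypothetical name)."""

    def get_id(self) -> object: ...
    def get_label(self) -> str: ...
    def get_span(self) -> object: ...
    def get_url(self) -> str: ...
    def get_description(self) -> str: ...


def entities_to_rows(entities: Iterable[LinkedEntity]) -> list[dict[str, str]]:
    """Turn each linked entity into a dict of strings (hypothetical helper)."""
    return [
        {
            "id": str(e.get_id()),
            "label": str(e.get_label()),
            "span": str(e.get_span()),  # str() of a spaCy Span is its text
            "url": str(e.get_url()),
            "description": str(e.get_description()),
        }
        for e in entities
    ]


def format_table(rows: list[dict[str, str]]) -> str:
    """Render rows as a fixed-width plain-text table with an upper-case header."""
    if not rows:
        return ""
    headers = list(rows[0])
    # each column is as wide as its longest cell or its header
    widths = {h: max(len(h), *(len(r[h]) for r in rows)) for h in headers}
    lines = [
        "  ".join(h.upper().ljust(widths[h]) for h in headers),
        "  ".join("-" * widths[h] for h in headers),
    ]
    for r in rows:
        lines.append("  ".join(r[h].ljust(widths[h]) for h in headers))
    # drop the padding after the last column
    return "\n".join(line.rstrip() for line in lines)


@dataclass
class DemoEntity:
    """Stand-in exposing the same getters, for trying the helpers without a model."""

    identifier: int
    label: str
    span: str
    url: str
    description: str

    def get_id(self) -> int:
        return self.identifier

    def get_label(self) -> str:
        return self.label

    def get_span(self) -> str:
        return self.span

    def get_url(self) -> str:
        return self.url

    def get_description(self) -> str:
        return self.description


if __name__ == "__main__":
    demo = [
        DemoEntity(
            194318,
            "Pirates of the Caribbean",
            "Pirates of the Caribbean",
            "https://www.wikidata.org/wiki/Q194318",
            "Series of fantasy adventure films",
        )
    ]
    print(format_table(entities_to_rows(demo)))
```

With a real processed `doc`, something like `print(format_table(entities_to_rows(doc._.linkedEntities)))` should print one row per linked entity. In Jupyter, passing the same rows to `pandas.DataFrame` (if pandas is installed) displays them as a rendered table instead.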
AkimfromParis commented 1 year ago

Thank you for your answer, Martino!

OK, I was just wondering whether I had done something wrong. All good then! :)

Best,