HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
409 stars, 77 forks

`get_horz_ngrams` works only when mention is_tabular #425

Closed HiromuHota closed 4 years ago

HiromuHota commented 4 years ago

Describe the bug

I have a span mention, `mention`, that is visual but not tabular. I tried to get all horizontally aligned ngrams from the same sentence (i.e., `get_horz_ngrams(mention, from_sentence=False)`), but got none because the mention is not tabular.

To Reproduce

Steps to reproduce the behavior:

>>> mention = session.query(Mention).all()[0]
>>> print(mention.context.get_span())
Obama
>>> print(mention.context.sentence.text)
Obama was born in Honolulu, Hawaii.
>>> print(mention.context.sentence.is_visual())
True
>>> print(mention.context.sentence.is_tabular())
False
>>> from fonduer.utils.data_model_utils.visual import get_horz_ngrams
>>> print(list(get_horz_ngrams(mention, from_sentence=False)))
[]

Expected behavior

Since all the other 1-grams in the same sentence are horizontally aligned with the mention, I expect to get all of them.

>>> print(list(get_horz_ngrams(mention, from_sentence=False)))
["was", "born", "in", "Honolulu", ",", "Hawaii", "."]
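
To make the expected behavior concrete, here is a minimal, self-contained sketch of the idea. It does not use Fonduer and is not how `get_horz_ngrams` is implemented: `Word`, `horz_aligned`, and `horz_ngrams_in_sentence` are hypothetical names, and the rule "two words are horizontally aligned when their vertical extents overlap" is an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Word:
    """Hypothetical stand-in for a word with a visual bounding box."""

    text: str
    top: float
    bottom: float


def horz_aligned(a: Word, b: Word) -> bool:
    """Assumed rule: aligned if the vertical extents of the two boxes overlap."""
    return a.top <= b.bottom and b.top <= a.bottom


def horz_ngrams_in_sentence(words: List[Word], start: int, end: int) -> Iterator[str]:
    """Yield 1-grams outside words[start..end] that are aligned with that span.

    No tabular structure is consulted, only the bounding boxes.
    """
    span = words[start : end + 1]
    for i, word in enumerate(words):
        if start <= i <= end:
            continue  # skip the words of the mention itself
        if any(horz_aligned(word, s) for s in span):
            yield word.text


# All words of the example sentence sit on the same visual line.
tokens = ["Obama", "was", "born", "in", "Honolulu", ",", "Hawaii", "."]
sentence = [Word(t, top=100.0, bottom=112.0) for t in tokens]

# The mention "Obama" covers word index 0 only.
result = list(horz_ngrams_in_sentence(sentence, 0, 0))
print(result)
```

The point of the sketch is that alignment can be decided from visual coordinates alone, so a visual but non-tabular mention could still return its same-line neighbors.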

Additional context

In addition to the above issue, I find the `from_sentence` parameter a bit confusing.

HiromuHota commented 4 years ago

Oh, I just found a TODO comment about this: https://github.com/HazyResearch/fonduer/blob/faae9c2cbc56ee4775729f0cc2730b7eec71b869/src/fonduer/utils/data_model_utils/visual.py#L230-L240