HazyResearch / fonduer

A knowledge base construction engine for richly formatted data
https://fonduer.readthedocs.io/
MIT License
407 stars 77 forks

get_col_ngrams and get_cell_ngrams return inconsistent result when a mention is not tabular #471

Closed: HiromuHota closed this issue 4 years ago

HiromuHota commented 4 years ago

Description of the bug

get_col_ngrams and get_cell_ngrams from fonduer.utils.data_model_utils.tabular return an inconsistent result when a mention is not tabular

Given mention.get_span() == "Sample" and mention.is_tabular() == False, as in the screenshot below, get_col_ngrams(mention) returns [] while get_cell_ngrams(mention) returns ["markdown"].

image

To Reproduce

See #470

Expected behavior

There could be four approaches:

  1. Return [""]
  2. Return [] (like get_col_ngrams)
  3. Return ["markdown"] (like get_cell_ngrams)
  4. Raise a ValueError
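
To make approaches 2 and 4 concrete, here is a minimal sketch of a shared guard that would give every tabular helper the same behavior for non-tabular mentions. Everything in it is hypothetical and not Fonduer code: `FakeMention`, `require_tabular`, `toy_cell_ngrams`, and `toy_col_ngrams` are illustrative names, and the only thing assumed about a real mention is that it exposes `is_tabular()` as described above.

```python
from dataclasses import dataclass
from functools import wraps
from typing import Callable, Iterable, List


@dataclass
class FakeMention:
    """Hypothetical stand-in for a Fonduer mention; not the real class."""

    text: str
    tabular: bool

    def is_tabular(self) -> bool:
        return self.tabular


def require_tabular(on_non_tabular: str = "empty") -> Callable:
    """Apply one policy for non-tabular mentions to any wrapped helper.

    on_non_tabular="empty" corresponds to approach 2 (return []).
    on_non_tabular="raise" corresponds to approach 4 (raise ValueError).
    """
    if on_non_tabular not in ("empty", "raise"):
        raise ValueError("on_non_tabular must be 'empty' or 'raise'")

    def decorator(func: Callable[..., Iterable[str]]) -> Callable[..., List[str]]:
        @wraps(func)
        def wrapper(mention, *args, **kwargs) -> List[str]:
            if not mention.is_tabular():
                if on_non_tabular == "raise":
                    raise ValueError(f"{func.__name__}: mention is not tabular")
                return []
            # Materialize the result so callers always get a list.
            return list(func(mention, *args, **kwargs))

        return wrapper

    return decorator


@require_tabular("empty")
def toy_cell_ngrams(mention) -> List[str]:
    # Toy body only: a real helper would read n-grams from the table cell.
    return mention.text.lower().split()


@require_tabular("raise")
def toy_col_ngrams(mention) -> List[str]:
    # Toy body only: a real helper would read n-grams from the table column.
    return mention.text.lower().split()
```

With this guard, both toy helpers agree on the non-tabular case by construction; which policy to pick (empty list versus exception) is exactly the open question in the list above.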

Error Logs/Screenshots

Not a bug, but inconsistent return values among tabular util functions.

Environment (please complete the following information)

Additional context

N/A