LexPredict / lexpredict-lexnlp

LexNLP by LexPredict
GNU Affero General Public License v3.0

Not able to Extract "multiple" dates using get_date #39

Open suyashdb opened 4 years ago

suyashdb commented 4 years ago
>>> import lexnlp.extract.en.dates
>>> text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
>>> print(list(lexnlp.extract.en.dates.get_dates(text)))
[datetime.date(2020, 3, 15)]

Currently, get_dates and get_raw_date_list return only the last date in the text. For the text above, I expected 15th July 2018 as well as 15th March 2020.

Is there a way to grab all dates from a text/sentence?

Edit: The issue is probably that the first date in my text was not recognized and therefore not extracted. Here is an example:

>>> text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
>>> list(lexnlp.extract.en.dates.get_raw_dates(text))
[]

afparsons commented 4 years ago

Hello @suyashdb,

Thank you for filing this issue.

  1. Could you please tell us which version of LexNLP you are using? You can, for example, run lexnlp.__version__ in a REPL to quickly discern the version.

  2. I cannot replicate your first example:

In[2]: from lexnlp.extract.en.dates import get_dates_list
In[3]: text = "This agreement is dated on 15th july 2018. This agreement shall terminate on the 15th day of March, 2020. "
In[4]: get_dates_list(text)

Out[4]: [datetime.date(2018, 7, 15), datetime.date(2020, 3, 15)]

  3. It seems that one of the checks LexNLP performs to rule out false positives is preventing date extraction in your second example. Is 1350 a timestamp (13:50), part of an address, or some other integer? Could you tell me which domain (agreement, financial document, etc.) your second example comes from, and how often such a construction (DD<month, spelt out>YY <integer>) occurs? We are conscious of the risk of introducing regressions when changing the LexNLP extraction functions to handle such cases, so we would like to know how common these constructions are.
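
For compact tokens such as 29MAY19, one possible stopgap is a small pre-pass that runs outside LexNLP. The following is only a minimal sketch using the Python standard library. The names `COMPACT_DATE_RE`, `MONTHS`, and `find_compact_dates` are hypothetical and not part of LexNLP, and the `century=2000` default is an assumption about how two-digit years should be read in this kind of message.

```python
import re
from datetime import date

# Hypothetical helper, not part of LexNLP.
# Matches compact DDMONYY tokens such as 29MAY19 (English month abbreviations only).
COMPACT_DATE_RE = re.compile(
    r"\b(\d{2})(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)(\d{2})\b",
    re.IGNORECASE,
)

MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}


def find_compact_dates(text, century=2000):
    """Return datetime.date objects for every valid DDMONYY token in text.

    Assumption: two-digit years belong to `century` (2000 by default).
    """
    dates = []
    for match in COMPACT_DATE_RE.finditer(text):
        day, month, year = match.groups()
        try:
            dates.append(date(century + int(year), MONTHS[month.upper()], int(day)))
        except ValueError:
            # Skip impossible calendar dates such as 31FEB19.
            continue
    return dates


text = "AUTO XX IF SSR TKNA/E OR FA NOT RCVD BY RJ BY 29MAY19 1350 DOH LT,REF IATA PRVD PAX"
print(find_compact_dates(text))
```

The word boundaries in the pattern keep the trailing 1350 out of the match, so whether it is a time or some other integer does not affect the date that is returned. The results could be merged with whatever the LexNLP extractor returns for the same text.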