Marlon668 / VerificationClaimsWithTimeAttribution


Problem caused by invalid date #1

Open seelenbrecher opened 1 year ago

seelenbrecher commented 1 year ago

Hi,

I ran into a problem while trying to run the code: sometimes the date value produced by HeidelTime (or possibly the date in the source text itself) is not a valid calendar date, so an error is thrown when running divison1And2/verificationModelBERTGlobal.py.

To reproduce:

  1. Take the pre-computed file PreprocesedTimes/afck-00297/2.xml:
    <TimeML>
    The evidence with the title 'Results from the <TIMEX3 tid="t1" type="DATE" value="2013">2013</TIMEX3> National Survey on DrugUse and Health ' says This report was prepared by the Center for Behavioral Health Statistics and  . Education .     <TIMEX3 tid="t4" type="DATE" value="2012">Past Year</TIMEX3> Initiates of Marijuana and Any Illicit Drug among  Persons Aged 12 or  .<TIMEX3 tid="t2" type="DATE" value="2015-02-29">29.2.15</TIMEX3> Daily or Almost Daily Marijuana Use in <TIMEX3 tid="t5" type="DATE" value="2014">the Past  Year</TIMEX3> and Past   Alcohol Use among Adults Aged 18 to 22, by College  Enrollment: <TIMEX3 tid="t3" type="DATE" value="2002">2002</TIMEX3>-.
    </TimeML>
  2. Run divison1And2/verificationModelBERTGlobal.py.
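To find other pre-computed files with the same defect before training, the TIMEX3 values can be scanned directly. This is a minimal sketch, not part of the repository: `find_invalid_dates` is a hypothetical helper, and it assumes each file is well-formed XML with a `<TimeML>` root (as in the example above) and that only full `YYYY-MM-DD` values are parsed with `'%Y-%m-%d'`, so year-only values such as `2013` are skipped.

```python
import re
import xml.etree.ElementTree as ET
from datetime import datetime
from typing import List, Tuple

# Only full calendar dates ("2015-02-29") are checked; year-only values
# like "2013" are assumed not to go through the '%Y-%m-%d' parse.
FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def find_invalid_dates(timeml: str) -> List[Tuple[str, str]]:
    """Return (tid, value) pairs for DATE TIMEX3 tags that are not real dates."""
    root = ET.fromstring(timeml)
    invalid = []
    for timex in root.iter("TIMEX3"):
        value = timex.get("value", "")
        if timex.get("type") != "DATE" or not FULL_DATE.fullmatch(value):
            continue
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            invalid.append((timex.get("tid"), value))
    return invalid


# Trimmed-down version of PreprocesedTimes/afck-00297/2.xml.
sample = (
    '<TimeML>Results from the <TIMEX3 tid="t1" type="DATE" value="2013">2013'
    '</TIMEX3> National Survey ... <TIMEX3 tid="t2" type="DATE" '
    'value="2015-02-29">29.2.15</TIMEX3> Daily or Almost Daily Marijuana Use'
    '</TimeML>'
)
print(find_invalid_dates(sample))  # [('t2', '2015-02-29')]
```

The same function could be run over the text of every `*.xml` file under `PreprocesedTimes/` to list all affected snippets at once, provided those files parse as XML.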

The error occurs while reading the dataset:

Traceback (most recent call last):
  File "verificationModelBERTGlobal.py", line 426, in <module>
    test_set = NUS(mode='Test', path='test/test-' + domain + '.tsv', domain=domain)
  File "VerificationClaimsWithTimeAttribution/dataset.py", line 38, in __init__
    self.getClaims(mode,path)
  File "VerificationClaimsWithTimeAttribution/dataset.py", line 1216, in getClaims
    datasM, durenM, refsM, setsM = snippet.readTime()
  File "MULTIFC/VerificationClaimsWithTimeAttribution/baseModel/Snippet.py", line 459, in readTime
    '%Y-%m-%d'),root[depth].text.strip()]
  File "anaconda3/envs/implicit_fc/lib/python3.7/_strptime.py", line 577, in _strptime_datetime
    tt, fraction, gmtoff_fraction = _strptime(data_string, format)
  File "anaconda3/envs/implicit_fc/lib/python3.7/_strptime.py", line 544, in _strptime
    datetime_date(year, 1, 1).toordinal() + 1
ValueError: day is out of range for month

The problem: 2015-02-29 is not a valid date (2015 is not a leap year, so there is no 29 February).
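The failure can be reproduced in isolation with the standard library. The string matches the `'%Y-%m-%d'` pattern, so parsing gets as far as validating the day, and only then does `strptime` reject it. The `is_valid_date` helper below is hypothetical and only for illustration.

```python
import calendar
from datetime import datetime


def is_valid_date(value: str) -> bool:
    """True if value is an existing calendar date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:  # on Python 3.7: "day is out of range for month"
        return False


print(is_valid_date("2015-02-29"))      # False: 2015 is not a leap year
print(is_valid_date("2016-02-29"))      # True: 2016 is a leap year
print(calendar.monthrange(2015, 2)[1])  # 28 days in February 2015
```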

This is one of several cases I have run into that are caused by invalid dates.

How do you handle such cases so we can run the program properly?

Thank you

Marlon668 commented 1 year ago

Hello

I'm aware that some of the article texts contain faulty dates. I handled them by manually shifting those dates one day earlier so that they become valid, e.g. 29-02-2015 becomes 28-02-2015 and 31-04-2019 becomes 30-04-2019. I will make a commit that applies this shift automatically in the code (it will be pushed in approximately two weeks, as I'm on vacation).

Best regards,
Marlon
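Shifting an impossible date back one day at a time until it exists is equivalent to clamping the day to the last day of its month, which also covers cases such as 31 February (moved back to 28 or 29 February). The sketch below does that automatically; it is not the commit mentioned above. `parse_timex_date` is a hypothetical helper, and it assumes the value is in the `YYYY-MM-DD` form that fails to parse at line 459 of `baseModel/Snippet.py`. Malformed values (bad month, non-numeric parts) still raise `ValueError`.

```python
import calendar
from datetime import datetime


def parse_timex_date(value: str) -> datetime:
    """Parse a YYYY-MM-DD TIMEX3 value, moving an impossible day back to the
    last day of its month (e.g. 2015-02-29 -> 2015-02-28)."""
    try:
        return datetime.strptime(value, "%Y-%m-%d")
    except ValueError:
        # Any further ValueError (e.g. month 13, day 0, non-digits) propagates.
        year, month, day = (int(part) for part in value.split("-"))
        # monthrange returns (weekday of the 1st, number of days in the month).
        last_day = calendar.monthrange(year, month)[1]
        return datetime(year, month, min(day, last_day))


print(parse_timex_date("2015-02-29").date())  # 2015-02-28
print(parse_timex_date("2019-04-31").date())  # 2019-04-30
print(parse_timex_date("2016-02-29").date())  # 2016-02-29 (valid, unchanged)
```

Clamping instead of silently dropping the TIMEX3 keeps the year and month, which are usually the informative parts of the date.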