Closed by maffeyl 6 years ago
Stripping periods in hasYear does solve this particular issue, but I'm not sure whether it will reduce our ability to identify years. I can't think of how it would, but more rigorous thought is warranted.
Also got this error on doc0178_CLIN: ValueError: invalid literal for int() with base 10: '2012**note'
Also got this error for file ID004_clinic_012: ValueError: invalid literal for int() with base 10: '2010"'. It happened again for ID004_path_011 and ID181_clinic_529... actually, all of them have this error! The offending value is the doc time in the metadata line. This needs to be fixed in our code; working on that now.
By calling .group(0) on the regex match in the hasYear() method and returning the matched string instead of the original string, I think I fixed this error.
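For anyone following along, here is a minimal sketch of the idea, not the actual hasYear() code. The function name find_year, the pattern YEAR_RE, and the assumption that years are four digits starting with 19 or 20 are all hypothetical and only for illustration. The point is that int() gets the matched substring from .group(0), never the raw token.

```python
import re

# Hypothetical pattern (an assumption, not Chrono's real one):
# a 4-digit year starting with 19 or 20, not flanked by other digits.
YEAR_RE = re.compile(r"(?<!\d)(?:19|20)\d{2}(?!\d)")


def find_year(token):
    """Return (found, year) where year is an int, or (False, None)."""
    match = YEAR_RE.search(token)
    if match is None:
        return False, None
    # group(0) is only the text the pattern matched, so trailing
    # debris such as '**note', '"' or '.' never reaches int().
    return True, int(match.group(0))


for raw in ["2012**note", '2010"', "2006.", "note"]:
    print(repr(raw), find_year(raw))
```

Because the conversion only ever sees digits, the specific trailing characters in the three failing tokens above stop mattering.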
From the THYME train set, doc0056_CLIN has a "2006." that isn't converted to int because it has a period at the end. Other punctuation is removed on line 2489 of TimePhrase_to_Chrono in hasYear before it's passed back to create the year entity. Going to try removing the period in my own branch and see how it goes.
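A rough sketch of what the period-stripping approach amounts to, assuming a hypothetical helper to_year (this is not the code in TimePhrase_to_Chrono). It strips leading and trailing punctuation before calling int(), which handles "2006." but still fails on a token with text after the year.

```python
import string


def to_year(token):
    """Hypothetical helper: strip surrounding punctuation, then convert."""
    # str.strip only removes characters from the two ends of the string.
    cleaned = token.strip(string.punctuation)
    return int(cleaned)


print(to_year("2006."))   # trailing period is stripped before int()
print(to_year('2010"'))   # trailing quote is stripped as well

try:
    to_year("2012**note")
except ValueError as err:
    # The token ends in a letter, so nothing is stripped and int() fails.
    print("still fails:", err)
```

So stripping covers the trailing-period case, but a token like '2012**note' needs the year pulled out of the string rather than punctuation trimmed off its ends.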