AmyOlex / Chrono

Parsing time normalizations from text.
GNU General Public License v3.0

Invalid start byte for i2b2 files #104

Open AmyOlex opened 4 years ago

AmyOlex commented 4 years ago

Another error from doc 16, invalid start byte???

NOW PARSING PHRASE: 01/30/96 12:18

TOFIX: PeriodInterval.py @ line 304: convert to using the dictionary.
TOFIX: PeriodInterval.py @ line 388: convert to using the dictionary.
XXXXXXXXX
20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: None DocTime: 1996-01-04 00:00:00
26entity Minute-Of-Hour
25entity Hour-Of-Day
24entity Day-Of-Month
23entity Month-Of-Year
22entity Two-Digit-Year
Converting phrase to ISO: 20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: None DocTime: 1996-01-04 00:00:00
ENTITY: 26entity Minute-Of-Hour
ENTITY: 25entity Hour-Of-Day
ENTITY: 24entity Day-Of-Month
ENTITY: 23entity Month-Of-Year
ENTITY: 22entity Two-Digit-Year
MY ISO:::: 1996-01-30T12:18:00
ISO Value: 20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: 1996-01-30T12:18:00 DocTime: 1996-01-04 00:00:00
TIMEX3 String:
Number of Chrono Entities: 25
Parsing /Users/alolex/Desktop/CCTR_Git_Repos/Chrono/i2b2_train/.DS_Store ...
Traceback (most recent call last):
  File "Chrono.py", line 160, in <module>
    doctime = utils.getDocTime(infiles[f], i2b2=True)
  File "/Users/alolex/Desktop/CCTR_Git_Repos/Chrono/Chrono/utils.py", line 133, in getDocTime
    lines = file.readlines()
  File "/Users/alolex/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte

The error occurs at file 321:

NOW PARSING PHRASE: 01/30/96 12:18

TOFIX: PeriodInterval.py @ line 304: convert to using the dictionary.
TOFIX: PeriodInterval.py @ line 388: convert to using the dictionary.
XXXXXXXXX
20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: None DocTime: 1996-01-04 00:00:00
26entity Minute-Of-Hour
25entity Hour-Of-Day
24entity Day-Of-Month
23entity Month-Of-Year
22entity Two-Digit-Year
Converting phrase to ISO: 20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: None DocTime: 1996-01-04 00:00:00
ENTITY: 26entity Minute-Of-Hour
ENTITY: 25entity Hour-Of-Day
ENTITY: 24entity Day-Of-Month
ENTITY: 23entity Month-Of-Year
ENTITY: 22entity Two-Digit-Year
MY ISO:::: 1996-01-30T12:18:00
ISO Value: 20 01/30/96 12:18 <2412,2426> Type: None Mod: None Value: 1996-01-30T12:18:00 DocTime: 1996-01-04 00:00:00
TIMEX3 String:
Number of Chrono Entities: 25
Parsing /Users/alolex/Desktop/CCTR_Git_Repos/Chrono/i2b2_train/.DS_Store ...
Traceback (most recent call last):
  File "Chrono.py", line 161, in <module>
    doctime = utils.getDocTime(infiles[f], i2b2=True)
  File "/Users/alolex/Desktop/CCTR_Git_Repos/Chrono/Chrono/utils.py", line 133, in getDocTime
    lines = file.readlines()
  File "/Users/alolex/anaconda3/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 3131: invalid start byte
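
Note that in both logs the last file opened before the traceback is `i2b2_train/.DS_Store`, the binary metadata file that macOS Finder writes into directories, rather than an i2b2 document. That would explain why `readlines()` fails to decode it as UTF-8. The sketch below shows one possible way to guard against this, assuming the input list is built from a directory listing. The helper names `list_input_files` and `read_lines_safely` are hypothetical and are not part of Chrono.

```python
import os


def list_input_files(directory):
    """Return sorted paths of regular, non-hidden files in `directory`.

    Hypothetical helper: skips dotfiles such as .DS_Store so they never
    reach the document parser.
    """
    paths = []
    for name in sorted(os.listdir(directory)):
        if name.startswith("."):
            continue  # hidden files (.DS_Store, .gitignore, ...) are not documents
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            paths.append(path)
    return paths


def read_lines_safely(path, encoding="utf-8"):
    """Return the file's lines, or None if it cannot be decoded.

    Hypothetical helper: lets the caller skip an undecodable file
    instead of aborting the whole run.
    """
    try:
        with open(path, encoding=encoding) as handle:
            return handle.readlines()
    except UnicodeDecodeError:
        return None
```

Filtering by name keeps stray binary files out of the run entirely, while the decode guard is a fallback for any other file that turns out not to be UTF-8 text. Whether a skipped file should be logged or treated as an error is a design choice left open here.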