AmyOlex / Chrono

Parsing time normalizations from text.
GNU General Public License v3.0

Getting a duplicate ID error in the Gold Standard for some reason #43

Closed by AmyOlex 6 years ago

AmyOlex commented 6 years ago

When running the evaluation I get the following error:

ValueError: duplicate id: 407@e@ID016_clinic_048@gold

I need to check this gold file to see what the issue is.

AmyOlex commented 6 years ago

Yep! The gold file has a duplicated entity:

    <entity>
        <id>407@e@ID016_clinic_048@gold</id>
        <span>5299,5304</span>
        <type>Frequency</type>
        <parentsType>Other</parentsType>
        <properties>
            <Type>Other</Type>
            <Every>409@e@ID016_clinic_048@gold</Every>
            <Number>406@e@ID016_clinic_048@gold</Number>
            <Modifier></Modifier>
        </properties>
    </entity>

To fix this I just deleted one of the entities in the gold file. That fixed that file, but I found others:

ValueError: duplicate id: 1397@e@ID035_clinic_103@gold

Duplicated entity: id 1397@e@ID035_clinic_103@gold, span 7379,7383, type Period, parentsType Duration, property values: Years

ValueError: duplicate id: 517@e@ID041_clinic_120@gold

Duplicated entity: id 517@e@ID041_clinic_120@gold, span 2919,2924, type After, parentsType Operator, property values: Link, 201@e@ID041_clinic_120@gold, 589@e@ID041_clinic_120@gold

ValueError: duplicate id: 654@e@ID049_clinic_142@gold

Duplicated entity: id 654@e@ID049_clinic_142@gold, span 6814,6819, type This, parentsType Operator, property values: DocTime, 652@e@ID049_clinic_142@gold

Great! Deleting the duplicate entity from each of these files fixed all of them. I'll post the corrected files to the shared data folder.
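
For anyone hitting the same error, here is a minimal sketch of how the duplicates could be found and removed with a script rather than by hand. It assumes each gold file is XML with `<entity>` elements that each contain an `<id>` child, as in the snippet above, and that repeated entities are exact copies, so keeping the first one is safe. The function names `find_duplicate_ids` and `drop_duplicate_entities` are made up for this example and are not part of Chrono or the evaluation code.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_duplicate_ids(xml_text):
    """Return the entity ids that occur more than once, in first-seen order."""
    root = ET.fromstring(xml_text)
    ids = [e.findtext("id") for e in root.iter("entity")]
    counts = Counter(ids)
    # dict.fromkeys keeps first-seen order while dropping repeats
    return [i for i in dict.fromkeys(ids) if i is not None and counts[i] > 1]


def drop_duplicate_entities(xml_text):
    """Keep the first <entity> for each id and remove later copies.

    Assumption: entities sharing an id are identical copies.
    Returns the cleaned document as a string.
    """
    root = ET.fromstring(xml_text)
    seen = set()
    # Materialize the traversal first so removals do not disturb iteration
    for parent in list(root.iter()):
        for entity in parent.findall("entity"):
            eid = entity.findtext("id")
            if eid is None:
                continue  # leave entities without an id untouched
            if eid in seen:
                parent.remove(entity)
            else:
                seen.add(eid)
    return ET.tostring(root, encoding="unicode")
```

Running `find_duplicate_ids` on the text of each gold file before the evaluation would list the offending ids up front. It is still worth checking by eye that the copies really are identical before dropping one.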