Separate evaluation code into own module

NatLibFi / FinGreyLit

Dataset of Finnish grey literature, containing curated Dublin Core-style metadata and links to the original PDF publications


Separate evaluation code into own module #2

Closed: juhoinkinen closed this 11 months ago

juhoinkinen commented 11 months ago

This moves the code for evaluating extracted metadata records out of the "Analyze the extracted metadata" cell of the Meteor-metadata-extraction notebook and into its own module, which can be imported or run as a script.
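As a rough illustration of the "import or run as a script" pattern, here is a minimal sketch. The module name `evaluate.py`, the functions `compare_field`, `evaluate_records` and `load_jsonl`, the JSONL input format, and the assumption that every field holds a list of strings are all hypothetical; this is not the module from the PR.

```python
"""Hypothetical sketch of a standalone evaluation module (evaluate.py)."""
import argparse
import json
import sys
from typing import Dict, List

# Hypothetical subset of fields to score
FIELDS = ["dc.title", "dc.publisher", "dc.identifier.isbn"]


def compare_field(extracted: Dict, gold: Dict, field: str) -> float:
    """Score 1.0 if the value lists match (order-insensitive), else 0.0.

    Assumes each field maps to a list of strings; a field missing on
    both sides counts as a match in this sketch.
    """
    return 1.0 if sorted(extracted.get(field, [])) == sorted(gold.get(field, [])) else 0.0


def evaluate_records(extracted: List[Dict], gold: List[Dict]) -> List[Dict]:
    """Return one {"field", "score"} entry per (record pair, field)."""
    results = []
    for ext, ref in zip(extracted, gold):
        for field in FIELDS:
            results.append({"field": field, "score": compare_field(ext, ref, field)})
    return results


def load_jsonl(path: str) -> List[Dict]:
    """Read one JSON record per non-empty line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def main(argv=None) -> None:
    parser = argparse.ArgumentParser(description="Score extracted metadata against a gold standard")
    parser.add_argument("extracted", help="JSONL file of extracted records")
    parser.add_argument("gold", help="JSONL file of gold-standard records")
    args = parser.parse_args(argv)
    results = evaluate_records(load_jsonl(args.extracted), load_jsonl(args.gold))
    json.dump(results, sys.stdout, indent=2)


if __name__ == "__main__":
    # Runs only when executed as a script, not when imported from a notebook
    main()
```

From a notebook one would call `from evaluate import evaluate_records`; from the shell, `python evaluate.py extracted.jsonl gold.jsonl` (file names hypothetical). The `__main__` guard is what lets the same file serve both uses.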

It also adds tests covering most (if not all) cases of comparing extracted records with gold-standard records.
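Tests of this kind are typically plain functions that pytest discovers by their `test_` prefix. The sketch below uses a hypothetical `score_field` helper (exact, order-insensitive match of list values); the actual comparison rules in the PR may differ.

```python
"""Hypothetical pytest-style tests for comparing extracted and gold records."""


def score_field(extracted: dict, gold: dict, field: str) -> float:
    # Exact, order-insensitive match of the list of values for one field
    return 1.0 if sorted(extracted.get(field, [])) == sorted(gold.get(field, [])) else 0.0


def test_identical_values_score_one():
    rec = {"dc.title": ["Raportti"]}
    assert score_field(rec, rec, "dc.title") == 1.0


def test_different_values_score_zero():
    assert score_field({"dc.title": ["A"]}, {"dc.title": ["B"]}, "dc.title") == 0.0


def test_missing_extracted_field_scores_zero():
    # The extractor returned nothing, but the gold record has a publisher
    assert score_field({}, {"dc.publisher": ["P"]}, "dc.publisher") == 0.0
```

Running `pytest` in the directory containing this file collects and runs all three functions.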

This was developed for the Meteor extraction; it still needs to be checked how it works with the GPT extraction pipeline.

juhoinkinen commented 11 months ago

I switched the Meteor-metadata-extraction.ipynb notebook to return data in the FinGreyLit schema.

The results change somewhat; the mean scores grouped by language and field are now:


language  field                
eng       dc.contributor.author    0.608856
          dc.date.issued           0.804428
          dc.identifier.isbn       0.797048
          dc.language.iso          0.981550
          dc.publisher             0.040590
          dc.relation.eissn        0.837638
          dc.title                 0.623616
fin       dc.contributor.author    0.578231
          dc.date.issued           0.812925
          dc.identifier.isbn       0.727891
          dc.language.iso          0.948980
          dc.publisher             0.112245
          dc.relation.eissn        0.816327
          dc.title                 0.506803
swe       dc.contributor.author    0.668790
          dc.date.issued           0.694268
          dc.identifier.isbn       0.904459
          dc.language.iso          0.898089
          dc.publisher             0.108280
          dc.relation.eissn        0.936306
          dc.title                 0.324841
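A table in this shape (a pandas Series with a two-level language/field index) is what a `groupby(...).mean()` produces. The sketch below assumes a hypothetical long-format results table with `language`, `field` and `score` columns; the rows are toy values for illustration, not the results above.

```python
import pandas as pd

# Hypothetical toy results: one row per (record, field) with a 0/1 score
results = pd.DataFrame([
    {"language": "eng", "field": "dc.title", "score": 1.0},
    {"language": "eng", "field": "dc.title", "score": 0.0},
    {"language": "eng", "field": "dc.publisher", "score": 0.0},
    {"language": "fin", "field": "dc.title", "score": 1.0},
    {"language": "fin", "field": "dc.publisher", "score": 1.0},
])

# Mean score per (language, field); the result is a Series whose
# two-level index prints in the same layout as the table above
means = results.groupby(["language", "field"])["score"].mean()
print(means)
```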

juhoinkinen commented 11 months ago

Something may be wrong in the dc.relation.eissn and dc.publisher fields, as their mean scores changed.

juhoinkinen commented 11 months ago

> Something can be wrong in dc.relation.eissn and dc.publisher fields as their score means change.

Fixed. For eissn, the relevant tests were not even being run, and for publisher, the Meteor output was mapped incorrectly (as a plain string instead of a list), a case for which there were no tests.
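One plausible way a string-versus-list mix-up skews scores: code that expects a list will iterate a bare string character by character. A minimal sketch of normalizing such values, assuming a hypothetical `as_list` helper and a hypothetical Meteor output key `publisher`:

```python
from typing import List, Optional, Sequence, Union


def as_list(value: Union[str, Sequence[str], None]) -> List[str]:
    """Wrap a plain string into a one-element list; pass lists through; map None to []."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Hypothetical Meteor output where the publisher came back as a plain string
meteor_output = {"publisher": "Example Publisher"}
record = {"dc.publisher": as_list(meteor_output.get("publisher"))}


def test_publisher_is_list():
    # Regression test for the missing case: the value must be a list of strings
    assert record["dc.publisher"] == ["Example Publisher"]
```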

Now the average values are identical between the code in the main branch and this branch in all fields except the language field (dc.language.iso), where they differ as follows (main vs. this branch):

0.970480 vs 0.988930 for English
0.952381 vs 0.945578 for Finnish
0.886792 vs 0.910828 for Swedish
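The size of these differences can be checked directly from the numbers reported above; each is within about 2.4 percentage points.

```python
# Mean dc.language.iso scores per language, as reported above
main_branch = {"eng": 0.970480, "fin": 0.952381, "swe": 0.886792}
this_branch = {"eng": 0.988930, "fin": 0.945578, "swe": 0.910828}

# Difference (this branch minus main) per language
diffs = {lang: this_branch[lang] - main_branch[lang] for lang in main_branch}
for lang, diff in diffs.items():
    print(f"{lang}: {diff:+.6f}")
# eng: +0.018450
# fin: -0.006803
# swe: +0.024036
```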

I wonder whether this difference could come from nondeterminism in the language detection...?

~Also, the tests are dumb, as there are conditions like if res["field"] == "dc.relation.eissn" that happily skip the asserts if the specific field is not present.~ Fixed.
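The pitfall is that an `if` guard around the assertions turns "field missing" into "test passes". A sketch of one way to make absence fail loudly instead, using a hypothetical `find_field_result` helper and a results list shaped like `{"field": ..., "score": ...}` (an assumption):

```python
def find_field_result(results, field):
    """Return the single result entry for `field`, failing if it is absent or duplicated."""
    matches = [res for res in results if res["field"] == field]
    assert len(matches) == 1, f"expected exactly one result for {field}, got {len(matches)}"
    return matches[0]


def test_eissn_scored():
    # Hypothetical evaluation output; a missing eissn entry now fails the test
    results = [{"field": "dc.relation.eissn", "score": 1.0}]
    assert find_field_result(results, "dc.relation.eissn")["score"] == 1.0
```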

osma commented 11 months ago

> I wonder could this difference come from nondeterminism of the language detection...?

Most likely, yes. I wouldn't worry about this too much; it's close enough.