UW-COSMOS / Cosmos

Knowledge base construction from raw scientific documents

Quick evaluation results #147

Open aazaff opened 3 years ago

aazaff commented 3 years ago

Here is an assessment of the results COSMOS returned on the geothermal dataset (bigram model) for the search terms "thermal conductivity", "geochemistry", and "porosity", with the permalink to each success/failure included. Is there an ideal place to put this information?

table checks.xlsx

cambro commented 3 years ago

Right here is fine. FYI, the "bigram model" has no bearing on the COSMOS results; it only affects the embedding model used to identify similar terms. These similar terms are not used in retrieval at the present time.

cambro commented 3 years ago

Side note: Some of the returns identified as "incorrect" here are actually correct (i.e., the returned object contains the search term), but the returned object is truncated in some way. This is either because the table spans pages, which we do not currently handle, or because of some other segmentation error. These should ideally be flagged as correct but incomplete. Example: https://xdd.wisc.edu/set_visualizer/sets/geothermal/object/07b481b52a9000ac82d1ff121bf549ec0df62a43

cambro commented 3 years ago

Likewise, this one is flagged incorrect. But if asked to classify what this visual object is and whether it contains "porosity" or not, the answer is Table and yes. https://xdd.wisc.edu/set_visualizer/sets/geothermal/object/935b76e310960c798bf0d94c1449bdbe0977b245

aazaff commented 3 years ago

I agree. I told Anna to be extra conservative in her determinations, which was maybe too strict. I will see if she has time to go through them today and revise the numbers.

aazaff commented 3 years ago

contents_separated.xlsx

Revised version. Table-of-contents and truncated returns are now flagged with "2", so you can decide how best to treat them.
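
For anyone scoring the revised sheet, here is a rough sketch of how the "2" flag could be read two ways: strictly (counted as a miss) or leniently (counted as correct but incomplete). Only the meaning of "2" comes from this thread; the values 0 and 1 for incorrect and correct, the function name `precision`, and the sample flags are assumptions for illustration, not the contents of the spreadsheet.

```python
from collections import Counter

# Assumed flag values: the thread only confirms that "2" marks
# table-of-contents and truncated returns; 0 and 1 are hypothetical.
INCORRECT, CORRECT, AMBIGUOUS = 0, 1, 2


def precision(flags, count_ambiguous_as_correct):
    """Fraction of returns judged correct under one reading of the "2" flag."""
    if not flags:
        raise ValueError("no flags to score")
    counts = Counter(flags)
    hits = counts[CORRECT]
    if count_ambiguous_as_correct:
        # Lenient reading: truncated / table-of-contents returns still count.
        hits += counts[AMBIGUOUS]
    return hits / len(flags)


# Made-up flags for illustration only, not values from the spreadsheet.
example_flags = [1, 1, 2, 0, 2, 1, 0, 1]
strict = precision(example_flags, count_ambiguous_as_correct=False)
lenient = precision(example_flags, count_ambiguous_as_correct=True)
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

Reporting both numbers side by side keeps the judgment call about truncated tables visible instead of baking it into a single score.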