Open aazaff opened 3 years ago
Right here is fine. FYI, the "bigram model" has no bearing on the COSMOS results; it only affects the embedding model used to identify similar terms. These similar terms are not used in retrieval at the present time.
Side note: Some of the returns identified as "incorrect" here are actually correct (i.e., the returned object contains the search term), but the returned object is truncated in some way. This is either because the table spans pages, which we do not currently handle, or because of some other segmentation error. These should ideally be flagged as correct but incomplete. Example: https://xdd.wisc.edu/set_visualizer/sets/geothermal/object/07b481b52a9000ac82d1ff121bf549ec0df62a43
Likewise, this one is flagged incorrect. But if asked to classify what this visual object is and whether it contains "porosity", the answer is "Table" and "yes". https://xdd.wisc.edu/set_visualizer/sets/geothermal/object/935b76e310960c798bf0d94c1449bdbe0977b245
I agree. I told Anna to be extra conservative in her determinations, which was maybe too strict. I will see if she has time to go through them today and revise the numbers.
Revised version. Table-of-contents and truncation cases are now flagged with "2", so you can decide how best to count them.
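In case it helps with deciding how to treat the "2" flag, here is a minimal sketch of tallying the returns both ways (strict: "2" counts as a miss; lenient: "2" counts as a hit). Only the "2" value comes from this thread; the 1/0 coding for correct/incorrect, the function name, and the example flags are assumptions for illustration, not values from the spreadsheet.

```python
from collections import Counter

# Assumed coding: only PARTIAL = 2 is taken from the thread; 1 and 0 are hypothetical.
CORRECT, INCORRECT, PARTIAL = 1, 0, 2


def precision(flags, count_partial_as_correct):
    """Fraction of returns counted as correct under the chosen treatment of flag 2."""
    counts = Counter(flags)
    hits = counts[CORRECT]
    if count_partial_as_correct:
        hits += counts[PARTIAL]
    total = sum(counts.values())
    return hits / total if total else 0.0


# Made-up flags for illustration only.
example_flags = [1, 1, 0, 2, 1, 2, 0, 1]
strict = precision(example_flags, count_partial_as_correct=False)
lenient = precision(example_flags, count_partial_as_correct=True)
print(f"strict={strict:.2f} lenient={lenient:.2f}")
```

Reporting both numbers side by side would show how much of the apparent error comes from truncation and table-of-contents hits rather than from genuinely wrong returns.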
Here is an assessment of COSMOS returned results on the geothermal dataset (bigram model) for the search terms "thermal conductivity", "geochemistry", and "porosity" WITH the permalink to each success/failure included. Is there an ideal place to put this information?
table checks.xlsx