Develop Query Test Suite and Success Criteria

jordanpadams commented 10 months ago

💡 Description

tloubrieu-jpl commented 9 months ago

@jjacob7734 need to describe in more detail what we expect from this theme, and create sub-tickets, or a single analysis sub-ticket to start with.

1) We need to extract the relevant user stories to define the test suite.
2) Then, for these tests, compare the expected results with the actual results from Sinequa, customizing how each field is searched and the weight (boost) applied to each of them.
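To make the two steps above concrete, here is a minimal sketch of what one test case and its expected-versus-actual check could look like. Everything in it is an assumption for illustration: `QueryTest`, `run_test`, the `story-1` identifier, the `example.org` URLs, and the `fake_search` stand-in are hypothetical, and nothing here calls the real Sinequa API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QueryTest:
    """One test case derived from a user story (hypothetical structure)."""
    story_id: str             # placeholder identifier for the user story
    query: str                # text typed into the search box
    expected_urls: List[str]  # results we want to see near the top
    top_k: int = 10           # how many actual results to inspect


def run_test(test: QueryTest, search: Callable[[str], List[str]]) -> Dict:
    """Run the query and report which expected results are missing from the top k."""
    actual = search(test.query)[: test.top_k]
    missing = [url for url in test.expected_urls if url not in actual]
    return {"story_id": test.story_id, "passed": not missing, "missing": missing}


def fake_search(query: str) -> List[str]:
    """Stand-in for a real search call; returns a fixed ranked list of URLs."""
    return ["https://example.org/a", "https://example.org/b"]


test = QueryTest(
    story_id="story-1",
    query="mars magnetic field",
    expected_urls=["https://example.org/b", "https://example.org/c"],
)
result = run_test(test, fake_search)
print(result)
```

Passing the search function in as a parameter means the same test cases can be rerun after each change to field handling or boost weights, so the runs can be compared side by side.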

jjacob7734 commented 9 months ago

User Stories: https://airtable.com/applgnSo7ROCVIjbd/shrvc3CFrOgqsfPlL/tblapwOzIaDVB1x3d
My summary/taxonomy of user stories: https://docs.google.com/spreadsheets/d/1qk-setU_rJ5zLv-jlrPv7ci5Wdxp2UYwt_RPgXaiS8U/edit#gid=0 (also see the Search Function column in the AirTable above)

jordanpadams commented 8 months ago

Status: @jjacob7734 is looking over the story categorization made early in the task. To be discussed at the breakout.

jjacob7734 commented 8 months ago

Notes from @jjacob7734 + @jordanpadams breakout discussion on 10/5/23:

Faceted search refers to getting all of the unique values for a particular facet. Can we do that in Sinequa?
Pick one of the basic keyword search user stories like #126 "Mars magnetic field" and determine the top results we want to see (relevant datasets and landing page).
Consider scoring the actual search results to quantify how close we are to what we want.
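One possible way to score a ranked result list against what we want is normalized discounted cumulative gain (nDCG). This is a sketch of one candidate metric only: the thread does not settle on a metric, and the function names and relevance grades below are illustrative assumptions.

```python
import math
from typing import Dict, List


def dcg(gains: List[float]) -> float:
    """Discounted cumulative gain: a gain at rank i (0-based) is divided by log2(i + 2)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))


def ndcg_at_k(actual: List[str], relevance: Dict[str, float], k: int = 10) -> float:
    """Score a ranked list of URLs between 0.0 and 1.0.

    `relevance` maps each desired URL to a hand-assigned grade (for example,
    2 for the dataset page, 1 for a landing page); unlisted URLs count as 0.
    Assumes `actual` contains no duplicate URLs.
    """
    gains = [relevance.get(url, 0.0) for url in actual[:k]]
    ideal = sorted(relevance.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical grades for a single query.
grades = {"dataset-page": 2.0, "landing-page": 1.0}
print(ndcg_at_k(["dataset-page", "landing-page"], grades))  # ideal ordering
print(ndcg_at_k(["landing-page", "dataset-page"], grades))  # swapped ordering scores lower
```

Averaging this score over all queries in the test suite would give a single number to track as boost settings change.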

jjacob7734 commented 8 months ago

The following are example queries from @jordanpadams where SDE/Sinequa does not produce the best results. In discussion with SDE, some of these can be improved by fixing data curation problems.

Data pages are not on top:

Suboptimal duplicate pages are referenced:

In https://sciencediscoveryengine.nasa.gov/app/nasa-sba-smd/#/search?query=%7B%22name%22:%22query-smd-primary%22,%22text%22:%22cassini%22,%22tab%22:%22all%22,%22select%22:%5B%5B%22treepath:%20(%60Planetary%20Image%20Galleries%60:%60%2FPlanetary%20Science%2FData%2FPlanetary%20Image%20Galleries%2F*%60)%22,%22Treepath%22%5D%5D%7D, Planetary Image Galleries is a subset / copy of the data that is here: https://photojournal.jpl.nasa.gov/

jordanpadams commented 8 months ago

Status: Investigating relevance boosting and how that can be brought into the test suite and success criteria.

tloubrieu-jpl commented 8 months ago

Create 2 sub-tasks:

create the test suite
evaluate the results with metrics

jordanpadams commented 7 months ago

📆 October Status: Test suite software in work. On schedule

jordanpadams commented 6 months ago

📆 November status: Test suite software in work. On schedule

jordanpadams commented 5 months ago

📆 December status: In work. Completion delayed 1 sprint. No impact on delivery.

jordanpadams commented 4 months ago

Calling this done. Sinequa documentation is also available here: https://github.com/NASA-PDS/planetary-data-engine/wiki/SDE%E2%80%90Sinequa

NASA-PDS / planetary-data-engine

Develop Query Test Suite and Success Criteria #7
