NASA-PDS / planetary-data-engine

Free-text search capability for planetary data, services, tools, and information
Apache License 2.0

Prepare an assessment to compare existing PDS keyword search to the Science Discovery Engine (SDE) public interface. #3

Closed jjacob7734 closed 1 year ago

jjacob7734 commented 1 year ago

💡 Description

The Science Discovery Engine (SDE) strives to provide cross-domain search capability for NASA's Science Mission Directorate (SMD). This task is to compare the results returned by the SDE public interface (https://sciencediscoveryengine.nasa.gov/) for keyword searches of planetary data with those of the existing PDS keyword search (https://pds.nasa.gov/datasearch/keyword-search/).

tloubrieu-jpl commented 1 year ago

@jjacob7734 is comparing results between Sinequa and our current keyword search.

The SMD team will deploy a service in which only planetary-related information is indexed. This will be done this week.

tloubrieu-jpl commented 1 year ago

@jjacob7734 now has access to a sandbox deployment of Sinequa that we will experiment with.

tloubrieu-jpl commented 1 year ago

@jjacob7734 knows how to select the indices used by the Sinequa configuration. We will use that to test the indices independently and figure out which ones we need for the PDS web site search.

jjacob7734 commented 1 year ago

The assessment comparison is in a spreadsheet here: https://docs.google.com/spreadsheets/d/18vj4kFL0GNvqyb-mGhsxqqIu9C9LeNv5UB-UZp_fmJA/edit#gid=0

tloubrieu-jpl commented 1 year ago

The conclusion is that the Sinequa service as initially provided by SMD does not work for PDS because it includes indices that are not relevant and pollute the search results.

The ticket https://github.com/NASA-PDS/planetary-data-engine/issues/4 follows up by restricting the list of indices used in a sandbox Sinequa deployment.

jjacob7734 commented 1 year ago

Updated the assessment sheet to include a column for the SDE/Sinequa sandbox, which is restricted to PDS-related sources only. Its results differ from those of the public SDE interface. The new sandbox results are in Column C at https://docs.google.com/spreadsheets/d/18vj4kFL0GNvqyb-mGhsxqqIu9C9LeNv5UB-UZp_fmJA/edit#gid=0.

tloubrieu-jpl commented 1 year ago

This ticket is closed. The conclusion is that the SMD Sinequa configuration as-is is not good enough to be used by the PDS web site. A new ticket will compare a sandbox configuration of Sinequa with the legacy keyword search; see https://github.com/NASA-PDS/planetary-data-engine/issues/4