NASA-IMPACT / COSMOS

COSMOS is a web application designed to manage collections indexed in NASA's Science Discovery Engine (SDE), facilitating precise content selection and allowing metadata modification before indexing.
https://sde-indexing-helper.nasa-impact.net/

In addition to URLs, also bring in full-text from Sinequa servers #1016

Open code-geek opened 2 weeks ago

code-geek commented 2 weeks ago

Description

Right now, when we bring in URLs, we only get the URL and title. We also want to store each page's full text in the database (though not necessarily show it in the table). This will let us track when the full text of a page changes, and also use this data for LLM purposes.
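One way to track full-text changes is to store a hash next to the text and compare it on each import. The sketch below is only an illustration of that idea: `StoredPage`, `fingerprint`, and `update_full_text` are hypothetical names, not existing COSMOS code, and the whitespace normalization is an assumption about what should count as a change.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the text with whitespace collapsed."""
    # Assumption: differences in whitespace alone should not count as a change.
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class StoredPage:
    """Hypothetical stand-in for a database row holding one imported URL."""
    url: str
    title: str
    full_text: str = ""
    text_hash: Optional[str] = None


def update_full_text(page: StoredPage, new_text: str) -> bool:
    """Store new_text on the page and return True if the content changed.

    The first import (no stored hash yet) is reported as a change.
    """
    new_hash = fingerprint(new_text)
    if page.text_hash == new_hash:
        return False
    page.full_text = new_text
    page.text_hash = new_hash
    return True
```

Keeping the hash in its own column would let an import compare fingerprints without loading the stored text, and the text itself can stay out of the table view.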

Implementation Considerations

Deliverable

Dependencies

No response

CarsonDavis commented 23 hours ago

We will probably need to use the SQL endpoint from Sinequa. Documentation can be found at this link: https://doc.sinequa.com/en.sinequa-es.v11/Content/en.sinequa-es.devDoc.webservice.rest-search.html#engine-sql
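A rough sketch of what a call to that endpoint might look like, using only the Python standard library. Everything specific here is an assumption to check against the linked documentation: the `/api/v1/engine.sql` path, the `sql` field in the JSON body, the bearer-token header, and the index and column names in the example query are all placeholders.

```python
import json
import urllib.request


def build_sql_request(base_url: str, token: str, sql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a SQL query."""
    # Assumed path; confirm against the Sinequa documentation.
    url = base_url.rstrip("/") + "/api/v1/engine.sql"
    # Assumed body shape: a JSON object with the query under "sql".
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Assumed auth scheme: a token that has SQL engine access.
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


def run_sql(base_url: str, token: str, sql: str, timeout: float = 30.0):
    """Send the query and return the decoded JSON response as-is."""
    request = build_sql_request(base_url, token, sql)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical query: the index and column names are placeholders.
EXAMPLE_SQL = "SELECT url1, title, text FROM my_index"
```

Usage would be something like `run_sql("https://sinequa.example.org", token, EXAMPLE_SQL)`, where the host is a placeholder. The response structure is not assumed here, so the caller would need to inspect the returned JSON to find the rows.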

CarsonDavis commented 23 hours ago

Existing code using this endpoint can be found in the following files:

CarsonDavis commented 23 hours ago

You may need a token from the server that includes SQL engine access. In the admin console, search for tokens, create a new token, and grant it the required permissions.