NASA-IMPACT / COSMOS

COSMOS is a web application designed to manage collections indexed in NASA's Science Discovery Engine (SDE), facilitating precise content selection and allowing metadata modification before indexing.
https://sde-indexing-helper.nasa-impact.net/

Retrieve Full-Texts from Sinequa Dev Servers #1071

Closed CarsonDavis closed 2 weeks ago

CarsonDavis commented 1 month ago

Description

The existing URL import code that brings URLs into COSMOS cannot support our downstream ML tasks, which need full texts. As far as I know, full texts cannot be retrieved via the query endpoint, only via the SQL endpoint.
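For illustration, here is a rough sketch of what a full-text request against the SQL endpoint might look like. Nothing here is confirmed by this issue: the endpoint path `/api/v1/engine.sql`, the index name `sde_index`, the column names (`url1`, `text`, `title`, `collection`), the `SKIP`/`COUNT` paging clause, and the bearer-token header are all assumptions, and the helper names `build_full_text_query` and `build_request` are hypothetical. The sketch only builds the request; it does not send it.

```python
import json
from urllib import request

# Assumptions for illustration only (not confirmed in this issue):
SQL_ENDPOINT_PATH = "/api/v1/engine.sql"  # assumed SQL endpoint path
INDEX_NAME = "sde_index"                  # hypothetical index name


def build_full_text_query(collection: str, limit: int = 100, offset: int = 0) -> str:
    """Build a SQL statement selecting URLs together with their full text.

    Column names and the SKIP/COUNT paging clause are assumed.
    """
    safe_collection = collection.replace("'", "''")  # escape single quotes
    return (
        f"SELECT url1, text, title FROM {INDEX_NAME} "
        f"WHERE collection = '{safe_collection}' "
        f"SKIP {offset} COUNT {limit}"
    )


def build_request(base_url: str, token: str, sql: str) -> request.Request:
    """Wrap the SQL statement in a POST request (built here, not sent)."""
    payload = json.dumps({"sql": sql}).encode("utf-8")
    return request.Request(
        url=base_url.rstrip("/") + SQL_ENDPOINT_PATH,
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",
    )
```

Paging with `limit` and `offset` is there because full texts can be large, so pulling a whole collection in one response is probably not practical. Passing the result of `build_request` to `urllib.request.urlopen` would actually send it.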

Work was started previously on:

However, this card was too broad, and we are breaking it into smaller chunks.

Implementation Considerations

Open Questions

Deliverable

### Tasks
- [ ] https://github.com/NASA-IMPACT/COSMOS/issues/1075

saifrk commented 1 month ago

Please refer to the attached mind map for a visual representation of the changes incorporated in this task.