COSMOS is a web application designed to manage collections indexed in NASA's Science Discovery Engine (SDE), facilitating precise content selection and allowing metadata modification before indexing.
The existing URL import code that brings URLs into COSMOS cannot support our downstream ML tasks, because those tasks need full texts. As far as I know, full texts cannot be retrieved via the query endpoint, only via the SQL endpoint (engine.sql).
Work on this was started previously, but that card was too broad, so we are breaking it into smaller chunks.
Implementation Considerations
Use the engine.sql endpoint to get all existing metadata from the dev servers.
Use the engine.sql endpoint to get full texts from the dev servers.
Store the incoming full text in a new CandidateURL field called full_text.
How will we handle errors?
Tests: to really test the important parts, we would need to emulate a Sinequa server, which we are not going to do. It is therefore probably not worth writing any tests right now.
We should be using tokens, similar to config_generation/minimum_api.py. The actual code will reference an environment variable. For local development, the token goes in the local .django file; on the server, it will be set when we deploy.
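The considerations above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual COSMOS code: the endpoint path, the JSON payload shape, the "Rows" response key, the index and column names in the SQL, and the SINEQUA_TOKEN variable name are all hypothetical placeholders that would need to be checked against the real server and against config_generation/minimum_api.py.

```python
import json
import os
import urllib.error
import urllib.request

# ASSUMPTIONS (not confirmed by this card): the endpoint path, the payload
# shape, the "Rows" response key, the index/column names in the SQL, and the
# SINEQUA_TOKEN environment variable name are illustrative placeholders.
ENGINE_SQL_PATH = "/api/v1/engine.sql"


class SinequaError(Exception):
    """Raised when the engine.sql call fails or returns an unexpected body."""


def build_request(base_url, token, sql):
    """Build an authenticated POST request for the engine.sql endpoint."""
    body = json.dumps({"sql": sql}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + ENGINE_SQL_PATH,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def parse_rows(raw_body):
    """Turn a response body into a list of url/full_text dictionaries."""
    try:
        rows = json.loads(raw_body)["Rows"]
        # A missing text is stored as an empty string rather than None.
        return [{"url": url, "full_text": text or ""} for url, text in rows]
    except (ValueError, KeyError, TypeError) as exc:
        raise SinequaError(f"unexpected engine.sql response: {exc}") from exc


def fetch_full_texts(base_url, collection, token=None,
                     opener=urllib.request.urlopen):
    """Fetch (url, full_text) pairs for one collection.

    The token comes from the environment unless passed explicitly. The
    opener argument exists only so the network call can be swapped out.
    """
    token = token or os.environ.get("SINEQUA_TOKEN")
    if not token:
        raise SinequaError("SINEQUA_TOKEN is not set")
    # Hypothetical query; the collection value is assumed to be trusted input.
    sql = f"SELECT url1, text FROM my_index WHERE collection = '{collection}'"
    try:
        with opener(build_request(base_url, token, sql), timeout=30) as resp:
            return parse_rows(resp.read())
    except urllib.error.URLError as exc:
        # HTTPError is a subclass of URLError, so HTTP failures land here too.
        raise SinequaError(f"engine.sql request failed: {exc}") from exc
```

Each returned dictionary could then be written to the matching CandidateURL record's full_text field. Funnelling network, HTTP, and parsing failures into one exception type is one possible answer to the error handling question: the import code then has a single failure case to catch and log.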
Open Questions
Credentials: for local development, we will use Li's server.
Once it goes into staging, should it use the existing environment variable?
Deliverable