NASA-IMPACT / veda-docs

Documentation for the VEDA Project
https://nasa-impact.github.io/veda-docs
Apache License 2.0

Enable end user to ingest catalog on their own? #94

Closed kathrynberger closed 1 year ago

kathrynberger commented 1 year ago

The docs walk the end user through the catalog ingestion process, but a note in Catalog Ingestion states: "Note: The steps after this are technical, so at this point the scientists can send the json to the VEDA team and they'll handle the publication process."

Do you still want this to be the case (i.e., the VEDA team acting as an intermediary)? I am assuming the ultimate goal is to allow technically able end users to complete the process on their own, and to enable others (not yet ready) to read the docs and feel comfortable learning the process, with the VEDA team there for support - but I want to confirm that end goal.

jsignell commented 1 year ago

I can't really imagine a world where there is no gatekeeping on VEDA. My goal is that the boundary keeps getting pushed so that the scientist is responsible for more and more of the process, to the point where, ultimately, the scientist creates a PR, someone from the VEDA team just reviews and merges it, and that merge kicks off the whole pipeline.

So there will always be a hand-off at some point, I think. But we should definitely tighten up that "send the json to the VEDA team" language. I think we specifically mean "open a PR on veda-data".
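
For illustration, the kind of json I mean might look something like the sketch below - the field names are illustrative only, not the actual veda-data schema, and the Python just writes the file a contributor could commit in such a PR:

```python
import json

# Hypothetical dataset definition a data contributor might prepare.
# Field names are illustrative, not the actual veda-data schema.
dataset = {
    "collection": "my-dataset",  # hypothetical collection id
    "title": "My Dataset",
    "description": "Dataset prepared by a data contributor.",
    "license": "CC0-1.0",
    "spatial_extent": {"xmin": -180.0, "ymin": -90.0, "xmax": 180.0, "ymax": 90.0},
    "temporal_extent": {"startdate": "2020-01-01", "enddate": "2020-12-31"},
}

# Write the definition to a file that could be included in the PR.
with open("my-dataset.json", "w") as f:
    json.dump(dataset, f, indent=2)
```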

kathrynberger commented 1 year ago

Ok, that's what I was trying to understand between the lines. This makes sense. 👍 I can definitely work to tighten up the language there. Thanks for the clarification!

jsignell commented 1 year ago

I'll just copy Jonas's answer from Slack to keep this all in one place, but I think he and I are saying the same thing (he's just saying it more elegantly):

Ultimately, we will want to have an accessible process for a data contributor - someone who knows the content of the data, can judge its quality, and is the one to provide all the contextual metadata - to get data and metadata ready for automated ingestion on their own. And then an automated process would pick up from there, present the data for review to the right people, and complete the ingestion once it has been approved.

Currently, the process is not automated and we do not have enough guardrails in place for contributors to get everything ready, so it is a more hand-held process. Also, the final ingestion into the catalog (after data preparation) still needs to be completed by hand (by an engineer on the project).

Sooo… The steps you have been completing - working with the STAC API etc - should ultimately not be completed by an “end user” (a data contributor), so perhaps you can give that a bit less attention in terms of documentation, compared to the preparation.
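
(For reference, "working with the STAC API" means steps along these lines - a minimal sketch using pystac-client, with a placeholder endpoint URL and collection id rather than the real VEDA STAC API:)

```python
from pystac_client import Client  # pip install pystac-client

# Placeholder endpoint; substitute the actual VEDA STAC API URL.
catalog = Client.open("https://example.com/api/stac")

# Search a (hypothetical) collection and page through a few of its items.
search = catalog.search(collections=["my-dataset"], max_items=5)
for item in search.items():
    print(item.id, item.datetime)
```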

Going through the process as it currently stands and reporting on the challenges basically gives the team a good baseline and evidence of where the biggest obstacles are.