NASA-PDS / registry

PDS Registry provides the services and software applications necessary for tracking, searching, auditing, locating, and maintaining artifacts within the system. These artifacts range from data files and label files to schemas, dictionary definitions for objects and elements, and services.
https://nasa-pds.github.io/registry
Apache License 2.0

As a node operator, I want to upload to Registry without downloading data from s3 #349

Open rgdeen opened 1 week ago

rgdeen commented 1 week ago

Checked for duplicates

Yes - I've already checked

🧑‍🔬 User Persona(s)

node operator - those putting data in the Registry

💪 Motivation

For our high-volume missions, data comes to us (IMG) from the data provider in S3. We never have a complete copy on disk anywhere. Validations are done piecewise on a KDP cluster. We do an S3-to-S3 transfer to the public bucket, where the data needs to be registered.

Currently, all the data must be downloaded somewhere, which is problematic for 10TB deliveries.

Downloading the labels is tractable, but downloading the data isn't. The data in S3 has (or can have) an rclone-style MD5 checksum that can be retrieved, which should obviate the need to download the data itself.
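
To illustrate the checksum idea, here is a rough Python sketch of reading an MD5 from object metadata instead of downloading the object. It is not part of the Registry tooling. It assumes the checksum was stored the way rclone stores it for S3 (a base64-encoded MD5 in the `md5chksum` user-metadata key) and falls back to the ETag only when it looks like a plain MD5. The function names `md5_from_head` and `head_md5` are hypothetical, and `boto3` is assumed to be the client library.

```python
import base64
import binascii
import re
from typing import Optional

# A plain 32-character hex string; multipart ETags such as "abc...-12" do not match.
_HEX_MD5 = re.compile(r"^[0-9a-f]{32}$")


def md5_from_head(head: dict) -> Optional[str]:
    """Return a hex MD5 from a head_object-style response, or None if unavailable.

    `head` is expected to look like the dict returned by boto3's
    S3 `head_object` call (keys "Metadata" and "ETag").
    """
    # Assumption: rclone-style uploads carry a base64 MD5 under "md5chksum".
    meta = {k.lower(): v for k, v in head.get("Metadata", {}).items()}
    b64 = meta.get("md5chksum")
    if b64:
        try:
            raw = base64.b64decode(b64, validate=True)
        except (binascii.Error, ValueError):
            raw = b""
        if len(raw) == 16:  # an MD5 digest is always 16 bytes
            return raw.hex()

    # Fallback: for single-part uploads the ETag is usually the MD5.
    # Caveat: this does not hold for SSE-KMS or SSE-C encrypted objects.
    etag = head.get("ETag", "").strip('"').lower()
    if _HEX_MD5.match(etag):
        return etag
    return None


def head_md5(bucket: str, key: str) -> Optional[str]:
    """Fetch only the object's metadata from S3 and extract its MD5."""
    import boto3  # third-party; imported here so md5_from_head works without it

    s3 = boto3.client("s3")
    return md5_from_head(s3.head_object(Bucket=bucket, Key=key))
```

With something like this, only the labels would need to be downloaded; the data file's checksum would come from a metadata request per object. Objects that return `None` would still need another way to obtain a checksum.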

Yes, we can download piecewise, but that just adds more steps that could go wrong and risks missing things (for example, I would never trust the KDP piecewise processing, as it has proven to be unreliable).

📖 Additional Details

No response

Acceptance Criteria

Given When I perform Then I expect

⚙️ Engineering Details

No response

🎉 I&T

No response