Open riverma opened 1 year ago
@jpl-jengelke recommends incorporating this into the current CI guide
+1'd by @ramesh-maddegoda, @drewm-jpl, @nttoole, @kgrimes2, @hookhua, @carlynlee
- Package release (e.g. GitHub Releases): `nasa-[project org]-[module name] [semantic version ID]`
- Maven (group/artifact ID): `gov.nasa.[project org].[module name]`
- npm (scoped package): `@nasa-[project org]/[module name]`
- Terraform module registry: `terraform-nasa-[project org]-modules/[module-name]`
- Test dataset: `[project org]-[project module]-test-dataset`
- Container image (e.g. DockerHub or GitHub Container Registry): `nasa-[project org]-[project module]:[tag]`
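The conventions above are plain string templates, so they are easy to express (and unit test) in code. A quick sketch in Python — the `org`/`module`/`version` values below are illustrative placeholders, not agreed-upon names:

```python
# Sketch: the naming conventions above as string templates.
# The example org/module/version values are illustrative only.

def release_name(org: str, module: str, version: str) -> str:
    """nasa-[project org]-[module name] [semantic version ID]"""
    return f"nasa-{org}-{module} {version}"

def maven_coordinates(org: str, module: str) -> str:
    """gov.nasa.[project org].[module name]"""
    return f"gov.nasa.{org}.{module}"

def npm_package(org: str, module: str) -> str:
    """@nasa-[project org]/[module name]"""
    return f"@nasa-{org}/{module}"

def docker_image(org: str, module: str, tag: str) -> str:
    """nasa-[project org]-[project module]:[tag]"""
    return f"nasa-{org}-{module}:{tag}"

print(maven_coordinates("unity", "sps"))      # gov.nasa.unity.sps
print(docker_image("unity", "sps", "1.0.0"))  # nasa-unity-sps:1.0.0
```

Encoding the templates this way also makes it trivial for CI to validate artifact names before publishing.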
Hi @riverma,
Regarding repositories for test data, it might be worth looking at the data repository guidance provided by Scientific Data - Nature (https://www.nature.com/sdata/policies/repositories).
In particular, their list of recommended generalist data repositories may be pertinent.
@riverma it looks like you have done a great job defining the repositories and formats that I would expect here. I'm mostly familiar with Maven Central and PyPI from building things in the past.
I think one thing to consider (which may be tangential to this ticket), is how/when do we push artifacts to these places? We have sort of thought about some notional methodologies related to this (see the blue part of this diagram).
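As one notional answer to the "how/when do we push" question, a release-triggered CI workflow could publish on every tagged release. A sketch as a GitHub Actions workflow — the workflow name and project layout are assumptions, while `pypa/gh-action-pypi-publish` and PyPI's OIDC-based trusted publishing are real features:

```yaml
# Sketch: publish a Python package to PyPI when a GitHub release is published.
name: publish
on:
  release:
    types: [published]
jobs:
  pypi:
    runs-on: ubuntu-latest
    permissions:
      id-token: write   # enables PyPI "trusted publishing" (OIDC), no stored token
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.x"
      - run: python -m pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1
```

The same release-event trigger would work for pushing container images or Maven artifacts; only the publish step changes per registry.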
My thoughts about test data: (1) we will hopefully centralize on a single representative "golden dataset" that exercises the capabilities we care to test; (2) as such, we should probably just store that dataset in S3 and be done with it. We aren't going to be storing gobs and gobs of data; we just need that representative "starter" data. Any data produced as a result of SPS runs can be transitory and deleted relatively quickly after verification. In other words, we aren't an actual mission and won't have the Life of Mission data requirements and associated costs. If we need to store several gigabytes of data on S3, it's not going to break the bank.
That being said, I haven't taken a look at the repositories @drewm-jpl mentioned. I do know that we are all familiar with AWS/S3, though.
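A minimal sketch of the S3 golden-dataset idea, assuming a single bucket with a versioned prefix per module — the bucket name, key layout, and filename below are all illustrative, not an agreed convention:

```python
# Sketch: one well-known S3 prefix per module for the "golden dataset".
# Bucket name and key layout are illustrative assumptions.

def golden_dataset_key(org: str, module: str, version: str, filename: str) -> str:
    """Key layout: [org]/[module]/golden/[version]/[filename]"""
    return f"{org}/{module}/golden/{version}/{filename}"

def golden_dataset_uri(bucket: str, key: str) -> str:
    return f"s3://{bucket}/{key}"

key = golden_dataset_key("unity", "sps", "v1", "scene.h5")
print(golden_dataset_uri("nasa-test-data", key))
# -> s3://nasa-test-data/unity/sps/golden/v1/scene.h5

# To actually upload (assumes boto3 is installed and AWS credentials are set):
#   import boto3
#   boto3.client("s3").upload_file("scene.h5", "nasa-test-data", key)
```

Versioning the prefix keeps the golden dataset reproducible across test runs, while transitory SPS outputs can live under a separate prefix with a lifecycle rule that deletes them after verification.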
Also, you might want to take a quick look at AWS CodeArtifact, though it may not be the best fit for a fully open-source build process. Or maybe it would work? Public services like Maven Central and PyPI might be better, but I'm pointing out CodeArtifact in case it wasn't considered as part of this eval.
From @mike-gangl: see https://blog.pypi.org/posts/2023-04-23-introducing-pypi-organizations/
Checked for duplicates
Yes - I've already checked
Category
Software Lifecycle - the creation, change, and release of software
Describe the need
We have a need for recommendations on the choice of packaging host (GH Packages, DockerHub, etc.), including automation architecture / solutions for including dependencies in builds. (+1'd by @mike-gangl). What would be especially useful here is a specific choice of packaging hosts / managers and details on how to interact with them.