NOAA-PSL / observation-archive

Tools to assemble observational archive record
Apache License 2.0

updates to prepbufr and amsua yaml files and changelogs #13

Closed frolovsa closed 1 year ago

frolovsa commented 2 years ago

This is not perfect, but this PR updates the yaml files to address (but not fully close) #11 and #12.

It also records the sequence of actions taken for the prepbufr and 1bamsua copies in the corresponding changelog files.

frolovsa commented 2 years ago

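@HenryWinterbottom-NOAA @jswhit I would like to get your feedback on my not-foolproof system of keeping track of file provenance in our clean bucket.

Under data-tree/reanalysis/STREAM/VERSION I will be version controlling files like:

* README.md -- describes what is inside this data directory

* CHANGELOG.md -- describes the history and the times when files were added to or deleted from the bucket. The actual log files tagged with this time will be uploaded to the `s3://our-bucket/observations/reanalysis/STREAM/VERSION/log` directory.

* yaml files that were used to control the `s3_copy.py` utility in this repo.

* any shell scripts that were used to remove wrong files or rename files in the clean bucket.

In the future we should do a better job of keeping track of file histories, but I think this might work for now.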

HenryRWinterbottom commented 2 years ago

> @HenryWinterbottom-NOAA @jswhit I would like to get your feedback on my not-foolproof system of keeping track of file provenance in our clean bucket.
>
> Under data-tree/reanalysis/STREAM/VERSION I will be version controlling files like:
>
> * README.md -- describes what is inside this data directory
>
> * CHANGELOG.md -- describes the history and the times when files were added to or deleted from the bucket. The actual log files tagged with this time will be uploaded to the `s3://our-bucket/observations/reanalysis/STREAM/VERSION/log` directory.
>
> * yaml files that were used to control the `s3_copy.py` utility in this repo.
>
> * any shell scripts that were used to remove wrong files or rename files in the clean bucket.
>
> In the future we should do a better job of keeping track of file histories, but I think this might work for now.

I wonder if there is a way to somehow automate this. Meaning, every time a file is updated (via some external application), a master file/database is updated that contains the location from which the original file was collected (i.e., whether it came from NOAA HPSS, the GDAS online archive, etc.), the corresponding MD5 checksum (this will make it easier for users to reproduce experiments within the UFS-RNR framework), and the date it was replaced.

Perhaps you have already considered this. However, such a database may be an option to keep a master record. I am just brainstorming so feel free to punt this idea.
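
To make the brainstorm a bit more concrete, here is a minimal sketch of what such a master record might look like, assuming a plain JSON Lines file as the "database". The function names (`md5_checksum`, `record_update`, `load_manifest`), the field names, and the manifest format are all hypothetical and are not part of this repo or of `s3_copy.py`; the idea is only that whatever application updates a file would also append one entry.

```python
import hashlib
import json
from datetime import datetime, timezone


def md5_checksum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_update(manifest_path, file_path, bucket_key, source):
    """Append one provenance entry to a JSON Lines manifest.

    The entry layout is an assumption made for illustration:
    where the file lives in the bucket, where it was collected from
    (e.g. "NOAA HPSS"), its MD5 checksum, and when it was replaced.
    """
    entry = {
        "bucket_key": bucket_key,
        "source": source,
        "md5": md5_checksum(file_path),
        "replaced_utc": datetime.now(timezone.utc).isoformat(),
    }
    with open(manifest_path, "a", encoding="utf-8") as manifest:
        manifest.write(json.dumps(entry) + "\n")
    return entry


def load_manifest(manifest_path):
    """Read every entry back, oldest first."""
    with open(manifest_path, encoding="utf-8") as manifest:
        return [json.loads(line) for line in manifest if line.strip()]
```

An append-only file keeps the full history of each key, so the latest entry for a given `bucket_key` gives the current checksum and the earlier ones show what it replaced. A real database would serve the same purpose if the record grows large.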