NOAA-PSL / observation-archive

Tools to assemble observational archive record

migrate iodav2 data into its own folder #55

Open jrknezha opened 1 year ago

jrknezha commented 1 year ago

In preparation for the transition from iodav2 to iodav3, the existing iodav2 files need to be reorganized so the bucket stays clean and clearly organized. To do this, a new iodav2 folder needs to be created in each location where the v2 files currently reside, and the existing files moved into that folder.

Example existing structure: observations/reanalysis/adt/nesdis/cryosat2/YYYY/MM/24h/*.iodav2.nc

Example new structure: observations/reanalysis/adt/nesdis/cryosat2/YYYY/MM/24h/iodav2/*.iodav2.nc

To accomplish this, the existing src/s3-copy.py python script should be used. The script takes a YAML file for each observation source that defines the current file structure, the desired file structure, and the time period to iterate over, and it moves the files into the new folder based on that definition. After the YAML files are created, s3-copy.py will be run against each of them to complete the transfer.
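For orientation only, here is a minimal sketch of the per-object copy this boils down to, assuming boto3 and the example layout above. The bucket name and prefix are placeholders, and the real logic lives in src/s3-copy.py driven by the YAML files described below, so this is not the actual script.

```python
"""Hypothetical sketch of the iodav2 folder migration; not src/s3-copy.py."""
import boto3

BUCKET = "clean-bucket-name"  # placeholder, not the real bucket name
SRC_PREFIX = "observations/reanalysis/adt/nesdis/cryosat2/2021/01/24h/"  # example month

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

for page in paginator.paginate(Bucket=BUCKET, Prefix=SRC_PREFIX):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        # Only touch iodav2 files that have not already been moved.
        if not key.endswith(".iodav2.nc") or "/iodav2/" in key:
            continue
        prefix, filename = key.rsplit("/", 1)
        new_key = f"{prefix}/iodav2/{filename}"  # add the new iodav2/ folder
        s3.copy({"Bucket": BUCKET, "Key": key}, BUCKET, new_key)
        # A true "move" would also delete the original once the copy succeeds:
        # s3.delete_object(Bucket=BUCKET, Key=key)
```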

Data locations to migrate:

YAML file details: the YAML files are stored under data-tree/reanalysis/ in the same observation-source structure as the clean bucket, along with changelogs that track changes made to the clean-bucket files. A template starting YAML file can be found at data-tree/reanalysis/date-type-template/revision/20100501-20210101.clean_bucket.gdas_first.yaml. The ocean data will require a different cycling_interval than the template.

YAML values will need to reflect each individual observation source's date_range, source key and file_template, and destination path and file_template. The only difference between the source key and the destination path will be the addition of the "/iodav2/" folder.

A different YAML file is needed per sub-folder; for example, adt/nesdis/cryosat2/ and adt/nesdis/ers1/ require two separate YAML files.

Note that the date_range end value is exclusive: it needs to be the time value just after the last data point. For example, if the last data point is 20211231T00Z, the end value needs to be 20220101T00Z for the last file to be included in the transfer.
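As a rough illustration of the fields discussed above, a migration YAML might look something like the sketch below. The field names and layout are guesses based on this description and should be taken from the template at data-tree/reanalysis/date-type-template/revision/20100501-20210101.clean_bucket.gdas_first.yaml; the dates, interval, and path placeholders are made up for the example.

```yaml
# Hypothetical sketch only -- copy the real schema from the template YAML
# under data-tree/reanalysis/, not from this example.
date_range:
  begin: '20100501T00Z'        # first cycle to process
  end: '20220101T00Z'          # exclusive: one cycle past the last data point
cycling_interval: 24h          # ocean data uses a different interval than the template
source:
  key: observations/reanalysis/adt/nesdis/cryosat2/{year}/{month}/24h/
  file_template: '{name}.iodav2.nc'
destination:
  path: observations/reanalysis/adt/nesdis/cryosat2/{year}/{month}/24h/iodav2/
  file_template: '{name}.iodav2.nc'
```

The only change between source and destination is the extra iodav2/ folder at the end of the path, matching the example structures above.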