bcgov / nr-rfc-climate-obs

Transition of the existing climate observations data pipeline to enable running off prem
Apache License 2.0

Address ECCC version data in Object Storage #45

Open franTarkenton opened 1 year ago

franTarkenton commented 1 year ago

Background

The ECCC script pulls hourly data from the federal government's data mart, does some reformatting, and ultimately creates the files in the object storage bucket in the following directory: RFC_DATA/ECCC/hourly/csv

The script currently runs every hour. Each time it runs, it creates a new version of each file in object storage.

Task

Modify the ECCC code so that there can only ever be two versions. If there are more than two versions, the oldest ones are deleted automatically.

The best place to implement this is the upstream nr-objectstore-util lib. Configure it so that the put operations accept an argument that defines the maximum number of versions you want to maintain. If the argument is not populated, nothing is deleted and the put just creates a new version. If you specify an argument of version=2, any versions older than the 2 newest ones are deleted.
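
As a rough illustration of the pruning rule described above, here is a minimal sketch of the selection logic only. It does not use the real nr-objectstore-util API: the `ObjectVersion` record, the `versions_to_delete` function, and the `max_versions` parameter name are all hypothetical, and the actual listing and deletion of versions would still need to go through whatever object storage client the lib wraps.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class ObjectVersion:
    """Hypothetical record describing one stored version of an object."""
    version_id: str
    last_modified: datetime


def versions_to_delete(
    versions: List[ObjectVersion], max_versions: Optional[int] = None
) -> List[ObjectVersion]:
    """Return the versions that fall outside the newest `max_versions`.

    With max_versions=None nothing is selected for deletion, which matches
    the "not populated" default described in the issue.
    """
    if max_versions is None:
        return []
    if max_versions < 1:
        raise ValueError("max_versions must be at least 1")
    # Sort newest first, then everything past the first N is stale.
    newest_first = sorted(versions, key=lambda v: v.last_modified, reverse=True)
    return newest_first[max_versions:]


# Four hourly versions of the same file; keep only the two newest.
history = [ObjectVersion(f"v{hour}", datetime(2023, 1, 1, hour)) for hour in range(4)]
stale = versions_to_delete(history, max_versions=2)
print([v.version_id for v in stale])  # ['v1', 'v0']
```

Keeping the selection separate from the delete call means the rule can be unit tested without a bucket, and the put operation would only need to call it after a successful upload.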