glamod / glamod-nuim

NUIM code in support of GLAMOD
BSD 3-Clause "New" or "Revised" License

Daily updates from ghcnd #45

Closed sjnoone closed 2 years ago

sjnoone commented 2 years ago

I have uploaded the code for the update of the daily files. The code downloads the daily diff file from ghcnd and then converts the insert.csv (new daily values) to the CDM-formatted tables and saves them in level 2 [/gws/nopw/j04/c3s311a_lot2/data/level2/land/daily_updates/]. The second piece of code deletes the unwanted .gz file once it has been processed. I have tested the code on JASMIN and it runs fine [/gws/nopw/j04/c3s311a_lot2/data/level0/land/daily_data_processing/ghcnd_diff_updates/code/]. I want to set up the CRON job to run these jobs daily and stop using AK's conversion codes. So before I do that, can you please check over the code and make sure I haven't missed an error?

rjhd2 commented 2 years ago

Note - currently not using approach in updated daily CDM conversion scripts (#37). Include these dictionaries when refactoring.

(code in PYTHON_CDM_Conversion_code/daily_updates_processing)

rjhd2 commented 2 years ago

The bit of code which identifies the most recent daily diff file - that currently spins through all files on the FTP server, so takes a while. Is it just wanting the most recent file (so we could take that as the end one in a sorted list, and ignore the time aspect), or is it wanting the files which have appeared on the server since the most recent run of the script?

sjnoone commented 2 years ago

It searches for the most recent daily_diff file on the FTP server and then checks against last_file.txt to see whether it has already been processed; if it is new it converts it, if not it ends. Sometimes the server doesn't produce a new file every day, and sometimes it groups several days together to catch up.
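A minimal sketch of the "have we seen this diff already?" check described above, assuming a plain-text last_file.txt holding the name of the last processed file (the filename, path, and function name here are illustrative, not the actual code):

```python
# Sketch only: compare the latest diff filename on the server against the
# one recorded from the previous run in last_file.txt. If they differ,
# record the new name and signal that conversion should go ahead.
from pathlib import Path

LAST_FILE = Path("last_file.txt")  # assumed location of the marker file

def is_new_diff(latest_name: str) -> bool:
    """Return True if latest_name differs from the one processed last run."""
    if LAST_FILE.exists():
        previous = LAST_FILE.read_text().strip()
        if previous == latest_name:
            return False  # already processed; nothing to do today
    # record the new name so the next run can skip it
    LAST_FILE.write_text(latest_name + "\n")
    return True
```

This also covers the catch-up case: if the server skips a day, the recorded name simply stays unchanged until a genuinely new filename appears.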

rjhd2 commented 2 years ago

Cool, so just wanting the most recent. Could use the naming convention, which has a YYYYMMDD in it, to pick up the most recent (presuming that there's no strange out-of-sequence infilling).
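The suggestion above could look something like this sketch: rather than querying modification times for every file on the FTP server, exploit the YYYYMMDD stamp in the filenames, which sorts correctly as a plain string (the function name and filename pattern are assumptions, not the actual code):

```python
# Sketch only: pick the latest diff file from a directory listing by the
# 8-digit date embedded in each filename, avoiding per-file FTP time queries.
import re

def most_recent_diff(filenames):
    """Return the diff filename with the latest YYYYMMDD date stamp, or None."""
    pattern = re.compile(r"(\d{8})")  # assumed: one 8-digit date per name
    dated = [(m.group(1), name)
             for name in filenames
             if (m := pattern.search(name))]
    if not dated:
        return None
    # YYYYMMDD strings compare in chronological order, so max() suffices
    return max(dated)[1]
```

One directory listing plus a string comparison replaces the per-file spin through the server, which should remove most of the runtime mentioned earlier.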

Would there ever be a reason to get a few daily diff files at once?

sjnoone commented 2 years ago

No, always only one diff file.

rjhd2 commented 2 years ago

Not sure whether to further refactor and move each stage into a separate def() - might just leave for the moment.

Could move this up one directory level (and save the tweak to ensure import utils works) as filenames are descriptive enough.

Does the delete.csv file need processing at all?

sjnoone commented 2 years ago

The delete code removes the processed insert.csv file so we don't have issues with overwriting etc.

Thanks for doing these edits.

rjhd2 commented 2 years ago

The delete......py file does the delete - that makes sense. I just wanted to make sure that the delete.csv which is in the superghcnd...tar.gz file doesn't need processing - as I presumed it removes observations that had been included but are no longer part of the record, and by extension that the update.csv would update values in the record.

sjnoone commented 2 years ago

Ah OK, I understand what you mean. No, the delete.csv and update.csv are not processed. The delete.py removes the processed .csv files.
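The cleanup step described above can be sketched as follows, assuming the processed archives and insert files sit in a single working directory (directory layout, glob patterns, and the function name are illustrative, not taken from delete.py itself):

```python
# Sketch only: after a diff has been converted, remove the processed
# .gz archive and insert*.csv files so the next run cannot re-process
# or overwrite them.
from pathlib import Path

def remove_processed(work_dir, patterns=("*.gz", "insert*.csv")):
    """Delete processed archive and insert files under work_dir; return names removed."""
    removed = []
    for pattern in patterns:
        for path in Path(work_dir).glob(pattern):
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

Returning the list of deleted names makes the step easy to log from a daily CRON run, which may help when checking whether a given day's diff was actually handled.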

rjhd2 commented 2 years ago

Will leave in current directory and close issue via PR. Any errors to be raised as new issues in due course.