Closed sjnoone closed 2 years ago
Note - currently not using approach in updated daily CDM conversion scripts (#37). Include these dictionaries when refactoring.
(code in PYTHON_CDM_Conversion_code/daily_updates_processing)
The bit of code that identifies the most recent daily diff file currently spins through all files on the FTP server, so it takes a while. Is it just after the most recent file (in which case we could take the last entry in a sorted list and ignore the time aspect), or is it after all the files that have appeared on the server since the most recent run of the script?
It searches for the most recent daily_diff file on the FTP server and then checks last_file.txt to see whether it has already been processed; if it is new it converts, if not it ends. Sometimes the server doesn't produce a new file every day, and sometimes it groups several days together to catch up.
Cool, so just wanting the most recent. Could use the naming convention which has a YYYYMMDD in it to pick up the most recent (presuming that there's no strange out of sequence infilling).
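The suggestion above could be sketched roughly as follows. This is a minimal sketch, not the code in the repo: the filename pattern, the `last_file.txt` location, and the helper names are all assumptions for illustration.

```python
import re


def most_recent_diff(filenames):
    """Pick the newest daily diff file using the YYYYMMDD embedded in its
    name, so we never need to inspect server timestamps.
    (The 8-digit date pattern is an assumption about the naming convention.)"""
    dated = [f for f in filenames if re.search(r"\d{8}", f)]
    # The lexical order of YYYYMMDD strings matches chronological order.
    return max(dated, key=lambda f: re.search(r"\d{8}", f).group())


def is_new(filename, last_file_path="last_file.txt"):
    """Compare against the name recorded at the previous run (the
    last_file.txt check described above); only convert when it differs."""
    try:
        with open(last_file_path) as fh:
            return fh.read().strip() != filename
    except FileNotFoundError:
        # No record yet, so treat the file as new.
        return True
```

This avoids spinning through the whole listing for timestamps, provided there is no out-of-sequence infilling on the server.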
Would there ever be a reason to get a few daily diff files at once?
No, always only one diff file.
Not sure whether to further refactor and move each stage into a separate def() - might just leave it for the moment.
Could move this up one directory level (and save the tweak to ensure import utils
works) as filenames are descriptive enough.
Does the delete.csv
file need processing at all?
The delete.csv removes the processed insert.csv file so we don't have issues with overwriting etc.
Thanks for doing these edits.
The delete......py
file does the delete - that makes sense. I just wanted to make sure that the delete.csv
which is in the superghcnd...tar.gz
file doesn't need processing - as I presumed it removes observations that had been included but are no longer part of the record, and by extension the update.csv
would update values in the record.
Ah ok, I understand what you mean. No, the delete.csv and update.csv are not processed. The delete.py removes the processed .csv files.
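For reference, the clean-up step delete.py performs could look something like the sketch below. The directory layout and glob patterns are hypothetical; the actual script may target specific filenames.

```python
import glob
import os


def remove_processed(directory, patterns=("*.csv", "*.tar.gz")):
    """Delete processed files from a working directory so the next run
    cannot pick up stale data or hit overwrite clashes.
    Returns the list of paths removed (patterns are illustrative)."""
    removed = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(directory, pattern)):
            os.remove(path)
            removed.append(path)
    return removed
```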
Will leave in current directory and close issue via PR. Any errors to be raised as new issues in due course.
I have uploaded the code for the update of the daily files. The code downloads the daily diff file from GHCNd and then converts the insert.csv (new daily values) to the CDM-formatted tables and saves them in level 2 [/gws/nopw/j04/c3s311a_lot2/data/level2/land/daily_updates/]. The second piece of code deletes the unwanted processed .gz file once it has been processed. I have tested the code on JASMIN and it runs fine [/gws/nopw/j04/c3s311a_lot2/data/level0/land/daily_data_processing/ghcnd_diff_updates/code/]. I want to set up the CRON job to run these jobs daily and stop using AK conversion codes. So before I do that, can you please check over the code and make sure I haven't missed an error?
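The download step described above could be sketched with the standard-library `ftplib`. The host, remote directory, and anonymous login here are placeholders, not the real GHCNd server layout; the actual script's connection details should be taken from the repo.

```python
import ftplib
import os


def download_latest_diff(host, remote_dir, filename, dest_dir="."):
    """Fetch one daily diff file from an FTP server and save it locally.
    Host and path arguments are placeholders for illustration."""
    local_path = os.path.join(dest_dir, filename)
    with ftplib.FTP(host, timeout=60) as ftp:
        ftp.login()  # anonymous login; real server may need credentials
        ftp.cwd(remote_dir)
        with open(local_path, "wb") as fh:
            # Stream the remote file to disk in binary mode.
            ftp.retrbinary(f"RETR {filename}", fh.write)
    return local_path
```

Once this and the conversion step are verified, scheduling a daily CRON entry that calls the script is the remaining wiring.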