ESPRI-Mod / synda

ESGF Downloader (this is a deprecated repository, the tool has now moved to https://github.com/ESGF/esgf-download)
https://espri-mod.github.io/synda/

Save file transfer error histories for further processing #169

Open painter1 opened 3 years ago

painter1 commented 3 years ago

When a file download fails, preserve a dated, abbreviated error description in a new column, error_history. This gives external scripts the information they need to deal with incorrect URLs or bad checksums.
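As a rough illustration of the idea (not the code in this pull request), recording a failure could look something like the sketch below. The table and column names other than error_history, the `date:description` entry format, the `;` separator, the 40-character abbreviation length, and the `record_failure` helper are all assumptions made for the example.

```python
import sqlite3
from datetime import date

MAX_ERROR_LEN = 40  # hypothetical abbreviation length


def record_failure(conn, file_id, error_msg, today=None):
    """Append 'YYYY-MM-DD:abbreviated error' to a file's error_history.

    Assumes a table file(file_id, status, error_history); this schema is
    a simplification for illustration, not the real Synda schema.
    """
    today = today or date.today()
    # Keep ';' free for use as the entry separator, then abbreviate.
    short = error_msg.replace(";", ",")[:MAX_ERROR_LEN]
    entry = f"{today.isoformat()}:{short}"

    row = conn.execute(
        "SELECT error_history FROM file WHERE file_id = ?", (file_id,)
    ).fetchone()
    history = row[0] if row and row[0] else ""
    new_history = f"{history};{entry}" if history else entry

    conn.execute(
        "UPDATE file SET status = 'error', error_history = ? WHERE file_id = ?",
        (new_history, file_id),
    )
    conn.commit()
    return new_history
```

Keeping each entry short and dated means the column stays small even after many retries, while still telling a script how often and when a file has failed.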

For large-scale replication, it is impossible to deal manually with problems affecting individual files, such as an incorrect checksum reported by the data node. The Synda database at LLNL has over 16,000 files with bad checksums and over 260,000 files which have needed some kind of individual attention. To handle these, we need (and I have) an external script.

Usually a file should be made to "disappear" if it cannot be downloaded after repeated attempts, and if the transfer failures are for reasons specific to that file. By "disappear" I mean that there will be no more attempts to transfer it, even after a later "synda install..." or "synda retry" command. An easy way to make that happen is to change the database so that the file has a nonstandard status such as "bad_checksum". To do this, you have to know how many transfer attempts there have been, and when. The changes in this pull request add this information to the database.
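A minimal sketch of how an external script might use that information follows. It is not the script linked in this thread. It assumes a simplified table file(file_id, status, error_history), an error_history made of `YYYY-MM-DD:description` entries separated by `;`, and hypothetical thresholds of 3 failures spread over at least 7 days; the helper names are invented for the example.

```python
import sqlite3
from datetime import date

MAX_ATTEMPTS = 3    # hypothetical: failures needed before giving up
MIN_SPAN_DAYS = 7   # hypothetical: days between first and last failure


def parse_history(history):
    """Split 'date:desc;date:desc' into a list of (date, desc) pairs."""
    if not history:
        return []
    pairs = []
    for entry in history.split(";"):
        if ":" in entry:
            day, desc = entry.split(":", 1)
            pairs.append((date.fromisoformat(day), desc))
    return pairs


def retire_bad_checksums(conn, max_attempts=MAX_ATTEMPTS,
                         min_span_days=MIN_SPAN_DAYS):
    """Give repeatedly failing files the nonstandard status 'bad_checksum'."""
    rows = conn.execute(
        "SELECT file_id, error_history FROM file WHERE status = 'error'"
    ).fetchall()
    retired = []
    for file_id, history in rows:
        # Only checksum failures count toward retiring the file.
        days = [d for d, desc in parse_history(history)
                if "checksum" in desc.lower()]
        if len(days) < max_attempts:
            continue
        if (max(days) - min(days)).days < min_span_days:
            continue
        conn.execute(
            "UPDATE file SET status = 'bad_checksum' WHERE file_id = ?",
            (file_id,),
        )
        retired.append(file_id)
    conn.commit()
    return retired
```

Checking both the count and the time span is meant to avoid retiring a file because of a short outage on the data node; the right thresholds would depend on the site.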

For an example of such an external script, see https://github.com/painter1/Synda-scripts/blob/fad3c1bb5d131bc0a5bb79d1467d789fd4cc28d1/permanent_error_status.py

painter1 commented 3 years ago

This is more useful with my pull request #147, but should be included with or without it.