emissions-api / sentinel5dl

Sentinel-5(P) Downloader
https://sentinel5dl.emissions-api.org
MIT License

md5 sum is not downloaded on first attempt #67

Closed: shaardie closed this issue 2 years ago

shaardie commented 4 years ago

It seems that the md5 hash of a file is not downloaded the first time the file is downloaded. Because of the program logic, the md5 hash is only downloaded once the file already exists and there is an attempt to download it again; see https://github.com/emissions-api/sentinel5dl/blob/master/sentinel5dl/__init__.py#L223

This is not the behavior I would expect.

lkiesow commented 4 years ago

This is intentional, since the hash is not needed for the first download. It is only used to check that a local file is the expected one, and we already know that if we have just downloaded it successfully.
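The behaviour discussed so far can be sketched roughly as follows. This is a minimal illustration, not the actual code behind the linked line in `sentinel5dl/__init__.py`. The functions `md5_of` and `download_or_verify` and the callables `fetch_file` and `fetch_md5` are hypothetical stand-ins for the real HTTP requests. The point it shows is that the expected hash is only requested when a local file already exists.

```python
import hashlib
import os
from typing import Callable


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a local file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download_or_verify(
    path: str,
    fetch_file: Callable[[str], None],  # hypothetical: writes the remote file to `path`
    fetch_md5: Callable[[], str],       # hypothetical: returns the server's MD5 as hex
) -> str:
    """Hypothetical sketch: the MD5 is only fetched if the file already exists."""
    if os.path.exists(path):
        # Repeated attempt: fetch the expected hash and compare it with the local file.
        # Normalise case in case the server reports uppercase hex.
        if md5_of(path) == fetch_md5().strip().lower():
            return "verified"
    # First attempt (or hash mismatch): download without requesting the hash.
    fetch_file(path)
    return "downloaded"
```

This layout makes the trade-off in this thread explicit. A first run costs one request per file. A re-run over the same files costs one hash request plus a full local read per existing file.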

shaardie commented 4 years ago

Okay. It is still a little counterintuitive to me that I cannot verify the integrity of a file after the first download without downloading another file (the hash).

I thought I would be able to download a year's worth of data in chunks, e.g. month by month, and afterwards start a new download for the whole year that would quickly verify that all files had been downloaded correctly, or simply download the missing ones. But since the second download has to fetch all the missing md5 hashes, it is still very slow.

But I see the point that the first download does not need the hash file and that it should be faster without downloading it.

So please close this issue again if you are happy with the way it is now.

shaardie commented 4 years ago

Funnily enough, with a fast internet connection, downloading the file again seems to be faster than checking the md5 hash :laughing:. At least for CO.

shaardie commented 4 years ago

Also, this raises another question for me:

Do we even need the hash file anymore? If we downloaded it on the first attempt, we could argue that it serves to verify the download.

But if we only want to check whether a file has already been downloaded, we could simply check whether the file is present, since we download to a temporary file first. All files with proper filenames should be complete.
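This argument relies on the download being written to a temporary file and only renamed to its final name once it is complete. Below is a minimal sketch of that pattern. It assumes the data arrives as an iterable of byte chunks. The names `atomic_download` and `already_downloaded` are hypothetical and are not sentinel5dl's API.

```python
import os
import tempfile
from typing import Iterable


def atomic_download(path: str, chunks: Iterable[bytes]) -> None:
    """Hypothetical sketch: write chunks to a temp file, then move it to `path`.

    If the transfer fails midway, `path` is never created, so the mere
    presence of `path` implies that the transfer ran to completion.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so os.replace is a same-filesystem rename.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp:
            for chunk in chunks:
                tmp.write(chunk)
        # On POSIX systems this rename is atomic: readers see either no file or the whole file.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial file; the final filename never appears.
        os.remove(tmp_path)
        raise


def already_downloaded(path: str) -> bool:
    """With atomic_download, existence alone signals a complete transfer."""
    return os.path.exists(path)
```

Note that this only shows the transfer was not interrupted. Unlike a comparison with the server's md5 hash, it cannot detect data that arrived corrupted.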

lkiesow commented 2 years ago

When downloading the offline versions on a system with a fast CPU and an SSD, checking the md5 hash is significantly faster. There are arguments for both, I guess, but what we have now has been working in production for over a year → not investigating this any further.