davidfrantz / force

Framework for Operational Radiometric Correction for Environmental monitoring
GNU General Public License v3.0

force-level1-landsat does not download all the found scenes and fails to populate the queue #320

Open kelewinska opened 6 months ago

kelewinska commented 6 months ago

Hi, I am observing a peculiar behavior when executing the force-level1-landsat command. I ran the following command:

force-level1-landsat search --download /data/test3/aoi.txt -s OLI -d 20240101,20240131 --secret /home/user/secret.txt -q /data/test3/pool3.txt /data/test3

to search for Landsat OLI scenes acquired in January 2024 for the 005-047 Landsat tile (Puerto Rico). My aoi.txt comprises a single line: 005047.

The output is:

Sensor(s): OLI
Tile(s): 005047
Date range: 2024-01-01 to 2024-01-31
Included months: 1,2,3,4,5,6,7,8,9,10,11,12
Cloud cover: 0% to 100%

3 Landsat Level 1 scenes matching criteria found
3.14 GB data volume found
Downloading: 100%|=========================================================================================| 1/1 [00:57<00:00, 57.90s/product bundle]
Download complete

For some reason, only one of the three scenes found is downloaded and added to the queue file.

When I execute exactly the same command a second time, the two remaining scenes are downloaded, but the queue file is not updated.

Sensor(s): OLI
Tile(s): 005047
Date range: 2024-01-01 to 2024-01-31
Included months: 1,2,3,4,5,6,7,8,9,10,11,12
Cloud cover: 0% to 100%

3 Landsat Level 1 scenes matching criteria found
3.14 GB data volume found
1 product bundles found in output directory, 2 not downloaded yet.
Remaining download size: 2.22 GB
Downloading: 100%|=========================================================================================| 2/2 [00:46<00:00, 23.34s/product bundle]
Download complete

Even if I point to a new queue file on the second execution, the file is created but remains empty. The .tar archives are downloaded correctly and the data are not corrupted.

I ran a test for Landsat tiles 200-028 and 103-073 (selected randomly). For 200-028, the behavior was exactly as described above: a single scene was downloaded on the first run, and the two remaining scenes were downloaded on the second run, with the queue file not being updated on the second run. For the 103-073 tile, however, the first execution resulted in zero scenes being downloaded:

Sensor(s): OLI
Tile(s): 103073
Date range: 2024-01-01 to 2024-01-31
Included months: 1,2,3,4,5,6,7,8,9,10,11,12
Cloud cover: 0% to 100%

2 Landsat Level 1 scenes matching criteria found
2.35 GB data volume found
Downloading: 0product bundle [00:00, ?product bundle/s]
Download complete

Only the second execution of the command downloaded the data. A queue file was generated, but remained empty.

Tile(s): 103073
Date range: 2024-01-01 to 2024-01-31
Included months: 1,2,3,4,5,6,7,8,9,10,11,12
Cloud cover: 0% to 100%

2 Landsat Level 1 scenes matching criteria found
2.35 GB data volume found
Downloading: 100%|=========================================================================================| 2/2 [00:39<00:00, 19.53s/product bundle]
Download complete

I ran FORCE v. 3.7.12 and tested it on two servers: one running Ubuntu 20.04.6 LTS | Kernel: Linux 5.4.0-172-generic | Architecture: x86-64, and the other running Ubuntu 22.04.4 LTS | Kernel: Linux 5.15.0-91-generic | Architecture: x86-64. In both cases, FORCE is executed in the Docker container davidfrantz/force:latest (the image was pulled from hub.docker.com; the name and tag correspond to the image reference used in the docker pull command).

On both servers, I observe exactly the same behavior: to download the data I need to execute the command more than once, and the files downloaded on the second attempt are not added to the queue.

Furthermore, when I just ran

force-level1-landsat search /data/test3/aoi.txt -s OLI -d 20240101,20240131 --secret /home/user/secret.txt /data/test3

the download links were also generated for only one image on the first run. The links for the two remaining files were created (in a new file) on the second execution of the command.

Did anyone come across anything like this? Do you have any idea what might be the issue here, and how to troubleshoot it? It is easy to spot in cases when only a handful of images need to be downloaded, but becomes a pain for bigger data pulls and processing.
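As a stopgap for spotting the problem in larger pulls, the downloaded archives can be compared against the queue file. The following is only a minimal sketch, not part of FORCE: the function name is hypothetical, and it assumes that each queue line starts with the archive path followed by a status field, and that downloads are stored as .tar files somewhere below the output directory.

```python
from pathlib import Path


def archives_missing_from_queue(download_dir, queue_file):
    """Return downloaded .tar archives that have no entry in the queue file."""
    queue_path = Path(queue_file)
    queued_names = set()
    if queue_path.exists():
        for line in queue_path.read_text().splitlines():
            fields = line.split()
            if fields:
                # Assumption: the first whitespace-separated field is the archive path.
                queued_names.add(Path(fields[0]).name)
    # Search recursively, since archives may sit in per-tile subdirectories.
    archives = sorted(Path(download_dir).rglob("*.tar"))
    return [archive for archive in archives if archive.name not in queued_names]
```

Calling `archives_missing_from_queue("/data/test3", "/data/test3/pool3.txt")` would list the archives that were downloaded but never queued, which makes the discrepancy visible without counting files by hand.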

Thank you in advance! Kasia

davidfrantz commented 6 months ago

... huh

I never noticed this myself, but I can reproduce it 100%...

@ernstste, do you have some idea why this could happen?

Best, David

ernstste commented 6 months ago

I spent some time looking into this and it's a bit tricky. The way the API responds to download requests seems to have changed. The results are now split across two different objects ('available downloads' and 'preparing downloads'), and not all of the results contain the product ID or any other identifier that would allow us to derive what hides behind the URL.

Creating a fix that makes sure we get all downloads with the first request was easy to implement. However, creating the queue files is a different story. I had already implemented a workaround, but then realized that the API can change the order of the returned results for download requests, so we have no way of knowing which URL belongs to which scene. I'll have to look into another solution.
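To illustrate the problem described above, here is a rough Python sketch of merging both result objects and separating entries that carry an identifier from those that do not. It is not the code used in landsatlinks: the function names are hypothetical, and the keys `availableDownloads`, `preparingDownloads`, `url` and `entityId` are assumptions about the response layout.

```python
def collect_downloads(response_data):
    """Merge both result lists of a download-request response into one list."""
    merged = []
    # Assumption: the response holds two lists under these (hypothetical) keys.
    for key in ("availableDownloads", "preparingDownloads"):
        for item in response_data.get(key) or []:
            merged.append({
                "url": item.get("url"),
                "entity_id": item.get("entityId"),  # may be missing (None)
                "ready": key == "availableDownloads",
            })
    return merged


def split_by_identifier(downloads):
    """Separate entries that can be matched to a scene from those that cannot."""
    identified = {d["entity_id"]: d for d in downloads if d["entity_id"]}
    anonymous = [d for d in downloads if not d["entity_id"]]
    return identified, anonymous
```

Keying the results by identifier avoids any dependence on the order in which the API returns them. Entries without an identifier end up in the second list and cannot be assigned to a scene by position, which is why building the queue file is the harder part.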

ernstste commented 6 months ago

Okay I think it should all be fixed now. Not exactly thrilled by the changes but it should work okay now. Would be great if you guys can confirm as I have very limited time for testing now.

ernstste commented 6 months ago

@davidfrantz did the credentials for dockerhub change? https://github.com/ernstste/landsatlinks/actions/runs/8663609892/job/23758090922

davidfrantz commented 6 months ago

Thanks so much Stefan!

No, I don't think so. It might be a problem with the dispatch action. The signal didn't arrive in the base repository.

I will try to have a look next week

ernstste commented 6 months ago

Ah true, it's actually the credentials for the dispatch that seem to fail. I saw you updated the version of the dispatch action and did the same, but the result is unfortunately the same. Let me know if there is something that needs to be changed on my end once you had the chance to look into it next week.

davidfrantz commented 6 months ago

> Ah true, it's actually the credentials for the dispatch that seem to fail. I saw you updated the version of the dispatch action and did the same, but the result is unfortunately the same. Let me know if there is something that needs to be changed on my end once you had the chance to look into it next week.

I cannot see anything obvious. I fear we need to sit together to solve this...

ernstste commented 6 months ago

Sure, feel free to give me a call and we'll see what we can do.