davidfrantz / force

Framework for Operational Radiometric Correction for Environmental monitoring
GNU General Public License v3.0

Error when downloading Landsat with force-level1-landsat search #262

Closed JariPekko closed 1 year ago

JariPekko commented 1 year ago

Hi! I get the following error message when downloading Landsat images from USGS with force-level1-landsat search, and I don't know whether it's a problem.

Error message

Exception in thread Thread-3:
Traceback (most recent call last):
  File "/usr/lib/python3.8/threading.py", line 932, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.8/threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 592, in _handle_results
    cache[job]._set(i, obj)
  File "/usr/lib/python3.8/multiprocessing/pool.py", line 776, in _set
    self._callback(self._value)
  File "/usr/local/lib/python3.8/dist-packages/landsatlinks/download.py", line 79, in callback
    create_force_queue(url, output_dir, queue_fp)
  File "/usr/local/lib/python3.8/dist-packages/landsatlinks/download.py", line 44, in create_force_queue
    scene_name = f'{re.search(utils.PRODUCT_ID_REGEX, url).group(0)}.tar'
AttributeError: 'NoneType' object has no attribute 'group'

I used the following command: dforce force-level1-landsat search Landsat_tiles.txt images/ --cloudcover 0,70 --queue-file queue.txt --secret usgs_m2m_access.txt --download

Behaviour: The download starts as expected and images are downloaded. After a few minutes the error message above appears, but the process is not aborted and the download continues.

Setup: FORCE version 3.7.10 using Docker, Ubuntu 20.04.5 LTS, Linux server, 500G RAM, 80 CPUs

Question: Do I have to worry? Is it just a warning that a URL didn't work?

ernstste commented 1 year ago

Hi Jari,

the error occurs when trying to add the scene that was just downloaded to the QUEUE file, so you probably don't need to worry about the download itself. It's not easy to say what the issue is with the information at hand. Does this only happen once? Can you specify which scene it was?

Thanks, Stefan

JariPekko commented 1 year ago

Hi Stefan,

thanks for the quick reply. So the download itself works correctly, but the QUEUE file is not updated correctly? I just counted the files in the download dir (6129) and the lines of the QUEUE file (5567), in case this is helpful.

I think so far the error happened only once each time I started the process. I don't know which scene caused it, but I can give you all the scenes I'm trying to download.

Sensor(s): TM, ETM, OLI
Tile(s): 171074,172074,172075,173074,173075,174074,174075,175074,175075,176073,176074,176075,177070,177071,177072,177073,177074,177075,178070,178071,178072,178073,178074,178075,179070,179072,179073,179074,179075,180072,180073
Date range: 1970-01-01 to 2023-01-04
Included months: 1,2,3,4,5,6,7,8,9,10,11,12
Cloud cover: 0% to 70%

20793 Landsat Level 1 scenes matching criteria found
10.97 TB data volume found
5850 product bundles found in output directory, 14943 not downloaded yet.
Remaining download size: 9.78 TB
Downloading:   1%|=>                                                     | 102/14909 [18:55<42:35:28, 10.36s/product bundle]
Downloading:   1%|==                                                     | 152/14909 [27:11<30:28:13,  7.43s/product bundle]

davidfrantz commented 1 year ago

Could this be a file conflict caused by downloading images in parallel?

ernstste commented 1 year ago

@JariPekko Thanks, it looks like there is definitely an issue with writing the file queue. Please make sure to create the queue for processing yourself before starting the Level 2 processing.

@davidfrantz There is potential for this to happen in the current version. However, according to the traceback the issue here is that the callback function (called after downloading a scene) isn't getting the url passed on properly.
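As a rough illustration of the failure in the traceback: `re.search` returns `None` when the value it is given contains no product ID, and calling `.group(0)` on that `None` raises the `AttributeError`. The sketch below guards against that case. The pattern is a hypothetical stand-in for `utils.PRODUCT_ID_REGEX` (the real pattern in landsatlinks may differ), and `scene_name_from_url` is an illustrative helper, not a function from the package.

```python
import re

# Hypothetical stand-in for utils.PRODUCT_ID_REGEX; the real pattern
# used by landsatlinks may differ.
PRODUCT_ID_REGEX = re.compile(
    r"L[COTEM]\d{2}_L1[A-Z]{2}_\d{6}_\d{8}_\d{8}_\d{2}_(?:T1|T2|RT)"
)


def scene_name_from_url(url):
    """Return '<product id>.tar', or None if no product ID can be found."""
    if not url:
        # Nothing usable was passed on by the download step.
        return None
    match = PRODUCT_ID_REGEX.search(url)
    if match is None:
        # Without this check, match.group(0) raises
        # AttributeError: 'NoneType' object has no attribute 'group'.
        return None
    return f"{match.group(0)}.tar"
```

A caller could log the offending value and skip the queue entry when the helper returns `None`, which would also reveal which scene triggered the problem.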

ernstste commented 1 year ago

I have run several tests and was unfortunately not able to reproduce the issue.

However, the way that the force queue file is created has been reworked to make sure that there aren't conflicts due to parallel access of processes on the same file. Instead of using a callback, we now use multiprocessing.Queue and a dedicated process that listens for results of the other processes and writes the queue file.
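The pattern described above can be sketched roughly as follows. This is not the landsatlinks implementation: `fake_download`, `write_queue` and `download_all` are hypothetical names, the download is simulated, and the `<path> QUEUED` line format is an assumption about the queue file. The point is only that workers push results onto a `multiprocessing.Queue` and a single dedicated process is the only one that writes to the file.

```python
import multiprocessing as mp
import os

SENTINEL = None  # tells the writer that no more results will arrive


def fake_download(url, results):
    """Stand-in for a real download: report the archive name it 'saved'."""
    results.put(os.path.basename(url))


def write_queue(results, queue_fp, output_dir):
    """Dedicated writer: the only process that touches the queue file."""
    with open(queue_fp, "a") as fh:
        while True:
            name = results.get()
            if name is SENTINEL:
                break
            # Assumed line format: '<path to archive> QUEUED'
            fh.write(f"{os.path.join(output_dir, name)} QUEUED\n")
            fh.flush()


def download_all(urls, output_dir, queue_fp):
    # Prefer "fork" where available so the sketch also runs when pasted
    # into a script without an `if __name__ == "__main__":` guard.
    if "fork" in mp.get_all_start_methods():
        ctx = mp.get_context("fork")
    else:
        ctx = mp.get_context()
    results = ctx.Queue()
    writer = ctx.Process(target=write_queue, args=(results, queue_fp, output_dir))
    writer.start()
    # One process per URL keeps the sketch short; a real downloader
    # would use a fixed number of workers.
    workers = [ctx.Process(target=fake_download, args=(u, results)) for u in urls]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    results.put(SENTINEL)  # all workers are done, so this arrives last
    writer.join()
```

Because only one process ever opens the file, entries cannot be interleaved or lost through concurrent writes, whatever the number of download workers.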

@JariPekko maybe you can try to pull the latest davidfrantz/force:latest image and let us know if that solves your issue? Thanks!

JariPekko commented 1 year ago

I pulled the latest image and used it without changing anything else, and it seems to be working as intended now. The download has been running for 1h now with no error.

A small update on the behaviour before I pulled the latest davidfrantz/force:latest image: the downloads did stop completely at some point. The process was still running, but nothing was downloaded for several hours. After aborting manually and starting again, it was always the same pattern: the download works as intended -> after a short while the error message above appears but the download is still ongoing -> some time later the download stops.

Thanks, and I'll post an update about how it went.

JariPekko commented 1 year ago

I'm happy to report that the download went flawlessly and rather quickly. In one day the ~11TB were downloaded.

However, the QUEUE file didn't seem to update at all. I downloaded 20792 scenes (as requested, minus 1), but the QUEUE file had only 5771 lines, the same number it had before I started using the new davidfrantz/force:latest image. The QUEUE file had to be written manually afterwards.
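For anyone in the same situation, a minimal sketch of filling in the missing entries is shown below. It assumes that downloaded bundles are `.tar` files in one directory and that each queue line has the form `<path> QUEUED`; both are assumptions to check against your own queue file, and `append_missing_to_queue` is a hypothetical helper, not part of FORCE.

```python
from pathlib import Path


def append_missing_to_queue(download_dir, queue_fp):
    """Append every .tar in download_dir that the queue file does not list yet.

    Returns the number of entries added.
    """
    download_dir = Path(download_dir)
    queue_fp = Path(queue_fp)

    # Collect archive names that are already listed, whatever their status.
    queued = set()
    if queue_fp.exists():
        for line in queue_fp.read_text().splitlines():
            if line.strip():
                path_part = line.rsplit(None, 1)[0]  # drop the status column
                queued.add(Path(path_part).name)

    missing = sorted(p for p in download_dir.glob("*.tar") if p.name not in queued)
    with queue_fp.open("a") as fh:
        for p in missing:
            # Assumed line format: '<absolute path> QUEUED'
            fh.write(f"{p.resolve()} QUEUED\n")
    return len(missing)
```

Comparing the return value with the difference between the file count and `wc -l queue.txt` gives a quick sanity check that nothing was skipped.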

Thanks a lot for your quick help!!

ernstste commented 1 year ago

Thanks for the feedback Jari!

I also noticed that the download speed has improved by orders of magnitude. I hope there have been changes to the infrastructure and it will stay like this now.

Glad to hear the issue is solved! To be honest, I'm a bit puzzled that the queue file wasn't updated in your case. This was tested successfully here, and someone who contacted me privately with the same issue had no problems writing the queue file after the update. Was the file locked by another process, by any chance?

JariPekko commented 1 year ago

I'm new to Linux and may be overlooking something, but I can't think of another process that would have locked the QUEUE file. I stopped the download process (force-level1-landsat search --download) that was running from before the update. The only other command I ran involving the QUEUE file was wc -l queue.txt, to count its lines from time to time.

For testing purposes I just downloaded another scene with a new QUEUE file and new directories. This time the QUEUE file was updated correctly.

ernstste commented 1 year ago

Good to hear, thanks.

Leaving the commit for reference and closing this as completed. Feel free to re-open if needed.