SimonFisher92 / Scottish_Snow


Downloading a large number of offline products fails #20

Closed by murraycutforth 8 months ago

murraycutforth commented 9 months ago

Related to #19

So I attempted to set off a long download using this command:

python -m src.download.main --data_dir='/media/murray/BE10-C259/data/Scottish_Snow' --geojson_path='input/cairngorms_footprint.geojson' --product_filter='*_[12]0m.jp2' --target_tile='T30VVJ' --api_user="" --api_password=""

(The filter is for the 10/20m resolutions only, but I think I'll skip that next time, since the 60m resolution data doesn't take much space anyway.)
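For reference, sentinelsat applies this kind of glob as a node filter, so only matching files inside each product are fetched. Here is a minimal sketch of how a pattern like `*_[12]0m.jp2` is typically wired up, assuming `--product_filter` maps onto sentinelsat's `nodefilter` argument (older sentinelsat versions exposed the same thing on `SentinelProductsAPI`); this is not the repo's actual downloader code, and `product_uuid` is a placeholder:

```python
# Minimal sketch, not the repo's actual downloader code.
from sentinelsat import SentinelAPI, make_path_filter

api = SentinelAPI("user", "password", "https://apihub.copernicus.eu/apihub")

# Only nodes whose path inside the product matches the glob are downloaded,
# which is why filtered products arrive as a .SAFE directory tree.
path_filter = make_path_filter("*_[12]0m.jp2")

product_uuid = "..."  # hypothetical placeholder for a product UUID
api.download(
    product_uuid,
    directory_path="/media/murray/BE10-C259/data/Scottish_Snow",
    nodefilter=path_filter,
)
```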

Some products were downloaded, and some were triggered for retrieval from the LTA, but eventually the code got stuck with this error:

[2023-10-24 12:05:04,476] [src.download.download] [ERROR]: Download failed with expected str, bytes or os.PathLike object, not function. Retrying in 30mins.
[2023-10-24 12:05:05,270] [urllib3.connectionpool] [DEBUG]: https://apihub.copernicus.eu:443 "GET /apihub/odata/v1/Products('86483745-e808-462e-98e6-5747e23bf68f')?$format=json HTTP/1.1" 200 None
[2023-10-24 12:05:05,271] [sentinelsat.SentinelAPI] [DEBUG]: Manifest file already available (/media/murray/BE10-C259/data/Scottish_Snow/S2B_MSIL2A_20230604T114349_N0509_R123_T30VVJ_20230604T124114.SAFE/manifest.safe), skipping download

I'm not sure why this happens, or what the error message actually means (given that data is downloaded correctly at first). I'm going to experiment with removing the thread pool (used to concurrently call api.download(prod_id)) and try to slow everything down to avoid triggering the user quota (I have read on a forum that the limit is 20 LTA retrieval requests per 12 hours).
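For what it's worth, a serial, rate-limited loop along these lines is roughly what "removing the thread pool and slowing it down" could look like with sentinelsat 1.x, which raises LTATriggered when a product is offline and retrieval has been requested. This is only a sketch under those assumptions, not the code in this repo, and download_serially is a hypothetical helper:

```python
# Sketch of a serial, rate-limited download loop; not the repo's code.
import time

from sentinelsat import SentinelAPI
from sentinelsat.exceptions import LTATriggered, ServerError

api = SentinelAPI("user", "password", "https://apihub.copernicus.eu/apihub")

def download_serially(product_ids, out_dir, retry_delay_s=1800):
    """Download products one at a time, re-queueing offline ones."""
    pending = list(product_ids)
    while pending:
        uuid = pending.pop(0)
        try:
            api.download(uuid, directory_path=out_dir)
        except LTATriggered:
            # Product is offline; retrieval from the LTA has been triggered.
            # Re-queue it and wait, to stay under the LTA request quota.
            pending.append(uuid)
            time.sleep(retry_delay_s)
        except ServerError:
            # Transient API error: back off and retry later.
            pending.append(uuid)
            time.sleep(retry_delay_s)
```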

murraycutforth commented 9 months ago

Running with the following command seems to work now, albeit very slowly. This is on the downloader branch.

python -m src.download.main --data_dir='/media/murray/BE10-C259/data/Scottish_Snow' --geojson_path='input/cairngorms_footprint.geojson' --product_filter='' --num_threads=0 --target_tile='T30VVJ' --api_user="" --api_password="" --max_cloud_cover=50

At a rate of 1 retrieval per hour this will take on the order of 2 weeks to get all 355 tiles for the Cairngorms with less than 50% cloud cover. Not the end of the world. I'm just downloading everything (not filtering to any band or resolution). The built-in caching in sentinelsat seems to work okay in this case (@ipoole): I've stopped and manually restarted this command.
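For anyone reproducing this, the flags above presumably translate into sentinelsat query keywords roughly like this (an assumed mapping, not necessarily how src.download.main builds its query):

```python
# Sketch of the query; assumed mapping of the CLI flags onto sentinelsat keywords.
from sentinelsat import SentinelAPI, geojson_to_wkt, read_geojson

api = SentinelAPI("user", "password", "https://apihub.copernicus.eu/apihub")
footprint = geojson_to_wkt(read_geojson("input/cairngorms_footprint.geojson"))

products = api.query(
    footprint,
    platformname="Sentinel-2",
    processinglevel="Level-2A",
    cloudcoverpercentage=(0, 50),  # --max_cloud_cover=50
    filename="*_T30VVJ_*",         # --target_tile='T30VVJ'
)
print(f"{len(products)} products match")
```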

murraycutforth commented 9 months ago

Something else I noticed is that if you don't use a path filter, then the product is downloaded as a single zip file, rather than the SAFE directory format.
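If downstream code expects the .SAFE layout, the whole-product zip can simply be unpacked, since the archive contains the product's .SAFE folder at its top level. A quick sketch, where unzip_product is a hypothetical helper and the paths are only illustrative:

```python
# Sketch: unpack a whole-product zip into the .SAFE layout.
import zipfile
from pathlib import Path

def unzip_product(zip_path, data_dir):
    # The archive contains a top-level <product>.SAFE/ folder, so extracting
    # it in place recreates the directory structure that filtered downloads use.
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(data_dir)

data_dir = Path("/media/murray/BE10-C259/data/Scottish_Snow")
unzip_product(
    data_dir / "S2B_MSIL2A_20230604T114349_N0509_R123_T30VVJ_20230604T124114.zip",
    data_dir,
)
```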