centreborelli / tsd

Time Series Downloader for satellite images
GNU Affero General Public License v3.0

rasterio.errors.RasterioIOError: HTTP response code: 404 #25

Open chlsl opened 4 years ago

chlsl commented 4 years ago

After I upgraded tsd to the latest commit, 78b0ccae02e8f4b7d9cb535e56ce27199cb2f24f, I got the following error:

$ python3 ../../../../tsd/tsd/get_sentinel2.py --lat 49.037670 --lon 3.942441 --start-date 2016-09-05 --end-date 2017-09-15 --band B02 B03 B04 B08 --api scihub --product-type L1C -o .
Found 89 images
Building 89 gcloud download urls... 89 / 89
CPLReleaseMutex: Error = 1 (Operation not permitted)
Downloading 356 crops (89 images with 4 bands)... Traceback (most recent call last):
  File "rasterio/_base.pyx", line 216, in rasterio._base.DatasetBase.__init__
  File "rasterio/_shim.pyx", line 67, in rasterio._shim.open_dataset
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_HttpResponseError: HTTP response code: 404

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "../../../../tsd/tsd/get_sentinel2.py", line 366, in <module>
    parallel_downloads=args.parallel_downloads)
  File "../../../../tsd/tsd/get_sentinel2.py", line 292, in get_time_series
    download(images, bands, aoi, mirror, out_dir, parallel_downloads)
  File "../../../../tsd/tsd/get_sentinel2.py", line 171, in download
    nb_workers=parallel_downloads)
  File "/home/hessel/kayrros/test/tsd/tsd/parallel.py", line 75, in run_calls
    outputs.append(r.get(timeout))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/hessel/kayrros/test/tsd/tsd/utils.py", line 194, in rasterio_geo_crop
    with rasterio.open(inpath) as src:
  File "/home/hessel/.local/lib/python3.6/site-packages/rasterio/env.py", line 445, in wrapper
    return f(*args, **kwds)
  File "/home/hessel/.local/lib/python3.6/site-packages/rasterio/__init__.py", line 219, in open
    s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
  File "rasterio/_base.pyx", line 218, in rasterio._base.DatasetBase.__init__
rasterio.errors.RasterioIOError: HTTP response code: 404

This was introduced in commit 5c3c7ad87e2340d35e97092ece8055c3941c2cdc. Before that commit, the same command downloads the images without error:

$ python3 ../../../../tsd/tsd/get_sentinel2.py --lat 49.037670 --lon 3.942441 --start-date 2016-09-05 --end-date 2017-09-15 --band B02 B03 B04 B08 --api scihub --product-type L1C -o .
Found 89 images
Building 89 gcloud download urls... 89 / 89
Downloading 356 crops (89 images with 4 bands)... https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2A_MSIL1C_20161204T104422_N0204_R008_T31UEQ_20161204T104538.SAFE/GRANULE/L1C_T31UEQ_A007584_20161204T104538/IMG_DATA/T31UEQ_20161204T104422_B02.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2A_MSIL1C_20161204T104422_N0204_R008_T31UEQ_20161204T104538.SAFE/GRANULE/L1C_T31UEQ_A007584_20161204T104538/IMG_DATA/T31UEQ_20161204T104422_B04.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2A_MSIL1C_20161204T104422_N0204_R008_T31UEQ_20161204T104538.SAFE/GRANULE/L1C_T31UEQ_A007584_20161204T104538/IMG_DATA/T31UEQ_20161204T104422_B03.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2A_MSIL1C_20161204T104422_N0204_R008_T31UEQ_20161204T104538.SAFE/GRANULE/L1C_T31UEQ_A007584_20161204T104538/IMG_DATA/T31UEQ_20161204T104422_B08.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2B_MSIL1C_20170816T104019_N0205_R008_T31UEQ_20170816T104714.SAFE/GRANULE/L1C_T31UEQ_A002322_20170816T104019/IMG_DATA/T31UEQ_20170816T104019_B02.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2B_MSIL1C_20170816T104019_N0205_R008_T31UEQ_20170816T104714.SAFE/GRANULE/L1C_T31UEQ_A002322_20170816T104019/IMG_DATA/T31UEQ_20170816T104019_B03.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2B_MSIL1C_20170816T104019_N0205_R008_T31UEQ_20170816T104714.SAFE/GRANULE/L1C_T31UEQ_A002322_20170816T104019/IMG_DATA/T31UEQ_20170816T104019_B04.jp2 is not available
https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/31/U/EQ/S2B_MSIL1C_20170816T104019_N0205_R008_T31UEQ_20170816T104714.SAFE/GRANULE/L1C_T31UEQ_A002322_20170816T104019/IMG_DATA/T31UEQ_20170816T104019_B08.jp2 is not available
356 / 356

This error is not systematic, so I suppose it has something to do with these images that are not available.
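If unavailable images are indeed the trigger, one way to keep a single 404 from aborting the whole batch is to catch the error per image and continue. The following is only a minimal sketch of that pattern, not tsd's actual code: `crop_all`, `crop`, and `fake_crop` are hypothetical names, and catching `OSError` assumes the opener raises an `OSError` subclass (which should be checked against the installed rasterio for `rasterio.errors.RasterioIOError`).

```python
import sys


def crop_all(urls, crop, skip_on=(OSError,)):
    """Apply `crop` to each URL, skipping the ones that raise `skip_on`.

    `crop` is a hypothetical stand-in for a per-image download/crop function.
    Returns the list of results and the list of URLs that failed.
    """
    done, failed = [], []
    for url in urls:
        try:
            done.append(crop(url))
        except skip_on as exc:
            # Report the failure and move on instead of aborting the batch.
            print(f"WARNING: download of {url} failed ({exc})", file=sys.stderr)
            failed.append(url)
    return done, failed


if __name__ == "__main__":
    def fake_crop(url):
        # Simulates a 404 on one URL; a real crop would open the remote file.
        if url.endswith("missing.jp2"):
            raise OSError("HTTP response code: 404")
        return url.rsplit("/", 1)[-1]

    urls = ["https://example.com/a.jp2", "https://example.com/missing.jp2"]
    print(crop_all(urls, fake_crop))
```

Returning the failed URLs separately makes it possible to retry only those afterwards.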

carlodef commented 4 years ago

I observe the same error every now and then. I have no idea what the CPLReleaseMutex error is about.

anttad commented 4 years ago

Hi,

I also have this kind of error while trying to download some images.

I first tried this command :

python get_sentinel2.py --geom=./lieu.json -s 2019-01-01 -e 2019-12-01 -o test_lieu_1 

to download the images of the location in lieu.json over the past 11 months.

tsd downloaded every image from 2019-01-01 to 2019-02-27 inclusive before throwing this HTTP 404 error.

I then wanted to see if it could manage to download the rest of the images starting from 2019-02-27 with:

python get_sentinel2.py --geom=./lieu.json -s 2019-02-27 -e 2019-12-01 -o test_lieu_2

And it downloaded the images between 2019-02-27 and 2019-09-25 before throwing this error again.

carlodef commented 4 years ago

Thank you @anttad. Since @chlsl observed that this bug was introduced in commit 5c3c7ad, could you please retry your command with the parent commit?

git checkout 87d7ab6d03d625a24b7068615df869cc8ba34cc4 should do the job.

glostis commented 4 years ago

@anttad and @chlsl do you have recent versions of rasterio installed? Are they wheel-based installs, or source distributions where your local copy of GDAL is used?

If you have rasterio < 1.1.1, could you try upgrading it (and check that it installs from the wheel)? Looking at rasterio's changelog, it seems that the most recent version of rasterio includes a GDAL 2.4.3 version that has a patch for an issue with multithreading reported here: https://github.com/mapbox/rasterio/issues/1828

The error described in this issue is different from the one that you observe, but I have a suspicion your issue could be linked to multithreading somehow.
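To answer the version question quickly, a short script like the one below may help. It is a sketch that reads rasterio's `__version__` and `__gdal_version__` attributes; `report_versions` is a hypothetical helper name. It does not tell whether the install came from a wheel or from a source distribution.

```python
import importlib


def report_versions():
    """Return the rasterio and GDAL versions, or None values if rasterio is missing."""
    try:
        rasterio = importlib.import_module("rasterio")
    except ImportError:
        return {"rasterio": None, "gdal": None}
    return {
        "rasterio": getattr(rasterio, "__version__", None),
        # Version of the GDAL library that rasterio is linked against.
        "gdal": getattr(rasterio, "__gdal_version__", None),
    }


if __name__ == "__main__":
    print(report_versions())
```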

anttad commented 4 years ago

@carlodef Thank you, it works fine with this version of the code.

@glostis I currently have rasterio version 1.1.1 installed.

chlsl commented 4 years ago

@glostis I have rasterio version 1.1.1 too. I'm using Ubuntu 18.04 LTS.

carlodef commented 4 years ago

This should be fixed by commit 02ad42f3c. With this commit the error doesn't show up anymore on my machine. @chlsl and @anttad, could you please check if this solves the issue for you as well?

chlsl commented 4 years ago

@carlodef, I tried the command again:

python3 get_sentinel2.py --lat 49.037670 --lon 3.942441 --start-date 2016-09-05 --end-date 2017-09-15 --band B02 B03 B04 B08 --api scihub --product-type L1C -o tmp

and it succeeded! However, the CPLReleaseMutex: Error = 1 (Operation not permitted) message appeared in one of my three tests.

I use Ubuntu 18.04 LTS, and rasterio 1.1.3.

However, I tried the following command a few times:

python3 get_sentinel2.py --lat 40.138326 --lon -105.064174 --start-date 2019-05-01 --band B02 -o tmp

and unfortunately it succeeded only once. Worse, the errors are not always the same. I'll recap the results below. The first two messages, Found 150 images and Building 150 gcloud download urls... 150 / 150, are always printed, so I omitted them from the list.

  1.
    • Downloading 150 crops (150 images with 1 bands)... rasterio._err.CPLE_AppDefinedError: TIFFReadDirectory:Failed to read directory at offset 8 (full error below)
    • program exited with error code 1
  2. same as (1).
  3.
    • CPLReleaseMutex: Error = 1 (Operation not permitted)
    • Downloading 150 crops (150 images with 1 bands)... rasterio._err.CPLE_AppDefinedError: TIFFReadDirectory:Failed to read directory at offset 8 (full error below)
    • program exited with error code 1
  4.
    • Downloading 150 crops (150 images with 1 bands)... Inconsistency detected by ld.so: ../elf/dl-tls.c: 481: _dl_allocate_tls_init: Assertion 'listp->slotinfo[cnt].gen <= GL(dl_tls_generation)' failed!
    • program exited with error code 127
  5. same as (3).
  6.
    • CPLReleaseMutex: Error = 1 (Operation not permitted)
    • Downloading 150 crops (150 images with 1 bands)... WARNING: download of https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20200302T175139_N0209_R141_T13TDE_20200302T212150.SAFE/GRANULE/L1C_T13TDE_A015611_20200302T175138/IMG_DATA/T13TDE_20200302T175139_B02.jp2 failed
    • SUCCESS!
  7.
    • Downloading 150 crops (150 images with 1 bands)... WARNING: download of https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/13/T/DE/S2B_MSIL1C_20200302T175139_N0209_R141_T13TDE_20200302T212150.SAFE/GRANULE/L1C_T13TDE_A015611_20200302T175138/IMG_DATA/T13TDE_20200302T175139_B02.jp2 failed
    • free(): invalid pointer [2] 3340 abort (core dumped) python3 tsd/get_sentinel2.py --lat 40.138326 --lon -105.064174 --start-date
    • program exited with error code 134

The TIFFReadDirectory full error is:

Downloading 150 crops (150 images with 1 bands)... Traceback (most recent call last):
  File "tsd/get_sentinel2.py", line 381, in <module>
    satellite_angles=args.satellite_angles)
  File "tsd/get_sentinel2.py", line 299, in get_time_series
    download(images, bands, aoi, mirror, out_dir, parallel_downloads)
  File "tsd/get_sentinel2.py", line 175, in download
    nb_workers=parallel_downloads)
  File "/home/hessel/kayrros/tsd/tsd/parallel.py", line 75, in run_calls
    outputs.append(r.get(timeout))
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
    raise self._value
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/hessel/kayrros/tsd/tsd/utils.py", line 239, in rasterio_geo_crop
    with rasterio.open(outpath, "w", **profile) as out:
  File "/home/hessel/.local/lib/python3.6/site-packages/rasterio/env.py", line 434, in wrapper
    return f(*args, **kwds)
  File "/home/hessel/.local/lib/python3.6/site-packages/rasterio/__init__.py", line 229, in open
    **kwargs)
  File "rasterio/_io.pyx", line 1154, in rasterio._io.DatasetWriterBase.__init__
  File "rasterio/_io.pyx", line 79, in rasterio._io._delete_dataset_if_exists
  File "rasterio/_err.pyx", line 205, in rasterio._err.exc_wrap_pointer
rasterio._err.CPLE_AppDefinedError: TIFFReadDirectory:Failed to read directory at offset 8

So except for the sixth attempt, the only constant is that tsd fails. The failure seems to be caused by this CPLE_AppDefinedError, when it's not caused by this very strange error with ld.so (https://linux.die.net/man/8/ld-linux) (fourth attempt) or the memory issue (seventh attempt). The CPLReleaseMutex: Error = 1 (Operation not permitted) message appears from time to time, like before.

Testing this same command without your patch 02ad42f (i.e. at commit d30a376), I got (omitting the Found 150 images and Building 150 gcloud download urls... 150 / 150):

  1.
    • Downloading 150 crops (150 images with 1 bands)... corrupted double-linked list [2] 19095 abort (core dumped) python3 tsd/get_sentinel2.py --lat 40.138326 --lon -105.064174 --start-date
    • program exited with error code 134
  2.
    • Downloading 150 crops (150 images with 1 bands)... [2] 26551 segmentation fault (core dumped) python3 tsd/get_sentinel2.py --lat 40.138326 --lon -105.064174 --start-date
    • program exited with error code 139
  3.
    • CPLReleaseMutex: Error = 1 (Operation not permitted)
    • Downloading 150 crops (150 images with 1 bands)... rasterio._err.CPLE_HttpResponseError: HTTP response code: 404 (same error as described at the beginning of the thread)
  4.
    • Downloading 150 crops (150 images with 1 bands)... rasterio._err.CPLE_HttpResponseError: HTTP response code: 404 (same error as described at the beginning of the thread)
  5. same as (3).

At commit 87d7ab6, the same command works well.
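For reading the exit codes in the lists above: shells report a process killed by a signal as 128 plus the signal number, so 134 and 139 correspond to signals 6 and 11. A small sketch for decoding such codes follows; `describe_exit_code` is a hypothetical helper, and the signal names it prints depend on the platform it runs on.

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit code (values above 128 mean 128 + signal number)."""
    if code == 0:
        return "success"
    if code > 128:
        signum = code - 128
        try:
            return f"killed by {signal.Signals(signum).name} (signal {signum})"
        except ValueError:
            # Signal number not known on this platform.
            return f"killed by signal {signum}"
    return f"exited with error code {code}"


if __name__ == "__main__":
    for code in (1, 127, 134, 139):
        print(code, "->", describe_exit_code(code))
```

Both signals are consistent with memory corruption in native code rather than with an ordinary Python exception, which exits with code 1.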

There are two rasterio issues that seem related:

I'll try again using rasterio 1.1.4.dev0.

chlsl commented 4 years ago

@carlodef I tried again after upgrading to rasterio 1.1.4. I used the wheel-based PyPI version rasterio-1.1.4-cp36-cp36m-manylinux1_x86_64.whl.

I tried five times. It succeeded three times. The other two times it froze while downloading the crops, and I killed it after the counter had been stuck for 10-15 min. The CPLReleaseMutex: Error = 1 (Operation not permitted) message appeared four times.
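Since the failures are intermittent, repeated runs like these can be automated. The sketch below runs a command several times, kills any run that exceeds a timeout, and tallies the outcomes; `run_repeatedly` is a hypothetical helper and the command in the usage part is a placeholder, not the tsd invocation.

```python
import subprocess
import sys
from collections import Counter


def run_repeatedly(cmd, runs=5, timeout=900):
    """Run `cmd` several times and tally return codes and timeouts.

    A run that exceeds `timeout` seconds is killed and counted as "timeout".
    """
    tally = Counter()
    for _ in range(runs):
        try:
            proc = subprocess.run(cmd, timeout=timeout)
        except subprocess.TimeoutExpired:
            tally["timeout"] += 1
        else:
            # On POSIX, a negative return code means the child was
            # terminated by the signal with that number.
            tally[proc.returncode] += 1
    return tally


if __name__ == "__main__":
    # Placeholder command; substitute the command under test.
    print(run_repeatedly([sys.executable, "-c", "print('ok')"], runs=3, timeout=60))
```

A timeout of 900 seconds matches the 15 minutes waited before killing a frozen run; it is only a starting point.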

anttad commented 4 years ago

@carlodef, I also re-ran the command that gave me the error 5 times:

python get_sentinel2.py --geom=./lieu.json -s 2019-01-01 -e 2019-12-01 -o test_lieu_1 

Over the 5 runs, it succeeded in downloading all the images except 2, and the missing files no longer interrupted the process as they used to.

Found 92 images
Building 92 gcloud download urls... 92 / 92
CPLReleaseMutex: Error = 1 (Operation not permitted)
Downloading 92 crops (92 images with 1 bands)... WARNING: download of https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/33/S/WB/S2B_MSIL1C_20190808T094039_N0208_R036_T33SWB_20190808T132403.SAFE/GRANULE/L1C_T33SWB_A012646_20190808T094038/IMG_DATA/T33SWB_20190808T094039_B04.jp2 failed
WARNING: download of https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/33/S/WB/S2B_MSIL1C_20191007T094029_N0208_R036_T33SWB_20191007T122449.SAFE/GRANULE/L1C_T33SWB_A013504_20191007T095023/IMG_DATA/T33SWB_20191007T094029_B04.jp2 failed
WARNING: download of https://storage.googleapis.com/gcp-public-data-sentinel-2/tiles/33/S/WB/S2A_MSIL1C_20191025T095101_N0208_R079_T33SWB_20191025T120308.SAFE/GRANULE/L1C_T33SWB_A022670_20191025T095100/IMG_DATA/T33SWB_20191025T095101_B04.jp2 failed
92 / 92

The first run didn't show the CPLReleaseMutex: Error = 1 (Operation not permitted) message, but all the following runs did.

I work with Ubuntu 18.04 LTS and rasterio 1.1.1.