Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for GeoTrellis, for use with the openEO GeoPySpark backend.
Apache License 2.0

load_stac to stac.openeo.vito.be: 429 Too Many Requests #299

Closed: EmileSonneveld closed this issue 2 weeks ago

EmileSonneveld commented 1 month ago

The following snippet triggers a 429 (Too Many Requests) error. A backoff strategy or polling mechanism could avoid that.

import openeo

url = "https://openeo.dataspace.copernicus.eu"
connection = openeo.connect(url).authenticate_oidc()

EXTENT = dict(zip(["west", "south", "east", "north"],
                  [5.318868004541495, 50.628576059801816, 5.3334400271343725, 50.637843899562576]))
EXTENT['crs'] = "EPSG:4326"
STARTDATE = '2022-01-01'
ENDDATE = '2022-03-31'

meteo_cube = connection.load_stac("https://stac.openeo.vito.be/collections/agera5_daily", spatial_extent=EXTENT,
                                  temporal_extent=[STARTDATE, ENDDATE], bands=["2m_temperature_mean"])
job = meteo_cube.create_job()
job.start_and_wait()

The job fails with:

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.geotiff.package.saveRDDTemporal. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 4.0 failed 4 times, most recent failure: Lost task 18.3 in stage 4.0 (TID 119) (10.42.110.101 executor 2): scalaj.http.HttpStatusException: 429 Error: HTTP/1.0 429 Too Many Requests
    at scalaj.http.HttpResponse.throwIf(Http.scala:156)
    at scalaj.http.HttpResponse.throwError(Http.scala:168)
    at org.openeo.geotrellis.CustomizableHttpRangeReader.totalLength$lzycompute(CustomizableHttpRangeReader.scala:22)
    at org.openeo.geotrellis.CustomizableHttpRangeReader.totalLength(CustomizableHttpRangeReader.scala:10)
    at geotrellis.util.StreamingByteReader.ensureChunk(StreamingByteReader.scala:109)
    at geotrellis.util.StreamingByteReader.get(StreamingByteReader.scala:130)
    at geotrellis.raster.io.geotiff.reader.GeoTiffInfo$.read(GeoTiffInfo.scala:127)
    at geotrellis.raster.io.geotiff.reader.GeoTiffReader$.readMultiband(GeoTiffReader.scala:211)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.$anonfun$tiff$1(GeoTiffReprojectRasterSource.scala:46)
    at scala.Option.getOrElse(Option.scala:189)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.tiff$lzycompute(GeoTiffReprojectRasterSource.scala:43)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.tiff(GeoTiffReprojectRasterSource.scala:40)
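A minimal sketch of the backoff idea, written in Python for illustration only (the failing reader is Scala code in `CustomizableHttpRangeReader`). It assumes the HTTP client raises an exception carrying the status code; `HttpStatusError` and `retry_with_backoff` are hypothetical names, not part of openEO or scalaj-http. Only 429 is retried, with a capped exponential window and full jitter.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HttpStatusError(Exception):
    """Hypothetical stand-in for an HTTP client's status exception
    (playing the role of scalaj.http.HttpStatusException in the trace above)."""

    def __init__(self, status: int):
        super().__init__(f"HTTP {status}")
        self.status = status


def retry_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying HTTP 429 with capped exponential backoff and
    full jitter. Any other error, or a 429 on the last attempt, is re-raised."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except HttpStatusError as e:
            if e.status != 429 or attempt == max_attempts - 1:
                raise
            # Window doubles per attempt: base_delay, 2*base_delay, ... up to max_delay.
            window = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, window] so that many
            # Spark tasks retrying at once do not hit the server in lockstep.
            sleep(random.uniform(0, window))
    raise AssertionError("unreachable")
```

Usage would be along the lines of `retry_with_backoff(lambda: fetch_header(url))`, where `fetch_header` is a hypothetical function that raises `HttpStatusError` on a non-2xx response. Injecting `sleep` keeps the loop testable without real waiting.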
EmileSonneveld commented 1 month ago

The request requires credentials. To quickly reproduce the error, paste the following snippet into the browser console while logged in on https://services.terrascope.be/:

function badGet(url) {
  const xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function () {
    // readyState 4 (DONE): the request has completed
    if (xhttp.readyState === 4 && (xhttp.status < 200 || xhttp.status >= 300)) {
      // Report the HTTP status, e.g. "429 Too Many Requests"
      const msg = "badGet: " + xhttp.status + " " + xhttp.statusText;
      document.body.append(msg);
      throw new Error(msg);
    }
  };
  xhttp.open("GET", url);
  xhttp.send();
}

// Fire 100 requests in quick succession to trigger the rate limit (HTTP 429)
for (let i = 0; i < 100; i++) {
  badGet("https://services.terrascope.be/download/AgERA5/2024/20240418/AgERA5_dewpoint-temperature_20240418.tif");
}
EmileSonneveld commented 1 month ago

This works on https://openeo-staging.dataspace.copernicus.eu/ now. The example process graph hit a 429 error 30 times, but got through after retrying.

bossie commented 1 month ago

BTW, for retries you can use Failsafe; the Sentinel Hub module (geotrellis-sentinelhub) also uses it: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/ba6b104cf546523e8a7dd4ed8b4df939d34e2d95/geotrellis-sentinelhub/src/main/scala/org/openeo/geotrellissentinelhub/ProcessApi.scala#L61-L90

If you're not taking a Retry-After header into account, this should become a great deal simpler.
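To show what honoring Retry-After would add: per the HTTP specification its value is either a number of seconds or an HTTP-date, so the retry logic has to parse both forms. A minimal Python sketch, assuming the server sends the header at all (the thread does not say whether services.terrascope.be does); `retry_after_seconds` and `choose_delay` are hypothetical helpers, not Failsafe API.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def retry_after_seconds(value: Optional[str], now: Optional[datetime] = None) -> Optional[float]:
    """Turn a Retry-After header value into a delay in seconds.

    The value is either delta-seconds ("120") or an HTTP-date
    ("Wed, 21 Oct 2015 07:28:00 GMT"). Returns None when the header is
    missing or cannot be parsed. `now`, if given, must be timezone-aware.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isascii() and value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # which one depends on the Python version
        return None
    if when.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    # A date in the past means the client may retry immediately.
    return max(0.0, (when - now).total_seconds())


def choose_delay(retry_after: Optional[str], attempt: int,
                 base_delay: float = 1.0, max_delay: float = 30.0) -> float:
    """Prefer the server's hint; otherwise fall back to exponential backoff."""
    hint = retry_after_seconds(retry_after)
    if hint is not None:
        return hint
    return min(max_delay, base_delay * 2 ** attempt)
```

Dropping the header support reduces `choose_delay` to its last line, which is the simplification the comment refers to.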

JeroenVerstraelen commented 3 weeks ago

@bossie review

EmileSonneveld commented 2 weeks ago

Got the following error when running on http://openeo.vito.be. The last MR is not deployed there yet, but I'd like to keep this ticket open until the cause is found (the error also occurs on http://openeo-dev.vito.be, where this fix is deployed):

java.io.IOException: Exception while determining data type of asset https://services.terrascope.be/download/AgERA5/2022/20220330/AgERA5_temperature-mean_20220330.tif in collection https://stac.openeo.vito.be/collections/agera5_daily. Detailed message: requirement failed: Server doesn't support ranged byte reads

job_id: j-240628ca5e454cabbb16b64caacb2e8d
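The "requirement failed:" prefix is what Scala's `require` produces, so this message comes from a precondition check in the reader. Assuming that check looks at whether the server advertises byte-range support, for example through an `Accept-Ranges: bytes` response header (the exact check is not shown in this thread), a quick Python probe of an asset URL could look like this. `accepts_byte_ranges` and `probe` are hypothetical helpers. Because the Terrascope download URL requires credentials, an unauthenticated probe may see a login redirect or an error status rather than the asset's own headers.

```python
import urllib.error
import urllib.request
from typing import Mapping, Tuple


def accepts_byte_ranges(headers: Mapping[str, str]) -> bool:
    """True if the headers advertise byte-range support ("Accept-Ranges: bytes")."""
    for name, value in headers.items():
        if name.lower() == "accept-ranges":
            # The header lists range units, e.g. "bytes" or "none".
            return "bytes" in (unit.strip().lower() for unit in value.split(","))
    return False


def probe(url: str, timeout: float = 10.0) -> Tuple[int, bool]:
    """Send a HEAD request and return (status, advertises byte ranges). Needs network access."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, accepts_byte_ranges(dict(response.headers))
    except urllib.error.HTTPError as error:
        # A 401 or 429 still carries headers, but they describe the error
        # response, not the asset, so the range check says little here.
        return error.code, accepts_byte_ranges(dict(error.headers))
```

Calling `probe` on the AgERA5 asset URL above would show whether the response the reader received was the asset itself or an error page without range support.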

EmileSonneveld commented 2 weeks ago

Never mind, it works on https://openeo-staging.dataspace.copernicus.eu, and it does not need to work on Terrascope, as the native AGERA5 layer is available there.