Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

load_stac to 429 Too Many Requests #299

Closed EmileSonneveld closed 2 weeks ago

EmileSonneveld commented 1 month ago

The following snippet triggers a 429 error. A backoff strategy or polling mechanism could avoid that.

import openeo

url = ""
connection = openeo.connect(url).authenticate_oidc()

EXTENT = dict(zip(["west", "south", "east", "north"],
                  [5.318868004541495, 50.628576059801816, 5.3334400271343725, 50.637843899562576]))
EXTENT['crs'] = "EPSG:4326"
STARTDATE = '2022-01-01'
ENDDATE = '2022-03-31'

meteo_cube = connection.load_stac("", spatial_extent=EXTENT,
                                  temporal_extent=[STARTDATE, ENDDATE], bands=["2m_temperature_mean"])
job = meteo_cube.create_job()

py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.geotiff.package.saveRDDTemporal.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 18 in stage 4.0 failed 4 times, most recent failure: Lost task 18.3 in stage 4.0 (TID 119) ( executor 2): scalaj.http.HttpStatusException: 429 Error: HTTP/1.0 429 Too Many Requests
    at scalaj.http.HttpResponse.throwIf(Http.scala:156)
    at scalaj.http.HttpResponse.throwError(Http.scala:168)
    at org.openeo.geotrellis.CustomizableHttpRangeReader.totalLength$lzycompute(CustomizableHttpRangeReader.scala:22)
    at org.openeo.geotrellis.CustomizableHttpRangeReader.totalLength(CustomizableHttpRangeReader.scala:10)
    at geotrellis.util.StreamingByteReader.ensureChunk(StreamingByteReader.scala:109)
    at geotrellis.util.StreamingByteReader.get(StreamingByteReader.scala:130)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.$anonfun$tiff$1(GeoTiffReprojectRasterSource.scala:46)
    at scala.Option.getOrElse(Option.scala:189)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.tiff$lzycompute(GeoTiffReprojectRasterSource.scala:43)
    at geotrellis.raster.geotiff.GeoTiffReprojectRasterSource.tiff(GeoTiffReprojectRasterSource.scala:40)

EmileSonneveld commented 1 month ago

The request requires credentials. To quickly inspect the error, you can paste the following snippet in the console of

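function badGet(url) {
  let res = null;
  const xhttp = new XMLHttpRequest();
  xhttp.onreadystatechange = function () {
    // readyState 4 means the request has finished
    if (xhttp.readyState === 4) {
      if (xhttp.status < 200 || xhttp.status >= 300) {
        // XMLHttpRequest has no "message" property; report status and statusText instead
        const message = "badGet: " + xhttp.status + " " + xhttp.statusText;
        document.body.append(message);
        throw new Error(message);
      }
      res = xhttp.responseText;
    }
  };
  xhttp.open("GET", url);
  xhttp.send();
  // The request is asynchronous, so res is still null at this point
  return res;
}

// Fire 100 requests in quick succession to provoke the 429
for (var i = 0; i < 100; i++) {
  badGet(""); // URL omitted in the original post
}
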
EmileSonneveld commented 1 month ago

This works on now. The example process graph encountered a 429 error 30 times, but got through after retrying.

bossie commented 1 month ago

BTW for retries you can use Failsafe; the SHub module also uses it:

If you're not taking a Retry-After header into account, this should become a great deal simpler.
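
For illustration, the retry idea discussed in this thread can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the code that was merged for this issue (which lives in the Scala extensions). The names `get_with_backoff`, `parse_retry_after` and `TooManyRequests` are hypothetical, and `fetch` stands for any caller-supplied function that returns a `(status, headers, body)` tuple. The sketch uses exponential backoff and prefers a numeric Retry-After value when the server sends one.

```python
import time
from typing import Callable, Mapping, Optional, Tuple

Response = Tuple[int, Mapping[str, str], bytes]  # (status, headers, body)


class TooManyRequests(Exception):
    """Raised when the server still answers 429 after the last attempt."""


def parse_retry_after(headers: Mapping[str, str]) -> Optional[float]:
    """Return the Retry-After delay in seconds, or None if absent or not numeric.

    Only the delta-seconds form is handled; the HTTP-date form is ignored.
    Assumes the header key is spelled exactly "Retry-After".
    """
    value = headers.get("Retry-After")
    if value is None:
        return None
    try:
        return max(0.0, float(value))
    except ValueError:
        return None


def get_with_backoff(
    fetch: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call fetch() until it stops returning 429 or the attempts run out."""
    for attempt in range(max_attempts):
        status, headers, body = fetch()
        if status != 429:
            # Any other status is handed back to the caller unchanged.
            return body
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        delay = parse_retry_after(headers)
        if delay is None:
            # Exponential backoff: base_delay, 2x, 4x, ...
            delay = base_delay * (2 ** attempt)
        sleep(min(delay, max_delay))
    raise TooManyRequests(f"still throttled after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function easy to exercise without real waiting. Dropping the `parse_retry_after` branch leaves plain exponential backoff, which is the simpler variant mentioned above.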

JeroenVerstraelen commented 3 weeks ago

@bossie review

EmileSonneveld commented 2 weeks ago

Got this issue when running on
The last MR is not deployed there yet, but I'd like to keep this ticket open until the cause is found. (Also with this fix deployed on, this error occurs)

Exception while determining data type of asset in collection
Detailed message: requirement failed: Server doesn't support ranged byte reads
job_id: j-240628ca5e454cabbb16b64caacb2e8d
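
To make the "ranged byte reads" message more concrete: a client can usually tell whether a server honours byte-range requests from the response headers. The Python sketch below is an illustration only. It assumes the check is based on the standard `Accept-Ranges` header or a `206 Partial Content` reply, which may differ from what the reader in this stack trace actually does, and the function name `supports_byte_ranges` is hypothetical.

```python
from typing import Mapping


def supports_byte_ranges(status: int, headers: Mapping[str, str]) -> bool:
    """Guess from a single response whether the server supports byte ranges."""
    # HTTP header names are case-insensitive, so normalise them first.
    normalized = {key.lower(): value for key, value in headers.items()}

    # A 206 reply with Content-Range means a Range request was honoured.
    if status == 206 and "content-range" in normalized:
        return True

    # Otherwise rely on the server advertising "Accept-Ranges: bytes".
    accept = normalized.get("accept-ranges", "")
    units = [unit.strip().lower() for unit in accept.split(",")]
    return "bytes" in units
```

An error response such as a 429 may not carry these headers at all, in which case a check like this would also report no range support.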

EmileSonneveld commented 2 weeks ago

Never mind, it works on and does not need to work on Terrascope, as the native AGERA5 layer is available there.