Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

fix writing png to object storage #174

Closed jdries closed 1 year ago

jdries commented 1 year ago

{
  "process_graph": {
    "save2": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "load1"
        },
        "format": "PNG"
      },
      "result": true
    },
    "load1": {
      "process_id": "load_collection",
      "arguments": {
        "id": "SENTINEL2_L2A",
        "spatial_extent": {
          "west": 12.079270663717073,
          "east": 12.745766173810614,
          "south": 41.548842135427265,
          "north": 42.0767637444217
        },
        "temporal_extent": [
          "2023-05-30T00:00:00Z",
          null
        ],
        "bands": [
          "B04",
          "B03",
          "B02"
        ]
      }
    }
  },
  "parameters": []
}
OpenEO batch job failed: java.io.FileNotFoundException: s3:/OpenEO-data/batch_jobs/j-bfe5e7b6a8fe48dd9a5ee0cb0b1c1e51/out (No such file or directory)
Traceback (most recent call last):
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1276, in <module>
    main(sys.argv)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1024, in main
    run_driver()
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 995, in run_driver
    run_job(
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 52, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 1128, in run_job
    the_assets_metadata = result.write_assets(str(output_file))
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/save_result.py", line 113, in write_assets
    return self.cube.write_assets(filename=directory, format=self.format, format_options=self.options)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 1705, in write_assets
    get_jvm().org.openeo.geotrellis.png.package.saveStitched(max_level.srdd.rdd(), save_filename, crop_extent, png_options)
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py", line 1322, in __call__
    return_value = get_return_value(
  File "/usr/local/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.png.package.saveStitched.
: ar.com.hjg.pngj.PngjInputException: Could not open for writes3:/OpenEO-data/batch_jobs/j-bfe5e7b6a8fe48dd9a5ee0cb0b1c1e51/out
        at ar.com.hjg.pngj.PngHelperInternal2.ostreamFromFile(PngHelperInternal2.java:31)
        at ar.com.hjg.pngj.PngHelperInternal.ostreamFromFile(PngHelperInternal.java:296)
        at ar.com.hjg.pngj.PngWriter.<init>(PngWriter.java:97)
        at ar.com.hjg.pngj.PngWriter.<init>(PngWriter.java:105)
        at org.openeo.geotrellis.png.package$.saveStitched(package.scala:39)
        at org.openeo.geotrellis.png.package.saveStitched(package.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: s3:/OpenEO-data/batch_jobs/j-bfe5e7b6a8fe48dd9a5ee0cb0b1c1e51/out (No such file or directory)
        at java.base/java.io.FileOutputStream.open0(Native Method)
        at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
        at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
        at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
        at ar.com.hjg.pngj.PngHelperInternal2.ostreamFromFile(PngHelperInternal2.java:29)
        ... 17 more
bossie commented 1 year ago

Possibly already implemented but not verified yet: rerun the process graph to check.

bossie commented 1 year ago

Commit has been rolled out on all environments but I'm still getting the same error:

...
Caused by: java.io.FileNotFoundException: s3:/OpenEO-data/batch_jobs/j-43f83402120e44af80e3420a4b9d8ad5/out (No such file or directory)
    at java.base/java.io.FileOutputStream.open0(Native Method)
    at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
    at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:187)
    at ar.com.hjg.pngj.PngHelperInternal2.ostreamFromFile(PngHelperInternal2.java:29)
    ... 17 more
jdries commented 1 year ago

Mind the single slash in the s3:/ prefix. This is again going to be a case where the path gets broken along the way. For GeoTIFF, I solved it by letting the writer return the final, corrected path where things ended up.
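
For illustration only, here is a minimal Python sketch of how an s3:// prefix can lose a slash once it passes through a filesystem path API. The thread does not show where the Scala code actually breaks the path, so this is an analogy rather than the root cause; the job id `j-example` and the helper `split_s3_uri` are hypothetical names made up for the example.

```python
import posixpath
from pathlib import PurePosixPath
from typing import Tuple
from urllib.parse import urlparse

# Hypothetical job id, used only for this illustration.
uri = "s3://OpenEO-data/batch_jobs/j-example/out"

# Filesystem path normalisation treats the second "/" as a redundant
# separator and drops it, leaving the "s3:/" form seen in the error.
collapsed = str(PurePosixPath(uri))
also_collapsed = posixpath.normpath(uri)


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split an s3:// URI into (bucket, key) without filesystem path APIs.

    Hypothetical helper, not part of openeo-geotrellis-extensions.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        # A collapsed "s3:/bucket/key" has no netloc, so it is rejected here
        # instead of being handed to a local file writer.
        raise ValueError(f"not a well-formed s3:// URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


print(collapsed)
print(split_s3_uri(uri))
```

Parsing the URI as a URI, and rejecting the collapsed form early, keeps a broken path from reaching something like `FileOutputStream`, which only understands local files.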

bossie commented 1 year ago

Writing the png file was fixed and it has been successfully uploaded to object storage (out):

$ s3cmd -c s3cfg_cdse ls s3://OpenEO-data/batch_jobs/j-952f478359604b2ea12ae605c442afe9/
2023-06-27 09:44         9146  s3://OpenEO-data/batch_jobs/j-952f478359604b2ea12ae605c442afe9/job_metadata.json
2023-06-27 09:44          483  s3://OpenEO-data/batch_jobs/j-952f478359604b2ea12ae605c442afe9/job_specification.json
2023-06-27 09:44     90447461  s3://OpenEO-data/batch_jobs/j-952f478359604b2ea12ae605c442afe9/out

However, job_metadata.json lists this asset as out.png instead:

"assets": {
  "out.png": {
    "href": "s3://OpenEO-data/batch_jobs/j-952f478359604b2ea12ae605c442afe9/out.png",
    "type": "image/png",
    "roles": [
      "data"
    ]
  }
}

The signed link in /jobs/j-952f478359604b2ea12ae605c442afe9/results therefore doesn't work:

{
    "code": "Internal",
    "id": "r-ba87fd1e25c84bc8a477936e69dea321",
    "message": "Server error: NoSuchKey('An error occurred (NoSuchKey) when calling the GetObject operation: Unknown')"
}
Traceback (most recent call last):
  File "/opt/openeo/lib/python3.8/site-packages/flask/app.py", line 1516, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/openeo/lib/python3.8/site-packages/flask/app.py", line 1502, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**req.view_args)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py", line 1319, in download_job_result_signed
    return _download_job_result(job_id=job_id, filename=filename, user_id=user_id)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py", line 1102, in _download_job_result
    return _stream_from_s3(result["href"], result)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/views.py", line 1123, in _stream_from_s3
    s3_file_object = s3_instance.get_object(Bucket=bucket, Key=folder)
  File "/opt/openeo/lib/python3.8/site-packages/botocore/client.py", line 357, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/botocore/client.py", line 676, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchKey: An error occurred (NoSuchKey) when calling the GetObject operation: Unknown
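
A rough sketch of the mismatch, using only the values quoted above: compare each asset href in job_metadata.json against the keys that are actually in the bucket. The function name `missing_assets` is hypothetical, and the key list is copied by hand from the s3cmd output rather than fetched from object storage.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlparse


def missing_assets(assets: Dict[str, dict], existing_keys: Iterable[str]) -> List[str]:
    """Return the names of assets whose s3:// href has no matching object key.

    Hypothetical helper for illustration; not part of openeo_driver.
    """
    present = set(existing_keys)
    missing = []
    for name, asset in assets.items():
        # For s3://bucket/key the key is the URI path without its leading "/".
        key = urlparse(asset["href"]).path.lstrip("/")
        if key not in present:
            missing.append(name)
    return missing


job_prefix = "batch_jobs/j-952f478359604b2ea12ae605c442afe9"

# Object keys as listed by s3cmd above.
keys_in_bucket = [
    f"{job_prefix}/job_metadata.json",
    f"{job_prefix}/job_specification.json",
    f"{job_prefix}/out",
]

# The "assets" section of job_metadata.json as quoted above.
assets = {
    "out.png": {
        "href": f"s3://OpenEO-data/{job_prefix}/out.png",
        "type": "image/png",
        "roles": ["data"],
    }
}

print(missing_assets(assets, keys_in_bucket))
```

With the values from this job, the only asset reported as missing is out.png, which matches the NoSuchKey error: the metadata points at out.png while the uploaded object is named out.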
bossie commented 1 year ago

TODO:

bossie commented 1 year ago

Fixed on CDSE dev. Comes with some nifty metadata too :+1: :

{
    "assets": {
        "out.png": {
            "file:nodata": [
                null
            ],
            "href": "https://openeo.dev.warsaw.openeo.dataspace.copernicus.eu/openeo/1.1/jobs/j-e9e0d36df5e548fb9e681729a466eb85/results/assets/.../out.png?expires=1688540567",
            "proj:bbox": [
                0.0,
                6029.0,
                5714.0,
                0.0
            ],
            "proj:epsg": 32633,
            "proj:shape": [
                5714,
                6029
            ],
            "raster:bands": [
                {
                    "name": "1",
                    "statistics": {
                        "maximum": 255.0,
                        "mean": 124.42407595583,
                        "minimum": 0.0,
                        "stddev": 74.395077242258,
                        "valid_percent": 100.0
                    }
                },
                {
                    "name": "2",
                    "statistics": {
                        "maximum": 255.0,
                        "mean": 125.3408646216,
                        "minimum": 0.0,
                        "stddev": 77.307545349428,
                        "valid_percent": 100.0
                    }
                },
                {
                    "name": "3",
                    "statistics": {
                        "maximum": 255.0,
                        "mean": 123.32699309538,
                        "minimum": 0.0,
                        "stddev": 76.219919822788,
                        "valid_percent": 100.0
                    }
                }
            ],
            "roles": [
                "data"
            ],
            "title": "out.png",
            "type": "image/png"
        }
    }
}
bossie commented 1 year ago

Still works on Terrascope as well. As a side effect, this fix materialized the ghost out.png asset, which replaces what used to be the real asset (out):

[image: ghost_out png]