Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

Creo: fix stac item metadata writing to S3 #148

Status: Closed (jdries closed 1 year ago)

jdries commented 1 year ago

From logs on creo:

{"name":"org.openeo.geotrellis.stac.STACItem","levelname":"WARNING","message":"Failed to write STAC metadata.","created":1682316527.078144000,"filename":"STACItem.scala","lineno":61,"exc_info":"java.nio.file.NoSuchFileException: s3:/OpenEO-data/batch_jobs/j-75a638311012466b9b9dab491c651716/openEO_item.json\n\tat java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)\n\tat java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)\n\tat java.base/sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:219)\n\tat java.base/java.nio.file.spi.FileSystemProvider.newOutputStream(FileSystemProvider.java:478)\n\tat java.base/java.nio.file.Files.newOutputStream(Files.java:220)\n\tat java.base/java.nio.file.Files.write(Files.java:3425)\n\tat org.openeo.geotrellis.stac.STACItem.write(STACItem.scala:59)\n\tat org.openeo.geotrellis.geotiff.package$.saveRDDGeneric(package.scala:266)\n\tat org.openeo.geotrellis.geotiff.package$.saveRDD(package.scala:145)\n\tat org.openeo.geotrellis.geotiff.package.saveRDD(package.scala)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:566)\n\tat py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)\n\tat py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)\n\tat py4j.Gateway.invoke(Gateway.java:282)\n\tat py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)\n\tat py4j.commands.CallCommand.execute(CallCommand.java:79)\n\tat py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)\n\tat py4j.ClientServerConnection.run(ClientServerConnection.java:106)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\n","job_id":"j-75a638311012466b9b9dab491c651716","user_id":"dfa678cb9ab17f65d4f025e30fac5e0d90116176e44fd17d703419322747cbbd@egi.eu"}
{"name":"org.openeo.geotrellis.geotiff.package$","levelname":"INFO","message":"Writing geotiff to s3://OpenEO-data/batch_jobs/j-75a638311012466b9b9dab491c651716/openEO.tif with type uint16ud65535 and bands 2.0","created":1682316527.078992000,"filename":"package.scala","lineno":425,"job_id":"j-75a638311012466b9b9dab491c651716","user_id":"dfa678cb9ab17f65d4f025e30fac5e0d90116176e44fd17d703419322747cbbd@egi.eu"}
EmileSonneveld commented 1 year ago

@jdries Is it easy to get the process graph for this job (j-75a638311012466b9b9dab491c651716)? Or is there another way to reproduce this?

jdries commented 1 year ago

I think basically any job on creo that writes multiple geotiffs would show the problem; it's not process graph specific.

This is the line that currently only works for POSIX paths, even though it receives an S3 URL: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/a8f90437b3393bcb245d6fbad2b41235df997ff8/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/stac/STACItem.scala#L59
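
To illustrate why the logged path reads `s3:/...` with a single slash, here is a minimal Java sketch of what probably happens when an S3 URL string is handed to the default (local) NIO file system. It assumes a Linux/POSIX JVM, as in the stack trace; the class name, method names and the `example-bucket` URL are made up for the illustration and are not taken from the repository.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical demo class, not part of openeo-geotrellis-extensions.
final class S3UrlAsLocalPathDemo {

    /** Returns the path string the default file system derives from the given URL string. */
    static String asLocalPath(String url) {
        // The default provider knows nothing about the "s3" scheme: the string is
        // parsed as a relative local path and redundant slashes are collapsed.
        return Paths.get(url).toString();
    }

    /** Attempts a plain NIO write; returns "ExceptionName: message" on failure, null on success. */
    static String tryWrite(String url) {
        Path target = Paths.get(url);
        try {
            Files.write(target, "{}".getBytes(StandardCharsets.UTF_8));
            return null;
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String url = "s3://example-bucket/batch_jobs/j-example/openEO_item.json";
        System.out.println(asLocalPath(url));
        System.out.println(tryWrite(url));
    }
}
```

Unless a local directory literally named `s3:` happens to exist in the working directory, the write fails with a `NoSuchFileException` on the collapsed path, which matches the shape of the warning in the log.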

Example code to write something to S3: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/a8f90437b3393bcb245d6fbad2b41235df997ff8/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/geotiff/package.scala#L673
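
One possible shape for a fix is to branch on the URL scheme before writing. The sketch below is only an assumption about how that could look, not the code that was actually merged: `StacItemWriterSketch`, `ObjectUploader` and `parseS3Url` are hypothetical names, and the `ObjectUploader` interface stands in for whatever S3 client call the linked `package.scala` code uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch, not the repository's implementation.
final class StacItemWriterSketch {

    /** Hypothetical stand-in for an S3 client "put object" call. */
    interface ObjectUploader {
        void put(String bucket, String key, byte[] body) throws IOException;
    }

    private static final String S3_PREFIX = "s3://";

    /** Splits "s3://bucket/some/key" into a two-element array {bucket, key}. */
    static String[] parseS3Url(String url) {
        if (!url.startsWith(S3_PREFIX)) {
            throw new IllegalArgumentException("Not an S3 URL: " + url);
        }
        String rest = url.substring(S3_PREFIX.length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("Missing bucket or key: " + url);
        }
        return new String[] { rest.substring(0, slash), rest.substring(slash + 1) };
    }

    /** Writes JSON to S3 via the uploader for s3:// targets, else to the local file system. */
    static void write(String target, String json, ObjectUploader uploader) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        if (target.startsWith(S3_PREFIX)) {
            String[] bucketAndKey = parseS3Url(target);
            uploader.put(bucketAndKey[0], bucketAndKey[1], body);
        } else {
            // POSIX case: the plain NIO write that already works today.
            Files.write(Paths.get(target), body);
        }
    }

    public static void main(String[] args) throws IOException {
        // A printing uploader replaces a real S3 client for demonstration purposes.
        ObjectUploader printing = (bucket, key, body) ->
            System.out.println("PUT bucket=" + bucket + " key=" + key + " bytes=" + body.length);
        write("s3://example-bucket/batch_jobs/j-example/openEO_item.json", "{}", printing);
    }
}
```

Injecting the uploader keeps the scheme dispatch testable without credentials; in the real code the S3 branch would presumably reuse the existing upload helper.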

EmileSonneveld commented 1 year ago

This error was logged by j-035de743162a4147845ed7c850c33575. After a deploy, I launched the same process graph and got no error: j-35c175664afc4d708b4f20923a671179. However, I could not access S3 to check whether the metadata was actually present.

EmileSonneveld commented 1 year ago

The openEO_item.json file is indeed present in the second job's output, and absent from the first:

emile@Dell-Precision:~/$ s3cmd [...].cloudferro.com ls s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/
2023-09-18 05:04        88748  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/job_metadata.json
2023-09-18 05:04        31026  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/job_specification.json
2023-09-18 05:04         3921  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/openEO.tif
2023-09-18 05:04         6435  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/openEO.tif.aux.xml

emile@Dell-Precision:~/$ s3cmd [...].cloudferro.com ls s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/
2023-09-18 19:00        88706  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/job_metadata.json
2023-09-18 19:00        30977  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/job_specification.json
2023-09-18 19:00         3921  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/openEO.tif
2023-09-18 19:00         6435  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/openEO.tif.aux.xml
2023-09-18 19:00          749  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/openEO_item.json
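
For repeated checks, the comparison above could be automated. This is a small sketch that scans `s3cmd ls` output lines for a given object name; it assumes the object URL is the last whitespace-separated field of each line, as in the listings above, and `ListingCheck` is a hypothetical helper name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for checking `s3cmd ls` output.
final class ListingCheck {

    /** Returns true if any listing line refers to an object with the given file name. */
    static boolean containsObject(List<String> lsLines, String objectName) {
        for (String line : lsLines) {
            String[] fields = line.trim().split("\\s+");
            String url = fields[fields.length - 1]; // last field is the s3:// URL
            if (url.endsWith("/" + objectName)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two lines from each listing above are enough to show the difference.
        List<String> firstJob = Arrays.asList(
            "2023-09-18 05:04         3921  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/openEO.tif",
            "2023-09-18 05:04         6435  s3://OpenEO-data/batch_jobs/j-035de743162a4147845ed7c850c33575/openEO.tif.aux.xml");
        List<String> secondJob = Arrays.asList(
            "2023-09-18 19:00         6435  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/openEO.tif.aux.xml",
            "2023-09-18 19:00          749  s3://OpenEO-data/batch_jobs/j-35c175664afc4d708b4f20923a671179/openEO_item.json");
        System.out.println(containsObject(firstJob, "openEO_item.json"));  // false
        System.out.println(containsObject(secondJob, "openEO_item.json")); // true
    }
}
```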