Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with the OpenEO GeoPySpark backend.
Apache License 2.0

Issues with PNGs #24

Open jdries opened 3 years ago

jdries commented 3 years ago

Copied from: https://github.com/Open-EO/openeo-python-client/issues/219

  1. The files exported by a batch job are returned by the server with the wrong content type in the headers: I got `application/octet-stream` instead of `image/png`.
  2. The file doesn't have an extension: the header is `content-disposition: inline; filename=out` instead of `content-disposition: inline; filename=out.png`.
  3. It seems I can't export a 3-band PNG. I have an (x, y, bands) data cube with band labels vv, vh, diff, and got an error:
```
error processing batch job
Traceback (most recent call last):
  File "batch_job.py", line 287, in main
    run_driver()
  File "batch_job.py", line 273, in run_driver
    api_version=api_version, job_dir=job_dir, dependencies=dependencies, user_id=user_id
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1624287832965_16431/container_e4942_1624287832965_16431_01_000002/venv/lib64/python3.6/site-packages/openeogeotrellis/utils.py", line 37, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "batch_job.py", line 347, in run_job
    assets_metadata = result.write_assets(str(output_file))
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1624287832965_16431/container_e4942_1624287832965_16431_01_000002/venv/lib64/python3.6/site-packages/openeo_driver/save_result.py", line 79, in write_assets
    return self.cube.write_assets(filename=directory, format=self.format, format_options=self.options)
  File "/data1/hadoop/yarn/local/usercache/openeo/appcache/application_1624287832965_16431/container_e4942_1624287832965_16431_01_000002/venv/lib64/python3.6/site-packages/openeogeotrellis/geopysparkdatacube.py", line 1209, in write_assets
    self._get_jvm().org.openeo.geotrellis.png.package.saveStitched(spatial_rdd.srdd.rdd(), filename, crop_extent)
  File "/usr/hdp/3.1.4.0-315/spark2/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/hdp/3.1.4.0-315/spark2/python/lib/py4j-0.10.7-src.zip/py4j/protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.openeo.geotrellis.png.package.saveStitched.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 18.0 failed 4 times, most recent failure: Lost task 0.3 in stage 18.0 (TID 50, epod057.vgt.vito.be, executor 4): java.lang.ClassCastException: java.lang.Integer cannot be cast to geotrellis.layer.SpatialKey
	at geotrellis.spark.partition.PartitionerIndex$SpatialPartitioner$.toIndex(PartitionerIndex.scala:37)
	at geotrellis.spark.partition.SpacePartitioner.getPartition(SpacePartitioner.scala:45)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1651)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1639)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1638)
	at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1638)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:831)
	at scala.Option.foreach(Option.scala:257)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:831)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1872)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1821)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1810)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:642)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2039)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2060)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2079)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2104)
	at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:945)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.collect(RDD.scala:944)
	at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:309)
	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:171)
	at org.apache.spark.RangePartitioner.<init>(Partitioner.scala:151)
	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:62)
	at org.apache.spark.rdd.OrderedRDDFunctions$$anonfun$sortByKey$1.apply(OrderedRDDFunctions.scala:61)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.OrderedRDDFunctions.sortByKey(OrderedRDDFunctions.scala:61)
	at org.openeo.geotrellis.png.package$.saveStitched(package.scala:21)
	at org.openeo.geotrellis.png.package.saveStitched(package.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: java.lang.Integer cannot be cast to geotrellis.layer.SpatialKey
	at geotrellis.spark.partition.PartitionerIndex$SpatialPartitioner$.toIndex(PartitionerIndex.scala:37)
	at geotrellis.spark.partition.SpacePartitioner.getPartition(SpacePartitioner.scala:45)
	at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
	at org.apache.spark.scheduler.Task.run(Task.scala:109)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	... 1 more
```
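Until the server sets the headers correctly, points 1 and 2 can be worked around on the client side. A minimal stdlib-only sketch (the `filename_with_extension` helper is hypothetical, not part of the openeo client API) that derives an extension from the response `Content-Type`, falling back to the format the user requested when the server reports a generic `application/octet-stream`:

```python
import mimetypes


def filename_with_extension(filename: str, content_type: str,
                            requested_format: str = "png") -> str:
    """Append a file extension derived from the response Content-Type header.

    Falls back to the format the user requested when the server reports a
    generic type such as application/octet-stream (the bug described above).
    """
    if "." in filename.rsplit("/", 1)[-1]:
        return filename  # already has an extension
    # Use only the media type, ignoring parameters such as "; charset=..."
    ext = mimetypes.guess_extension(content_type.split(";")[0].strip())
    if ext is None or content_type.startswith("application/octet-stream"):
        ext = "." + requested_format.lstrip(".")
    return filename + ext
```

With the headers from the report above, `filename_with_extension("out", "application/octet-stream")` yields `out.png`, matching what a corrected `content-disposition` header should contain.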
jdries commented 3 years ago

There seems to be some RGB support, so this is a bug rather than a missing feature: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/master/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/png/package.scala#L65
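For reference, the expected output of point 3 is simply an 8-bit truecolour PNG, with the three bands mapped to the three channels (here hypothetically vv→R, vh→G, diff→B after scaling each band to 0–255). A stdlib-only sketch of that target format, independent of the Scala code linked above:

```python
import struct
import zlib


def write_rgb_png(pixels):
    """Encode rows of (r, g, b) byte tuples as an 8-bit truecolour PNG.

    Each band of an (x, y, bands) cube, scaled to 0-255, supplies one channel.
    """
    height, width = len(pixels), len(pixels[0])

    def chunk(tag, data):
        # length + tag + data + CRC over tag and data, per the PNG spec
        return (struct.pack(">I", len(data)) + tag + data
                + struct.pack(">I", zlib.crc32(tag + data)))

    # IHDR: width, height, bit depth 8, colour type 2 (truecolour RGB)
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    # Each scanline is prefixed with filter type 0 (no filtering)
    raw = b"".join(
        b"\x00" + bytes(v for px in row for v in px) for row in pixels
    )
    return (b"\x89PNG\r\n\x1a\n"
            + chunk(b"IHDR", ihdr)
            + chunk(b"IDAT", zlib.compress(raw))
            + chunk(b"IEND", b""))
```

This is only an illustration of the file format the backend should produce; the actual fix belongs in `org.openeo.geotrellis.png.package.saveStitched`, which currently fails before rendering because the RDD keys reach the `SpacePartitioner` as `Integer` instead of `SpatialKey`.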

@m-mohr any idea on priority, or just reporting this?

m-mohr commented 3 years ago

Just reporting. I mainly use PNG for demos in the Web Editor, until we either get the Map Viewer from Sinergise or web service support on the back-end side.

jdries commented 2 years ago

Note that numbers 1 and 2 seem to be fixed: https://github.com/Open-EO/openeo-geopyspark-driver/blob/633e8ff10fecc32b267f98624e3bce03468e6e4a/openeogeotrellis/geopysparkdatacube.py#L1552