Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

colormap error #138

Open jdries opened 1 year ago

jdries commented 1 year ago

We still have an error related to serialization of the geotiff colormap:

    from matplotlib.colors import ListedColormap
    col_palette = [
        [174, 199, 232, 255],
        [214, 39, 40, 255],
        [247, 182, 210, 255],
        [219, 219, 141, 255],
        [199, 199, 199, 255],
    ]
    cmap = ListedColormap(col_palette)
    classification_colors = {x: [c / 255.0 for c in cmap(x)] for x in range(0, len(col_palette))}
    return classification_colors

Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2672)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2608)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2607)
        at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
        at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2607)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1182)
        at scala.Option.foreach(Option.scala:407)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1182)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2860)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2802)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2791)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
        at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:952)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2228)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2249)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2268)
        at org.apache.spark.SparkContext.runJob(SparkContext.scala:2293)
        at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1021)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:406)
        at org.apache.spark.rdd.RDD.collect(RDD.scala:1020)
        at org.openeo.geotrellis.geotiff.package$.saveRDDTemporal(package.scala:129)
        at org.openeo.geotrellis.geotiff.package.saveRDDTemporal(package.scala)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:566)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
        at py4j.Gateway.invoke(Gateway.java:282)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
        at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.ClassCastException: cannot assign instance of spire.std.DoubleAlgebra to field geotrellis.raster.render.BreakMap.evidence$1 of type org.locationtech.geopyspark.shaded.cats.kernel.Order in instance of geotrellis.raster.render.BreakMap$mcDI$sp
        at java.base/java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2205)
        at java.base/java.io.ObjectStreamClass$FieldReflector.checkObjectFieldValueTypes(ObjectStreamClass.java:2168)
        at java.base/java.io.ObjectStreamClass.checkObjFieldValueTypes(ObjectStreamClass.java:1422)
        at java.base/java.io.ObjectInputStream.defaultCheckFieldValues(ObjectInputStream.java:2506)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2413)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2128)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.readArray(ObjectInputStream.java:2128)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2490)
        at java.base/java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2384)
        at java.base/java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2222)
        at java.base/java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1681)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:493)
        at java.base/java.io.ObjectInputStream.readObject(ObjectInputStream.java:451)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:87)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:129)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:85)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        ... 1 more

EmileSonneveld commented 1 year ago

The BreakMap object has some implicit evidence parameters (evidence$1 in the stack trace) that cause the issue. They get created somewhere here: https://github.com/locationtech-labs/geopyspark/blob/master/geopyspark/geotrellis/color.py#L174-L176

Maybe this can be fixed by using the same kind of hack as here: https://github.com/Open-EO/openeo-geotrellis-extensions/issues/108

EmileSonneveld commented 1 year ago

This snippet will run correctly when launched from Scala, but will throw when launched from Python:

  import geotrellis.raster.render.DoubleColorMap
  import org.apache.spark.SparkContext
  import spire.std.DoubleAlgebra

  def test(sc: SparkContext): Unit = {
    // Break value -> RGBA color, encoded as "value:rrggbbaa" pairs separated by ';'.
    val breaksToColors = "0.0:aec7e8ff;1.0:d62728ff;2.0:f7b6d2ff;3.0:dbdb8dff;4.0:c7c7c7ff".split(";").map { entry =>
      val parts = entry.split(":")
      // parseUnsignedInt, because there is no minus sign in the hexadecimal representation.
      // Values above Int.MaxValue wrap around to the matching negative Int.
      (parts(0).toDouble, Integer.parseUnsignedInt(parts(1), 16))
    }.toMap
    val colorMap = new DoubleColorMap(breaksToColors)
    val formatOptions = new GTiffOptions()
    formatOptions.setColorMap(colorMap)

    val spire = new DoubleAlgebra()

    val data = sc.parallelize(Seq(1, 2, 3))
    // Referencing spire and formatOptions inside the closure makes Spark
    // serialize them with the task and deserialize them on the executors.
    val result = data.map { x =>
      println(spire)
      println(formatOptions)
      x
    }
    result.collect()
    println("test done")
  }

I guess this is because different objects/libraries are available in the runtime environment when launching from Python, which interferes with the serialization process.

To run from Python:

        from openeogeotrellis.utils import get_jvm
        sc = get_jvm().org.apache.spark.SparkContext.getOrCreate()
        get_jvm().org.openeo.geotrellis.geotiff.package.test(sc)
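
For reference, the hard-coded string in the Scala test appears to be the palette from the original report, written as "index:rrggbbaa" pairs. Below is a minimal sketch of that encoding, plus the unsigned-to-signed wrap-around that the parseUnsignedInt comment refers to. The helper names to_break_string and to_signed_int32 are made up for illustration and are not part of openeo or geopyspark.

```python
# Palette from the original report: RGBA components in the 0-255 range.
COL_PALETTE = [
    [174, 199, 232, 255],
    [214, 39, 40, 255],
    [247, 182, 210, 255],
    [219, 219, 141, 255],
    [199, 199, 199, 255],
]


def to_break_string(palette):
    """Encode each class index and its RGBA color as 'index:rrggbbaa', joined by ';'."""
    return ";".join(
        f"{float(index)}:" + "".join(f"{component:02x}" for component in rgba)
        for index, rgba in enumerate(palette)
    )


def to_signed_int32(value):
    """Reinterpret an unsigned 32-bit value as a signed one (two's complement)."""
    return value - (1 << 32) if value >= (1 << 31) else value


# Hypothetical helpers applied to the reported palette.
break_string = to_break_string(COL_PALETTE)
print(break_string)  # 0.0:aec7e8ff;1.0:d62728ff;2.0:f7b6d2ff;3.0:dbdb8dff;4.0:c7c7c7ff

# Colors whose first hex digit is 8 or higher end up as negative Int values in Scala.
print(to_signed_int32(int("aec7e8ff", 16)))
```

This only illustrates the encoding; it does not touch the serialization problem itself.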

jdries commented 1 year ago

Aha, so the field type org.locationtech.geopyspark.shaded.cats.kernel.Order seems wrong: when I look in the code, it should be of type spire.algebra.Order. But we also see that some 'shading' has happened, where this cats.kernel.Order type got relocated to a different package. It's quite likely that this causes some kind of mix-up.
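
One way to narrow this down might be to list which jars on the classpath ship the original class and which ship the relocated one. The following is a rough sketch, assuming all relevant jars sit in a single local directory; jars_containing, report and the directory path are hypothetical names, not part of any openeo tooling.

```python
import zipfile
from pathlib import Path

# Classes named in the stack trace, plus the unshaded variant of Order.
CLASSES_OF_INTEREST = [
    "cats.kernel.Order",
    "org.locationtech.geopyspark.shaded.cats.kernel.Order",
    "spire.std.DoubleAlgebra",
    "geotrellis.raster.render.BreakMap",
]


def jars_containing(jar_dir, class_name):
    """Return the sorted file names of jars in jar_dir that contain class_name."""
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    matches.append(jar_path.name)
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
    return matches


def report(jar_dir, class_names):
    """Map every class name to the jars that provide it."""
    return {name: jars_containing(jar_dir, name) for name in class_names}
```

Calling report("/path/to/jars", CLASSES_OF_INTEREST) on the driver's jar directory would show whether a class is provided by more than one jar, or only under the shaded package name. It does not tell which copy the JVM actually loads.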

jdries commented 1 year ago

Note that nowadays Spark has a dependency on cats-kernel_2.12-2.1.1. Perhaps we can remove this shading rule from pom.xml and it might work?

    <relocation>
        <pattern>cats.kernel</pattern>
        <shadedPattern>org.locationtech.geopyspark.shaded.cats.kernel</shadedPattern>
    </relocation>

EmileSonneveld commented 1 year ago

Commenting out the relocation of cats.kernel to org.locationtech.geopyspark.shaded.cats.kernel in pom.xml changes the namespace of Order, but the cast still fails:

Caused by: java.lang.ClassCastException: cannot assign instance of spire.std.DoubleAlgebra to field geotrellis.raster.render.BreakMap.evidence$1 of type spire.algebra.Order in instance of geotrellis.raster.render.BreakMap$mcDI$sp

jdries commented 1 year ago

I have now removed geotrellis entirely from the assembly jar (https://artifactory.vgt.vito.be/webapp/#/artifacts/browse/tree/General/auxdata-public/openeo/geotrellis-backend-assembly-0.4.8-openeo_2.12.jar). Can you try whether that helps?

EmileSonneveld commented 1 year ago

It still gives:

ClassCastException: cannot assign instance of spire.std.DoubleAlgebra to field geotrellis.raster.render.BreakMap.evidence$1 of type org.locationtech.geopyspark.shaded.cats.kernel.Order in instance of geotrellis.raster.render.BreakMap$mcDI$sp

I committed the Scala test function on develop, for easier debugging: https://github.com/Open-EO/openeo-geotrellis-extensions/commit/eac875794ce143cd13cbb5569730e43c4f612e79

jdries commented 11 months ago

Need to retry running the test from Python; the assembly jar got removed.

EmileSonneveld commented 11 months ago

I tried running the snippet again from Python, but got the same error. https://github.com/Open-EO/openeo-geotrellis-extensions/blob/develop/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/geotiff/package.scala#L739

The Scala code still seems to refer to geotrellis code in a jar: /home/emile/.m2/repository/org/locationtech/geotrellis/geotrellis-raster_2.12/3.6.0/geotrellis-raster_2.12-3.6.0-sources.jar!/geotrellis/raster/render/ColorMap.scala