FasterXML / jackson-module-scala

Add-on module for Jackson (https://github.com/FasterXML/jackson) to support Scala-specific datatypes
Apache License 2.0

jackson-module-scala_2.12 reports java.lang.ArrayIndexOutOfBoundsException #565

Closed: PingYufeng closed this issue 2 years ago

PingYufeng commented 2 years ago

spark 3.2.0 scala 2.12

        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>3.2.0</version>
        </dependency>

When I import only spark-core and run my code in IDEA on Windows 10, it works fine. However, when I package the code and submit it to Spark on Linux, it reports java.lang.ArrayIndexOutOfBoundsException.

I found some similar issues suggesting importing paranamer-2.8.jar, but that did not work either. So I tried pinning jackson-module-scala_2.12 to version 2.12.0-rc2, and now it runs well.

            <dependency>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>jackson-module-scala_2.12</artifactId>
                <version>2.12.0-rc2</version>
            </dependency>

Why?

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 34826
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:79)
    at com.fasterxml.jackson.module.scala.introspect.JavaParameterIntrospector$.getCtorParamNames(JavaParameterIntrospector.scala:12)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:41)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$2(BeanIntrospector.scala:61)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:61)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$23(BeanIntrospector.scala:203)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$18(BeanIntrospector.scala:197)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$18$adapted(BeanIntrospector.scala:194)
    at scala.collection.immutable.List.flatMap(List.scala:366)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:194)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:154)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:165)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:46)
    at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:496)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:530)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:421)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:270)
    at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:258)
    at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:391)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:220)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:169)
    at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1473)
    at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1421)
    at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:520)
    at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:798)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
    at com.fasterxml.jackson.databind.ObjectMapper._writeValueAndClose(ObjectMapper.java:4487)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3742)
    at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:145)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:792)
    at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1281)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.queryPlanToRdd$1(HBaseSpatialRDDProvider.scala:65)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.$anonfun$rdd$3(HBaseSpatialRDDProvider.scala:73)
    at scala.collection.immutable.List.map(List.scala:293)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.rdd(HBaseSpatialRDDProvider.scala:73)
....
pjfanning commented 2 years ago

The error appears in paranamer - not in jackson code. This is an open source project - a collaboration between developers - not a free support service.

pjfanning commented 2 years ago

This could be related to https://github.com/FasterXML/jackson-module-scala/issues/505, which brought back the paranamer dependency that had been removed earlier in the 2.12 timeframe. Could you provide the class(es) that you are serializing/deserializing so they can be tested with/without paranamer?

PingYufeng commented 2 years ago

> This could be related to #505, which brought back the paranamer dependency that had been removed earlier in the 2.12 timeframe. Could you provide the class(es) that you are serializing/deserializing so they can be tested with/without paranamer?

You would probably need to run the code with scala-2.12, spark-3.2.0, hadoop-3.2.1, hbase-2.3.7, and geomesa-3.2.2.


    <properties>
        <scala.abi.version>2.12</scala.abi.version>
        <scala.version>2.12.15</scala.version>
        <spark.version>3.2.0</spark.version>
        <geomesa.version>3.2.2</geomesa.version>
        <gt.version>23.3</gt.version>
        <zookeeper.version>3.5.7</zookeeper.version>
        <hbase.version>2.3.7</hbase.version>
        <hadoop.version>3.2.1</hadoop.version>
    </properties>

<dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
            <dependency>
                <groupId>com.fasterxml.jackson.module</groupId>
                <artifactId>jackson-module-scala_2.12</artifactId>
                <version>2.12.0-rc2</version>
            </dependency>
        </dependencies>
    </dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.locationtech.geomesa</groupId>
            <artifactId>geomesa-spark-core_${scala.abi.version}</artifactId>
            <version>${geomesa.version}</version>
        </dependency>
        <dependency>
            <groupId>org.locationtech.geomesa</groupId>
            <artifactId>geomesa-spark-jts_${scala.abi.version}</artifactId>
            <version>${geomesa.version}</version>
        </dependency>
        <dependency>
            <groupId>org.locationtech.geomesa</groupId>
            <artifactId>geomesa-spark-sql_${scala.abi.version}</artifactId>
            <version>${geomesa.version}</version>
        </dependency>
        <dependency>
            <groupId>org.locationtech.geomesa</groupId>
            <artifactId>geomesa-hbase-datastore_${scala.abi.version}</artifactId>
            <version>${geomesa.version}</version>
        </dependency>
        <dependency>
            <groupId>org.locationtech.geomesa</groupId>
            <artifactId>geomesa-hbase-spark-runtime-hbase2_${scala.abi.version}</artifactId>
            <version>${geomesa.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-epsg-wkt</artifactId>
            <version>${gt.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-opengis</artifactId>
            <version>${gt.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-main</artifactId>
            <version>${gt.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-epsg-hsql</artifactId>
            <version>${gt.version}</version>
        </dependency>
        <dependency>
            <groupId>org.geotools</groupId>
            <artifactId>gt-wfs-ng</artifactId>
            <version>${gt.version}</version>
        </dependency>

        <!-- hadoop hbase -->
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-client</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-common</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-server</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-annotations</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-protocol</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hbase</groupId>
            <artifactId>hbase-mapreduce</artifactId>
            <version>${hbase.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>zookeeper</artifactId>
            <version>${zookeeper.version}</version>
        </dependency>
        <dependency>
            <groupId>com.google.guava</groupId>
            <artifactId>guava</artifactId>
            <version>30.1-jre</version>
        </dependency>
    </dependencies>

import java.text.SimpleDateFormat
import org.apache.hadoop.conf.Configuration
import org.apache.log4j.Logger
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext, TaskContext}
import org.geotools.data.{DataStoreFinder, Query}
import org.geotools.factory.CommonFactoryFinder
import org.geotools.filter.text.ecql.ECQL
import org.locationtech.geomesa.hbase.data.HBaseDataStore
import org.locationtech.geomesa.spark.GeoMesaSpark
import org.opengis.feature.simple.SimpleFeature

import scala.collection.JavaConversions._

object SparkHbaseTest {

  private val logger:Logger = Logger.getLogger(SparkHbaseTest.getClass)

  val params = Map("hbase.zookeepers" -> "zoo", "hbase.catalog" -> "gis_osm_roads_free_1")

  // see geomesa-tools/conf/sfts/gdelt/reference.conf
  val typeName = "gis_osm_roads_free_1"

  def main(args: Array[String]) {

    // Get a handle to the data store
    val ds = DataStoreFinder.getDataStore(params).asInstanceOf[HBaseDataStore]

    // Construct a CQL query to filter by bounding box
    //    val query = new Query(typeName, ECQL.toFilter(filter))
    val query = new Query(typeName)

    // Get the appropriate spatial RDD provider
    val spatialRDDProvider = GeoMesaSpark(params)

    // Configure Spark
    val conf = new SparkConf()
    val sc = SparkContext.getOrCreate(conf)

    // Get an RDD[SimpleFeature] from the spatial RDD provider
    val rdd = spatialRDDProvider.rdd(new Configuration, sc, params, query)
    rdd.foreachPartition(partition => {
      partition.foreach(item => {
        println(item)
      })
    })
    ds.dispose()
  }
}
pjfanning commented 2 years ago

This does not look like an easily reproducible test case. I'm going to ignore it.

PingYufeng commented 2 years ago

> This does not look like an easily reproducible test case. I'm going to ignore it.

Oh no, I was dismayed to hear that. I do not know the cause of the exception. Yes, it is hard to set up the base runtime environment to run the code, but I think you can do it. Come on, I will give you a high five!

pjfanning commented 2 years ago

I did add a change to 2.14.0-SNAPSHOT a few days ago, if you want to try that version.

There are a lot of Spark users who rely on jackson-module-scala indirectly, and I haven't seen any other reports of jackson-module-scala issues. If more users hit this, maybe they'll be more willing to spend time producing a more easily reproduced test case.
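For reference, overriding the module with the snapshot build in a Maven project would look roughly like the sketch below. This is a hypothetical fragment: the Sonatype snapshots repository URL is an assumption, and snapshot coordinates can change between builds.

```xml
<!-- Hypothetical sketch: pull the 2.14.0-SNAPSHOT build of jackson-module-scala.
     The repository URL below is an assumption; verify where snapshots are published. -->
<repositories>
    <repository>
        <id>sonatype-snapshots</id>
        <url>https://oss.sonatype.org/content/repositories/snapshots</url>
        <snapshots>
            <enabled>true</enabled>
        </snapshots>
    </repository>
</repositories>

<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.fasterxml.jackson.module</groupId>
            <artifactId>jackson-module-scala_2.12</artifactId>
            <version>2.14.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Putting the override in `dependencyManagement` forces the version even when the module arrives transitively, e.g. via spark-core.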

pjfanning commented 2 years ago

https://www.geomesa.org/documentation/stable/user/spark/index.html -- GeoMesa doesn't officially support Spark 3.2.

PingYufeng commented 2 years ago

> https://www.geomesa.org/documentation/stable/user/spark/index.html -- GeoMesa doesn't officially support Spark 3.2.

Oh, I had not noticed that GeoMesa does not support Spark 3.2.x; I will switch to Spark 3.1.2 to test my code. Thanks for the tip. Maybe you could also test with Spark 3.1.2, if you are still interested.

PingYufeng commented 2 years ago

When I used Spark 3.1.2, it also reported java.lang.ArrayIndexOutOfBoundsException:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 28499
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.accept(BytecodeReadingParanamer.java:563)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer$ClassReader.access$200(BytecodeReadingParanamer.java:338)
    at com.thoughtworks.paranamer.BytecodeReadingParanamer.lookupParameterNames(BytecodeReadingParanamer.java:103)
    at com.thoughtworks.paranamer.CachingParanamer.lookupParameterNames(CachingParanamer.java:79)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.getCtorParams(BeanIntrospector.scala:44)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$1$adapted(BeanIntrospector.scala:58)
    at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
    at scala.collection.Iterator.foreach(Iterator.scala:943)
    at scala.collection.Iterator.foreach$(Iterator.scala:943)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
    at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.findConstructorParam$1(BeanIntrospector.scala:58)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$19(BeanIntrospector.scala:176)
    at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
    at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36)
    at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198)
    at scala.collection.TraversableLike.map(TraversableLike.scala:286)
    at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14(BeanIntrospector.scala:170)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.$anonfun$apply$14$adapted(BeanIntrospector.scala:169)
    at scala.collection.immutable.List.flatMap(List.scala:366)
    at com.fasterxml.jackson.module.scala.introspect.BeanIntrospector$.apply(BeanIntrospector.scala:169)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$._descriptorFor(ScalaAnnotationIntrospectorModule.scala:21)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.fieldName(ScalaAnnotationIntrospectorModule.scala:29)
    at com.fasterxml.jackson.module.scala.introspect.ScalaAnnotationIntrospector$.findImplicitPropertyName(ScalaAnnotationIntrospectorModule.scala:77)
    at com.fasterxml.jackson.databind.introspect.AnnotationIntrospectorPair.findImplicitPropertyName(AnnotationIntrospectorPair.java:490)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector._addFields(POJOPropertiesCollector.java:380)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.collectAll(POJOPropertiesCollector.java:308)
    at com.fasterxml.jackson.databind.introspect.POJOPropertiesCollector.getJsonValueAccessor(POJOPropertiesCollector.java:196)
    at com.fasterxml.jackson.databind.introspect.BasicBeanDescription.findJsonValueAccessor(BasicBeanDescription.java:252)
    at com.fasterxml.jackson.databind.ser.BasicSerializerFactory.findSerializerByAnnotations(BasicSerializerFactory.java:346)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory._createSerializer2(BeanSerializerFactory.java:216)
    at com.fasterxml.jackson.databind.ser.BeanSerializerFactory.createSerializer(BeanSerializerFactory.java:165)
    at com.fasterxml.jackson.databind.SerializerProvider._createUntypedSerializer(SerializerProvider.java:1388)
    at com.fasterxml.jackson.databind.SerializerProvider._createAndCacheUntypedSerializer(SerializerProvider.java:1336)
    at com.fasterxml.jackson.databind.SerializerProvider.findValueSerializer(SerializerProvider.java:510)
    at com.fasterxml.jackson.databind.SerializerProvider.findTypedValueSerializer(SerializerProvider.java:713)
    at com.fasterxml.jackson.databind.ser.DefaultSerializerProvider.serializeValue(DefaultSerializerProvider.java:308)
    at com.fasterxml.jackson.databind.ObjectMapper._configAndWriteValue(ObjectMapper.java:4094)
    at com.fasterxml.jackson.databind.ObjectMapper.writeValueAsString(ObjectMapper.java:3404)
    at org.apache.spark.rdd.RDDOperationScope.toJson(RDDOperationScope.scala:52)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:145)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:786)
    at org.apache.spark.SparkContext.newAPIHadoopRDD(SparkContext.scala:1275)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.queryPlanToRdd$1(HBaseSpatialRDDProvider.scala:65)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.$anonfun$rdd$3(HBaseSpatialRDDProvider.scala:73)
    at scala.collection.immutable.List.map(List.scala:293)
    at org.locationtech.geomesa.spark.hbase.HBaseSpatialRDDProvider.rdd(HBaseSpatialRDDProvider.scala:73)
cowtowncoder commented 2 years ago

What is needed here is not more stack traces from Spark usage, but a self-contained test case, ideally without Spark at all. If that is not practical, a maven/groovy project with unit test(s) triggering the behavior, to give the stack trace, would do. It may then turn out to be an integration issue.

However: the stack trace and the reference to Paranamer suggest that that (old, legacy) library is the one having issues with Spark(?)-generated classes. If so, it could well be a Paranamer bug. One possibility would be to check whether the version used is the latest (2.8) or not:

https://mvnrepository.com/artifact/com.thoughtworks.paranamer/paranamer

and if not, perhaps an upgrade of that dependency would help.
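A minimal sketch of forcing the latest paranamer in Maven, assuming the stale copy arrives transitively (a `dependencyManagement` entry pins the version for every transitive use):

```xml
<!-- Sketch: force paranamer 2.8 over any older transitive copy -->
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.thoughtworks.paranamer</groupId>
            <artifactId>paranamer</artifactId>
            <version>2.8</version>
        </dependency>
    </dependencies>
</dependencyManagement>
```

Running `mvn dependency:tree -Dincludes=com.thoughtworks.paranamer` first shows which version is actually on the classpath and which dependency drags it in.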

pjfanning commented 2 years ago

Yes - I forgot to mention that some similar issues were reported when people used paranamer versions older than 2.8.

pjfanning commented 2 years ago

Closing, as answered