apache / incubator-stormcrawler

A scalable, mature and versatile web crawler based on Apache Storm
https://stormcrawler.apache.org/
Apache License 2.0

2.x Jackson version conflict when running in local mode? #880

Closed jnioche closed 3 years ago

jnioche commented 3 years ago

This does not happen in remote mode, only when running in local mode.

14:32:01.031 [Thread-49-tika-executor[24, 24]] ERROR o.a.s.u.Utils - Async loop died!
java.lang.VerifyError: Stack map does not match the one at exception handler 77
Exception Details:
  Location:
    com/fasterxml/jackson/databind/deser/std/StdDeserializer._parseDate(Lcom/fasterxml/jackson/core/JsonParser;Lcom/fasterxml/jackson/databind/DeserializationContext;)Ljava/util/Date; @77: astore
  Reason:
    Type 'com/fasterxml/jackson/core/JsonParseException' (current frame, stack[0]) is not assignable to 'com/fasterxml/jackson/core/exc/StreamReadException' (stack map, stack[0])
  Current Frame:
    bci: @69
    flags: { }
    locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
    stack: { 'com/fasterxml/jackson/core/JsonParseException' }
  Stackmap Frame:
    bci: @77
    flags: { }
    locals: { 'com/fasterxml/jackson/databind/deser/std/StdDeserializer', 'com/fasterxml/jackson/core/JsonParser', 'com/fasterxml/jackson/databind/DeserializationContext' }
    stack: { 'com/fasterxml/jackson/core/exc/StreamReadException' }
  Bytecode:
    0x0000000: 2bb6 0035 aa00 0000 0000 0081 0000 0003
    0x0000010: 0000 000b 0000 007a 0000 0081 0000 0081
    0x0000020: 0000 0034 0000 0041 0000 0081 0000 0081
    0x0000030: 0000 0081 0000 0071 2a2b b600 11b6 0012
    0x0000040: 2cb6 006b b02b b600 4742 a700 223a 052c
    0x0000050: 2ab4 0002 2bb6 006e 126f 03bd 0004 b600
    0x0000060: 70c0 002d 3a06 1906 b600 4c42 bb00 7159
    0x0000070: 21b7 0072 b02a 2cb6 0073 c000 71b0 2a2b
    0x0000080: 2cb6 0074 b02c 2ab4 0002 2bb6 0025 c000
    0x0000090: 71b0                                   
  Exception Handler Table:
    bci [69, 74] => handler: 77
    bci [69, 74] => handler: 77
  Stackmap Table:
    same_frame(@56)
    same_frame(@69)
    same_locals_1_stack_item_frame(@77,Object[#359])
    append_frame(@108,Long)
    chop_frame(@117,1)
    same_frame(@126)
    same_frame(@133)

    at com.fasterxml.jackson.databind.deser.BasicDeserializerFactory.createTreeDeserializer(BasicDeserializerFactory.java:1513) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer2(DeserializerCache.java:409) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createDeserializer(DeserializerCache.java:349) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCache2(DeserializerCache.java:264) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.deser.DeserializerCache._createAndCacheValueDeserializer(DeserializerCache.java:244) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.deser.DeserializerCache.findValueDeserializer(DeserializerCache.java:142) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.DeserializationContext.findRootValueDeserializer(DeserializationContext.java:476) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.ObjectMapper._findRootDeserializer(ObjectMapper.java:4389) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.ObjectMapper._readMapAndClose(ObjectMapper.java:4198) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.fasterxml.jackson.databind.ObjectMapper.readValue(ObjectMapper.java:3242) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.filtering.URLFilters.loadJSONResources(URLFilters.java:99) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.JSONResource.loadJSONResources(JSONResource.java:52) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.filtering.URLFilters.<init>(URLFilters.java:89) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.filtering.URLFilters.fromConf(URLFilters.java:68) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at com.digitalpebble.stormcrawler.tika.ParserBolt.prepare(ParserBolt.java:109) ~[crawl2-1.0-SNAPSHOT.jar:?]
    at org.apache.storm.executor.bolt.BoltExecutor.init(BoltExecutor.java:147) ~[storm-client-2.2.0.jar:2.2.0]
    at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:157) ~[storm-client-2.2.0.jar:2.2.0]
    at org.apache.storm.executor.bolt.BoltExecutor.call(BoltExecutor.java:59) ~[storm-client-2.2.0.jar:2.2.0]
    at org.apache.storm.utils.Utils$1.run(Utils.java:389) [storm-client-2.2.0.jar:2.2.0]
    at java.lang.Thread.run(Thread.java:748) [?:1.8.0_282]
14:32:01.031 [Thread-48-filter-executor[6, 6]] ERROR o.a.s.u.Utils - Async loop died!

jnioche commented 3 years ago

One of the differences between remote and local mode in Storm 2 is the classpath:

REMOTE /usr/share/apache-storm-2.2.0/*:/usr/share/apache-storm-2.2.0/lib-worker/*:/usr/share/apache-storm-2.2.0/extlib/*:target/crawl2-1.0-SNAPSHOT.jar:/usr/share/apache-storm-2.2.0/conf:/usr/share/apache-storm-2.2.0/bin:

LOCAL /usr/share/apache-storm-2.2.0/*:/usr/share/apache-storm-2.2.0/lib/*:/usr/share/apache-storm-2.2.0/extlib/*:target/crawl2-1.0-SNAPSHOT.jar:/usr/share/apache-storm-2.2.0/conf:/usr/share/apache-storm-2.2.0/bin:

lib-worker is used in remote mode and contains very little, whereas lib, which is used in local mode, contains Jackson 2.9.8. That conflicts with the 2.11.1 specified by our core module.
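
To confirm which jar actually wins on a given classpath, something like the sketch below can be run with the same classpath as the topology. It is only a diagnostic, not part of StormCrawler: the class name `WhichJar` and the method `locate` are made up for illustration. As far as I can tell, `StreamReadException` only exists in jackson-core 2.10 and later, so seeing `JsonParseException` resolved from Storm's `lib` directory while `ObjectMapper` comes from the topology jar would be consistent with the `VerifyError` above.

```java
import java.security.CodeSource;

public class WhichJar {

    /**
     * Returns the location (usually a jar URL) a class was loaded from,
     * or a short note when the location cannot be determined.
     */
    static String locate(String className) {
        try {
            // Load without initializing, so static initializers are not run.
            Class<?> c = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = c.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                // Typical for JDK classes loaded by the bootstrap loader.
                return "(bootstrap or unknown)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not loadable: " + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the stack trace in this issue.
        String[] names = {
            "com.fasterxml.jackson.core.JsonParseException",
            "com.fasterxml.jackson.databind.ObjectMapper"
        };
        for (String n : names) {
            System.out.println(n + " <- " + locate(n));
        }
    }
}
```

If the two classes are reported from different jars, the classpath order is the likely culprit rather than the topology code.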

One option would be to downgrade to the same version as Storm, but we'll get all manner of alerts about it being unsafe. Apparently, the next version of Storm will use 2.10.

We could use a different library in the core module altogether, but some of our other modules (ES, Tika) declare a dependency on it as well.

jnioche commented 3 years ago

Adding

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.fasterxml.jackson.core</groupId>
                <artifactId>jackson-databind</artifactId>
                <version>2.9.8</version>
            </dependency>
        </dependencies>
    </dependencyManagement>

to the pom.xml of the topology made it work in local mode. We might add it, commented out, to the pom generated by the archetypes.
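
A quick way to check that the pin took effect is to compare the jackson-core and jackson-databind versions seen at runtime. The sketch below is an assumption-laden helper, not something from the project: `JacksonVersionCheck`, `versionOf` and `sameMinor` are hypothetical names. It reads the `VERSION` constant of Jackson's `PackageVersion` classes by reflection, so it compiles and runs even when Jackson is not on the classpath.

```java
import java.lang.reflect.Field;

public class JacksonVersionCheck {

    /** Reads the static VERSION field of a PackageVersion class, or returns null if unavailable. */
    static String versionOf(String packageVersionClass) {
        try {
            Class<?> c = Class.forName(packageVersionClass);
            Field f = c.getField("VERSION");
            return String.valueOf(f.get(null));
        } catch (ReflectiveOperationException | LinkageError e) {
            return null;
        }
    }

    /** True when both version strings share the same major and minor numbers. */
    static boolean sameMinor(String a, String b) {
        if (a == null || b == null) {
            return false;
        }
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        return x.length >= 2 && y.length >= 2 && x[0].equals(y[0]) && x[1].equals(y[1]);
    }

    public static void main(String[] args) {
        String core = versionOf("com.fasterxml.jackson.core.json.PackageVersion");
        String databind = versionOf("com.fasterxml.jackson.databind.cfg.PackageVersion");
        System.out.println("jackson-core:     " + core);
        System.out.println("jackson-databind: " + databind);
        System.out.println("same minor line:  " + sameMinor(core, databind));
    }
}
```

Run it once with the local-mode classpath and once with the remote one; a mismatch in local mode only would match what is described in this issue.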

jnioche commented 3 years ago

Fixed by #911