VIDA-NYU / ache

ACHE is a web crawler for domain-specific search.
http://ache.readthedocs.io
Apache License 2.0

Unable to start crawler using Java 17 #322

Open aayushL opened 1 year ago

aayushL commented 1 year ago

Hello, I am trying to run the ACHE crawler and index the data into my Elasticsearch database, but I am getting this error:

`[2023-01-16 10:52:19,001]ERROR [main] (Main.java:261) - Crawler execution failed: java.lang.reflect.InaccessibleObjectException: Unable to make field private final java.lang.Object[] java.util.Arrays$ArrayList.a accessible: module java.base does not "opens java.util" to unnamed module @6fa4fbe3

java.lang.RuntimeException: java.lang.reflect.InaccessibleObjectException: Unable to make field private final java.lang.Object[] java.util.Arrays$ArrayList.a accessible: module java.base does not "opens java.util" to unnamed module @6fa4fbe3
    at de.javakaffee.kryoserializers.ArraysAsListSerializer.<init>(ArraysAsListSerializer.java:47)
    at achecrawler.util.Kryos.registerDeserializers(Kryos.java:36)
    at achecrawler.util.Kryos.access$000(Kryos.java:24)
    at achecrawler.util.Kryos$1.initialValue(Kryos.java:29)
    at achecrawler.util.Kryos$1.initialValue(Kryos.java:26)
    at java.base/java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:195)
    at java.base/java.lang.ThreadLocal.get(ThreadLocal.java:172)
    at achecrawler.util.Kryos.serializeObject(Kryos.java:58)
    at achecrawler.util.persistence.rocksdb.StringObjectHashtable.put(StringObjectHashtable.java:20)
    at achecrawler.util.persistence.PersistentHashtable.commit(PersistentHashtable.java:122)
    at achecrawler.link.frontier.Frontier.commit(Frontier.java:28)
    at achecrawler.link.frontier.FrontierManager.addSeeds(FrontierManager.java:254)
    at achecrawler.link.frontier.FrontierManagerFactory.create(FrontierManagerFactory.java:44)
    at achecrawler.link.LinkStorage.create(LinkStorage.java:169)
    at achecrawler.crawler.async.AsyncCrawler.create(AsyncCrawler.java:114)
    at achecrawler.crawler.CrawlersManager.createCrawler(CrawlersManager.java:104)
    at achecrawler.Main$StartCrawl.run(Main.java:247)
    at achecrawler.Main.main(Main.java:59)
Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make field private final java.lang.Object[] java.util.Arrays$ArrayList.a accessible: module java.base does not "opens java.util" to unnamed module @6fa4fbe3
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
    at java.base/java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
    at java.base/java.lang.reflect.Field.checkCanSetAccessible(Field.java:180)
    at java.base/java.lang.reflect.Field.setAccessible(Field.java:174)
    at de.javakaffee.kryoserializers.ArraysAsListSerializer.<init>(ArraysAsListSerializer.java:45)
    ... 17 common frames omitted`

I am running the following command: `ache startCrawl -c ./config/ache.yml -s ./config/sample.seeds -o ../data/ -e http://localhost:9200`

aecio commented 1 year ago

It looks like you are using a recent Java version and not exporting the proper modules required by some crawler dependencies.

aayushL commented 1 year ago

Thank you for your reply. To fix this, should I downgrade my Java version? If so, which version would be suitable? Or is there another fix that does not require downgrading Java?

aecio commented 1 year ago

I would expect it to work fine on Java 11 (the Docker version uses it). On recent Java versions, you can make modules accessible by adding flags --add-opens or --add-exports with the desired arguments to the java command. However, I'm not sure which values need to be passed as arguments to fix the specific problem you are reporting.

In ACHE, you can add these flags by setting the environment variable JAVA_OPTS before running the crawler. For example, in the Dockerfile we currently set JAVA_OPTS='-XX:+UseContainerSupport -XX:MaxRAMPercentage=80' to configure memory usage.
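As a minimal sketch of the `JAVA_OPTS` approach: the error message itself names the package that is not opened (`java.util` in module `java.base`, to the unnamed module), so one plausible value is `--add-opens=java.base/java.util=ALL-UNNAMED`. This is an assumption derived from the stack trace, not something confirmed in this thread, and other dependencies may require additional packages to be opened. `ADD_OPENS` is just an illustrative variable name.

```shell
# Package to open, taken from the error message ("opens java.util").
# Assumption: this single flag is enough; more may be needed in practice.
ADD_OPENS="--add-opens=java.base/java.util=ALL-UNNAMED"

# Append to any existing JAVA_OPTS instead of overwriting it.
export JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }$ADD_OPENS"
echo "JAVA_OPTS=$JAVA_OPTS"

# Then start the crawler as before, for example:
# ache startCrawl -c ./config/ache.yml -s ./config/sample.seeds -o ../data/ -e http://localhost:9200
```

If a different `InaccessibleObjectException` appears afterwards, its message should name the next module and package to add in the same `module/package=ALL-UNNAMED` form.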

aayushL commented 1 year ago

Solved it by downgrading to Java 11. Now it's running perfectly fine.