archivesunleashed / aut

The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
https://aut.docs.archivesunleashed.org/
Apache License 2.0

Packages build is often broken - should we support it? #483

Closed: ruebot closed this 4 years ago

ruebot commented 4 years ago

Since we moved to a newer version of Tika, we've had on-and-off trouble (mostly on!) getting aut to work with --packages. The current state yields this:

        ::::::::::::::::::::::::::::::::::::::::::::::
        ::              FAILED DOWNLOADS            ::
        :: ^ see resolution messages for details  ^ ::
        ::::::::::::::::::::::::::::::::::::::::::::::
        :: javax.activation#activation;1.1!activation.jar
        :: com.google.guava#guava;28.0-jre!guava.jar(bundle)
        :: com.google.guava#failureaccess;1.0.1!failureaccess.jar(bundle)
        :: com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava!listenablefuture.jar
        :: com.google.code.findbugs#jsr305;3.0.2!jsr305.jar
        :: org.checkerframework#checker-qual;2.8.1!checker-qual.jar
        :: com.google.errorprone#error_prone_annotations;2.3.2!error_prone_annotations.jar
        :: com.google.j2objc#j2objc-annotations;1.3!j2objc-annotations.jar
        :: org.codehaus.mojo#animal-sniffer-annotations;1.17!animal-sniffer-annotations.jar
        :: com.google.protobuf#protobuf-java;3.9.0!protobuf-java.jar(bundle)
        :: joda-time#joda-time;2.10.6!joda-time.jar
        ::::::::::::::::::::::::::::::::::::::::::::::

:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [download failed: javax.activation#activation;1.1!activation.jar, download failed: com.google.guava#guava;28.0-jre!guava.jar(bundle), download failed: com.google.guava#failureaccess;1.0.1!failureaccess.jar(bundle), download failed: com.google.guava#listenablefuture;9999.0-empty-to-avoid-conflict-with-guava!listenablefuture.jar, download failed: com.google.code.findbugs#jsr305;3.0.2!jsr305.jar, download failed: org.checkerframework#checker-qual;2.8.1!checker-qual.jar, download failed: com.google.errorprone#error_prone_annotations;2.3.2!error_prone_annotations.jar, download failed: com.google.j2objc#j2objc-annotations;1.3!j2objc-annotations.jar, download failed: org.codehaus.mojo#animal-sniffer-annotations;1.17!animal-sniffer-annotations.jar, download failed: com.google.protobuf#protobuf-java;3.9.0!protobuf-java.jar(bundle), download failed: joda-time#joda-time;2.10.6!joda-time.jar]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1389)
    at org.apache.spark.deploy.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:54)
    at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:308)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:871)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1007)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1016)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
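To see at a glance which dependencies Spark's Ivy resolver could not fetch, the coordinates can be pulled out of the exception message. This is a minimal sketch, assuming the `download failed: org#module;revision!artifact` format shown in the trace above; `failed_downloads` and `IvyArtifact` are hypothetical helpers, not part of aut or Spark.

```python
import re
from typing import List, NamedTuple


class IvyArtifact(NamedTuple):
    organisation: str  # e.g. "com.google.guava"
    module: str        # e.g. "guava"
    revision: str      # e.g. "28.0-jre"
    artifact: str      # file Ivy tried to fetch, e.g. "guava.jar"


# Each failure looks like "download failed: org#module;revision!artifact.ext",
# optionally followed by a "(type)" suffix such as "(bundle)".
_FAILED = re.compile(r"download failed: ([^#\s]+)#([^;\s]+);([^!\s]+)!([^,\]\s(]+)")


def failed_downloads(message: str) -> List[IvyArtifact]:
    """Return every artifact listed as 'download failed' in a SparkSubmit error."""
    return [IvyArtifact(*m.groups()) for m in _FAILED.finditer(message)]


if __name__ == "__main__":
    msg = ("[download failed: javax.activation#activation;1.1!activation.jar, "
           "download failed: com.google.guava#guava;28.0-jre!guava.jar(bundle)]")
    for a in failed_downloads(msg):
        # Print in Maven "group:artifact:version" form for easier lookup.
        print(f"{a.organisation}:{a.module}:{a.revision}")
```

The regex stops the artifact name at a comma, closing bracket, or `(`, so the `(bundle)` type suffix is dropped and only the file name is kept.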

I see two options:

  1. We support it.

I can dive back into the dependency tree and do a lot of pom.xml surgery again to see if I can get us to a state where --packages works again. Last time I tried this, with the two 3.0.0 previews, the results were not fruitful, and I'm highly doubtful I can get it to work again.

  2. We don't support it.

    aut can be added to the Spark driver and executor classpaths with --jars instead.

If we go with option 2 (which is my preference), I'll work on getting the documentation updated.
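The practical difference between the two options is where dependencies come from at launch: `--packages` takes Maven coordinates and has Spark resolve them (and their transitive dependencies) through Ivy, which is the step that fails in the log above, while `--jars` takes a comma-separated list of local jar paths and adds them to the driver and executor classpaths with no resolution step. A minimal sketch of the two launch commands, assuming a locally built aut fat jar; the jar path and the coordinate version are hypothetical placeholders, and the helper names are illustrative only.

```python
import shlex
from typing import List, Optional


def spark_shell_jars(aut_jar: str, extra_jars: Optional[List[str]] = None) -> List[str]:
    """Option 2: put a local aut jar (plus any extras) on the classpath via --jars.

    --jars expects a single comma-separated argument; nothing is downloaded.
    """
    jars = [aut_jar] + (extra_jars or [])
    return ["spark-shell", "--jars", ",".join(jars)]


def spark_shell_packages(coordinate: str) -> List[str]:
    """Option 1: let Spark resolve a Maven coordinate (group:artifact:version) via Ivy."""
    return ["spark-shell", "--packages", coordinate]


if __name__ == "__main__":
    # Hypothetical path: point this at wherever your aut fat jar lives.
    print(shlex.join(spark_shell_jars("/path/to/aut-fatjar.jar")))
    # Hypothetical version placeholder in the coordinate.
    print(shlex.join(spark_shell_packages("io.archivesunleashed:aut:VERSION")))
```

Because the fat jar already bundles aut's dependencies, the `--jars` route sidesteps the Guava, protobuf, and Joda-Time downloads that Ivy cannot resolve.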

ianmilligan1 commented 4 years ago

My preference is also for option 2. I have no issues using --jars; it works well, and I've always found it more straightforward than --packages, even when both worked. So consider this a strong vote in favour of moving towards a 1.0.0 without --packages.

SamFritz commented 4 years ago

Similar to Ian's response, I think most of my work with aut has used --jars, which has been straightforward to work with. I can't recall many users who said they use --packages rather than --jars, so it makes complete sense to move forward with option 2 (+1 from me)!

lintool commented 4 years ago

+1 for (2) sgtm

ruebot commented 4 years ago

Cool. I'll leave this open for a week, just to make sure no community members have a solid argument for keeping it.

ruebot commented 4 years ago

Marking as resolved; won't support.