Open dekinsitro opened 5 years ago
Hello @dekinsitro, thank you for submitting this issue.
The docs suggest including org.apache.hadoop:hadoop-aws:2.7.4, so you may want to try
adam-submit \
--packages com.amazonaws:aws-java-sdk-pom:1.11.463,org.apache.hadoop:hadoop-aws:2.7.4,net.fnothaft:jsr203-s3a:0.0.1 \
-- \
transformAlignments \
s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam \
/mnt/test.adam
Are you running Spark on AWS, perhaps via EMR?
I'm running on a simple Ubuntu 18.04 EC2 VM, not EMR. Spark/EMR on AWS already includes the necessary s3 connector jars.
Using your command changes the error, but still roughly the same problem:
adam-submit \
--packages com.amazonaws:aws-java-sdk-pom:1.11.463,org.apache.hadoop:hadoop-aws:2.7.4,net.fnothaft:jsr203-s3a:0.0.1 \
-- \
transformAlignments \
s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam \
/mnt/test.adam
produces:
::::::::::::::::::::::::::::::::::::::::::::::
:: FAILED DOWNLOADS ::
:: ^ see resolution messages for details ^ ::
::::::::::::::::::::::::::::::::::::::::::::::
:: com.google.code.findbugs#jsr305;3.0.0!jsr305.jar
:: org.apache.commons#commons-math3;3.1.1!commons-math3.jar
:: com.sun.jersey#jersey-json;1.9!jersey-json.jar(bundle)
:: org.codehaus.jettison#jettison;1.1!jettison.jar(bundle)
:: com.sun.xml.bind#jaxb-impl;2.2.3-1!jaxb-impl.jar
:: org.codehaus.jackson#jackson-jaxrs;1.9.13!jackson-jaxrs.jar
:: org.codehaus.jackson#jackson-xc;1.9.13!jackson-xc.jar
:: com.sun.jersey#jersey-server;1.9!jersey-server.jar(bundle)
:: org.tukaani#xz;1.0!xz.jar
:: jline#jline;0.9.94!jline.jar
::::::::::::::::::::::::::::::::::::::::::::::
I don't see any indication that a download of the packages is even being attempted; it seems to just look for them in the cache.
Right, things can be a little bit different depending on the Spark installation.
For example, for me on Cloudera CDH only the jsr203-s3a is necessary
$ export AWS_SECRET_ACCESS_KEY=...
$ export AWS_ACCESS_KEY_ID=...
$ adam-submit --packages net.fnothaft:jsr203-s3a:0.0.1 ...
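Since the commands above pass credentials through two environment variables, a small pre-flight check can rule out a missing variable before digging into dependency resolution. This is only a sketch: `missing_aws_vars` is a hypothetical helper name, and it assumes credentials are supplied through these two variables rather than by some other mechanism.

```sh
# Print the names of the AWS credential variables that are unset or empty.
# Empty output means both are set. (Hypothetical helper, not part of ADAM.)
missing_aws_vars() (
  missing=""
  [ -n "${AWS_ACCESS_KEY_ID:-}" ] || missing="$missing AWS_ACCESS_KEY_ID"
  [ -n "${AWS_SECRET_ACCESS_KEY:-}" ] || missing="$missing AWS_SECRET_ACCESS_KEY"
  # Drop the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
)

unset_vars=$(missing_aws_vars)
if [ -n "$unset_vars" ]; then
  echo "Not set: $unset_vars" >&2
fi
```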
I don't know why your version of Spark isn't trying to download the necessary dependencies; perhaps there are some network or ivy settings issues?
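One rough way to test the network side is to request one of the failed jars from Maven Central directly. The sketch below only builds the URL from the org, module, and revision, following the standard repository layout; `maven_central_url` is a hypothetical helper name, and the actual connectivity check is left as a commented-out `curl` call because it needs network access.

```sh
# Build the Maven Central URL for a jar from its org, module, and revision.
# (Hypothetical helper; dots in the org become path separators.)
maven_central_url() (
  org_path=$(printf '%s' "$1" | tr '.' '/')
  printf 'https://repo1.maven.org/maven2/%s/%s/%s/%s-%s.jar\n' \
    "$org_path" "$2" "$3" "$2" "$3"
)

maven_central_url com.google.code.findbugs jsr305 3.0.0

# To test connectivity from the VM (requires network access):
#   curl -fsSI "$(maven_central_url com.google.code.findbugs jsr305 3.0.0)"
```

If the `curl` request succeeds from the same machine, plain network access to Maven Central is probably not the problem, which would point toward the ivy settings or cache instead.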
Another option would be to pull the dependencies into your local ivy cache using ivy directly
$ ivy -dependency com.google.code.findbugs jsr305 3.0.0
:: loading settings :: url = jar:file:/usr/local/Cellar/ivy/2.4.0/libexec/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
:: resolving dependencies :: com.google.code.findbugs#jsr305-caller;working
confs: [default]
found com.google.code.findbugs#jsr305;3.0.0 in public
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0.jar ...
......... (19kB)
.. (0kB)
[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar (73ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-sources.jar ...
........ (16kB)
.. (0kB)
[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar(source) (59ms)
downloading https://repo1.maven.org/maven2/com/google/code/findbugs/jsr305/3.0.0/jsr305-3.0.0-javadoc.jar ...
...................... (173kB)
.. (0kB)
[SUCCESSFUL ] com.google.code.findbugs#jsr305;3.0.0!jsr305.jar(javadoc) (88ms)
:: resolution report :: resolve 909ms :: artifacts dl 224ms
---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |   1   |   1   |   1   |   0   ||   3   |   3   |
---------------------------------------------------------------------
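To repeat this for every artifact in a FAILED DOWNLOADS list, the lines can be converted mechanically into `ivy -dependency` commands. The following is a minimal sketch that assumes each line has the `:: org#module;revision!artifact` shape shown in the report earlier in the thread; `failed_line_to_ivy_cmd` is a hypothetical helper name. It only prints the commands, so they can be reviewed before running. Whether Spark then finds the jars depends on both tools using the same ivy cache directory, which is an assumption here.

```sh
# Convert a FAILED DOWNLOADS line such as
#   :: com.google.code.findbugs#jsr305;3.0.0!jsr305.jar
# into the matching "ivy -dependency <org> <module> <revision>" command.
failed_line_to_ivy_cmd() (
  coord="${1#:: }"        # drop the leading ":: "
  coord="${coord%%!*}"    # drop "!artifact.jar" and anything after it
  org="${coord%%#*}"      # text before "#"
  rest="${coord#*#}"      # "module;revision"
  module="${rest%%;*}"
  rev="${rest#*;}"
  printf 'ivy -dependency %s %s %s\n' "$org" "$module" "$rev"
)

# Print (do not run) one command per failed artifact.
while IFS= read -r line; do
  case "$line" in
    *'#'*';'*'!'*) failed_line_to_ivy_cmd "$line" ;;
  esac
done <<'EOF'
:: com.google.code.findbugs#jsr305;3.0.0!jsr305.jar
:: org.apache.commons#commons-math3;3.1.1!commons-math3.jar
:: org.tukaani#xz;1.0!xz.jar
EOF
```

Piping the printed commands to `sh` would run them one by one, assuming `ivy` is on the PATH.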
I'll try hopping on an Ubuntu EC2 instance tomorrow to see if I can replicate your issue.
Interesting suggestion. Please do try to reproduce this problem with a modern (18.04 Ubuntu) VM if possible. I'm basically doing either "conda install -c conda-forge adam" or "pip install bdgenomics.adam" and then trying to run a basic transformAlignments on an s3-sourced file.
Sorry for dropping this for a while. I'll try to replicate this later this week with the new 0.27.0 release.
I am trying to follow the documentation to allow ADAM to read a BAM file from S3.
According to https://adam.readthedocs.io/en/latest/deploying/aws/#input-and-output-data-on-hdfs-and-s3 I should run a command like this:
adam-submit --packages com.amazonaws:aws-java-sdk-pom:1.11.463,net.fnothaft:jsr203-s3a:0.0.1 -- transformAlignments s3a://1000genomes/phase1/data/NA12878/exome_alignment/NA12878.mapped.illumina.mosaik.CEU.exome.20110411.bam /mnt/test.adam
When I run that command, I get an error with many unresolved dependency jars:
:: problems summary ::
:::: WARNINGS
[NOT FOUND ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (0ms)
....
:::: WARNINGS
[NOT FOUND ] org.apache.commons#commons-math3;3.1.1!commons-math3.jar (0ms)
It's not clear to me (I don't work with Java much) what is going on, but my guess is that the tool that should be downloading package dependencies doesn't run, and it's just looking for cached data in the maven cache.