bigdatagenomics / avocado

A Variant Caller, Distributed. Apache 2 licensed.
http://bdgenomics.org/projects/avocado/

Command to joint call gVCF #284

Closed: jpdna closed this issue 6 years ago

jpdna commented 6 years ago

I'm trying to run with this command line:

../avocado/bin/avocado-submit jointer -from_gvcf /Users/paschalj/test9/avacado/run1/data1/*.gvcf /Users/paschalj/test9/avacado/run1/out1

The data1 directory has 3 gVCF files in it.

When I pass just the directory, I get this error:

../avocado/bin/avocado-submit jointer -from_gvcf /Users/paschalj/test9/avacado/run1/data1 /Users/paschalj/test9/avacado/run1/out1
Using SPARK_SUBMIT=/Users/paschalj/test9/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit
2018-01-06 17:01:09 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Command body threw exception:
java.io.FileNotFoundException: Couldn't find any files matching /Users/paschalj/test9/avacado/run1/data1 for the requested PathFilter
Exception in thread "main" java.io.FileNotFoundException: Couldn't find any files matching /Users/paschalj/test9/avacado/run1/data1 for the requested PathFilter
    at org.bdgenomics.adam.rdd.ADAMContext.getFsAndFilesWithFilter(ADAMContext.scala:1365)
    at org.bdgenomics.adam.rdd.ADAMContext.loadHeaderLines(ADAMContext.scala:1140)
    at org.bdgenomics.adam.rdd.ADAMContext.loadParquetGenotypes(ADAMContext.scala:2186)

Also, passing the glob:

../avocado/bin/avocado-submit jointer -from_gvcf /Users/paschalj/test9/avacado/run1/data1/*.gvcf /Users/paschalj/test9/avacado/run1/out1
Using SPARK_SUBMIT=/Users/paschalj/test9/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit
Too many arguments: /Users/paschalj/test9/avacado/run1/data1/152152.gvcf

doesn't work for me either. Maybe I don't understand how to use the glob. This is on the local filesystem.

Can you give me an example command line for running avocado-submit jointer on a directory of gVCF files?

fnothaft commented 6 years ago

Hi @jpdna! I think that what's going wrong is that ADAM expects to see a .vcf extension for gVCF files, not a .gvcf extension. I'll look at this tomorrow. Should be a simple fix!
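
As a stopgap, the files can be given a .vcf suffix. The following is a minimal shell sketch of that rename, not part of avocado or ADAM. The directory and the sample file names are hypothetical placeholders created only so the snippet runs on its own; point data_dir at the real gVCF directory instead.

```shell
# Hypothetical scratch directory with placeholder gVCF files (illustration only).
data_dir=$(mktemp -d)
touch "$data_dir/sample1.gvcf" "$data_dir/sample2.gvcf"

# Append ".vcf" to every file ending in ".gvcf", e.g. sample1.gvcf -> sample1.gvcf.vcf.
for f in "$data_dir"/*.gvcf; do
  [ -e "$f" ] || continue   # the glob matched nothing; skip the literal pattern
  mv -- "$f" "$f.vcf"
done

ls "$data_dir"
```

The original names are kept as a prefix, so the rename is easy to undo later.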

jpdna commented 6 years ago

I tried changing the file names to *.gvcf.vcf, and now the files are found, but I still get this error about having too many arguments:

paschalj$ ../avocado/bin/avocado-submit jointer /Users/paschalj/test9/avacado/run1/data1/*.vcf /Users/paschalj/test9/avacado/run1/out1 -from_gvcf
Using SPARK_SUBMIT=/Users/paschalj/test9/spark/spark-2.1.0-bin-hadoop2.7/bin/spark-submit
Too many arguments: /Users/paschalj/test9/avacado/run1/data1/152152.gvcf.vcf

fnothaft commented 6 years ago

Hi @jpdna! You need to escape the glob. Otherwise, your shell expands it before avocado runs, so avocado receives one argument per matching file instead of a single path pattern. I.e., you'd want to pass \*.gvcf.vcf on the command line.
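
To make the difference concrete, here is a small shell sketch that counts how many arguments a command receives with and without the escape. The count_args function and the temporary files are hypothetical stand-ins so the snippet runs on its own; they are not avocado itself. The trailing comment applies the escape to the command from this thread and is an untested assumption about the exact invocation.

```shell
# Hypothetical stand-in for a command that expects one input path and one output path.
count_args() { echo "$#"; }

# Scratch directory with three placeholder files (illustration only).
tmpdir=$(mktemp -d)
touch "$tmpdir/a.gvcf.vcf" "$tmpdir/b.gvcf.vcf" "$tmpdir/c.gvcf.vcf"

# Unescaped: the shell expands the glob into three file names before the command runs.
unescaped=$(count_args "$tmpdir"/*.gvcf.vcf out1)

# Escaped with a backslash, or quoted: the pattern is passed through as one literal argument.
escaped=$(count_args "$tmpdir"/\*.gvcf.vcf out1)
quoted=$(count_args "$tmpdir/*.gvcf.vcf" out1)

echo "unescaped: $unescaped"   # prints "unescaped: 4"
echo "escaped:   $escaped"     # prints "escaped:   2"
echo "quoted:    $quoted"      # prints "quoted:    2"

rm -rf "$tmpdir"

# Applied to the command above (assumed form, not verified here):
# ../avocado/bin/avocado-submit jointer /Users/paschalj/test9/avacado/run1/data1/\*.gvcf.vcf /Users/paschalj/test9/avacado/run1/out1 -from_gvcf
```

Quoting the whole path in single or double quotes has the same effect as the backslash: the receiving program, not the shell, resolves the pattern.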

jpdna commented 6 years ago

Yup, that worked for me, thanks!