NationalGenomicsInfrastructure / piper

A genomics pipeline built on top of the GATK Queue framework

Problem with sthlm2UUSNP #3

Closed vezzi closed 10 years ago

vezzi commented 10 years ago

I am trying to use sthlm2UUSNP to convert a NGI-S project into a NGI-U project as a first step to test piper. This is the folder organisation of the project G.Grigelioniene_14_01 (P_name, Sample name, fastq_files)

G.Grigelioniene_14_01/
`-- P1142_101
    `-- 140528_BC423WACXX
        |-- 1_140528_BC423WACXX_P1142_101_1.fastq.gz
        |-- 1_140528_BC423WACXX_P1142_101_2.fastq.gz
        |-- 2_140528_BC423WACXX_P1142_101_1.fastq.gz
        |-- 2_140528_BC423WACXX_P1142_101_2.fastq.gz
        |-- 3_140528_BC423WACXX_P1142_101_1.fastq.gz
        |-- 3_140528_BC423WACXX_P1142_101_2.fastq.gz
        |-- 4_140528_BC423WACXX_P1142_101_1.fastq.gz
        `-- 4_140528_BC423WACXX_P1142_101_2.fastq.gz

I created the folder G.Grigelioniene_14_01_SNPseq and ran the following from this new folder:

sthlm2UUSNP -i ../G.Grigelioniene_14_01/  -o . 
What I get is the following

G.Grigelioniene_14_01_SNPseq |-- 140528_BC423WACXX | |-- P1142_101_AAAAAA_L001_R1_001.fastq.gz | -- P1142_101_AAAAAA_L001_R2_001.fastq.gz |-- report.tsv


There are two main problems:
1. as you can see, only the first read pair is placed in the NGI-U folder structure
2. report.tsv is inside the flowcell, it should be one level down
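
For reference, the renaming that the conversion is expected to perform for every lane can be sketched as below. This is only an illustration inferred from the file names in this thread, not piper's actual code: the helper `to_ngi_u_name`, the regular expression, and the `AAAAAA` placeholder index are assumptions.

```python
import re

# Assumed NGI-S naming, read off the listing above:
# <lane>_<date>_<flowcell>_<sample>_<read>.fastq.gz
NGI_S_NAME = re.compile(
    r"^(?P<lane>\d+)_(?P<date>\d{6})_(?P<flowcell>[A-Z0-9]+)_"
    r"(?P<sample>P\d+_\d+)_(?P<read>[12])\.fastq\.gz$"
)


def to_ngi_u_name(ngi_s_name, index="AAAAAA"):
    """Return the NGI-U style name for one NGI-S fastq file name (hypothetical helper)."""
    match = NGI_S_NAME.match(ngi_s_name)
    if match is None:
        raise ValueError(f"Unrecognised NGI-S file name: {ngi_s_name}")
    lane = int(match.group("lane"))
    # Lane becomes L00<lane>, read becomes R<read>, followed by a fixed _001 chunk.
    return (
        f"{match.group('sample')}_{index}_L{lane:03d}"
        f"_R{match.group('read')}_001.fastq.gz"
    )


# The eight input files of sample P1142_101 (four lanes, two reads each).
names = [
    f"{lane}_140528_BC423WACXX_P1142_101_{read}.fastq.gz"
    for lane in range(1, 5)
    for read in (1, 2)
]
converted = [to_ngi_u_name(name) for name in names]
for old, new in zip(names, converted):
    print(old, "->", new)
```

With all four lanes handled, the eight input files map to eight distinct output names instead of only the L001 pair.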
johandahlberg commented 10 years ago

This is related to #1, but this is a more detailed description of the problem. I'm working on resolving this now.

johandahlberg commented 10 years ago

@vezzi and @mariogiov in the cd69a89500541e82e813573374efdaf96304200f commit, I think that I've solved this particular issue. Would one of you be willing to try it out to confirm this?

Also @vezzi, the report.tsv is supposed to be located at the runfolder level. Your comment above is slightly malformatted which makes me somewhat confused as to what you mean. But from my testing at the moment this seems to work as it should now.

vezzi commented 10 years ago

@johandahlberg: I just re-installed the latest version and followed the steps of issue #5.

I pulled from git and checked that the right version was there:

[vezzi@nestor1 piper]$ git branch -v
* master cd69a89 Bumping version to v1.2.0-beta3

At this point I reran ./setup.sh and in 5 or so minutes piper was installed. I then reran what I had already done in #5 and got more or less the same errors, which I repeat here.

I checked it in quite a hurry, so I will redo this first thing tomorrow. I hope that the problem with report.tsv being or not being in the right place is clearer now.

I ran the conversion script and got the following dir structure:

[vezzi@nestor1 G.Grigelioniene_14_01_SNPseq]$ tree
.
`-- 140528_BC423WACXX
    |-- report.tsv
    `-- Sample_P1142_101
        |-- P1142_101_AAAAAA_L001_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L001_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L002_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L002_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L003_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L003_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L004_R1_001.fastq.gz
        `-- P1142_101_AAAAAA_L004_R2_001.fastq.gz

which looks ok. From another dir (ANALYSIS) I ran this command:

setupFileCreator -o pipelineSetup.xml -p G.Grigelioniene_14_01_SNPseq -s  Illumina -c NGI -a a2010002 -i /proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/ -r /proj/a2010002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta

and I get:

Exception in thread "main" java.lang.Error: Could not find report.xml in /proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq
        at molmed.apps.setupcreator.SetupUtils$$anonfun$setRunfolders$1$$anonfun$3.apply(SetupUtils.scala:108)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$setRunfolders$1$$anonfun$3.apply(SetupUtils.scala:108)
        at scala.Option.getOrElse(Option.scala:120)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$setRunfolders$1.lookForReport$1(SetupUtils.scala:108)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$setRunfolders$1.apply(SetupUtils.scala:114)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$setRunfolders$1.apply(SetupUtils.scala:100)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at molmed.apps.setupcreator.SetupUtils$.setRunfolders(SetupUtils.scala:100)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$1.apply(SetupUtils.scala:71)
        at molmed.apps.setupcreator.SetupUtils$$anonfun$1.apply(SetupUtils.scala:69)
        at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:111)
        at scala.collection.immutable.List.foldLeft(List.scala:84)
        at molmed.apps.setupcreator.SetupUtils$.setupRunfolderStructureFromSamplePaths(SetupUtils.scala:69)
        at molmed.apps.setupcreator.SetupFileCreator$.runNonInteractiveMode(SetupFileCreator.scala:92)
        at molmed.apps.setupcreator.SetupFileCreator$$anonfun$11.apply(SetupFileCreator.scala:72)
        at molmed.apps.setupcreator.SetupFileCreator$$anonfun$11.apply(SetupFileCreator.scala:64)
        at scala.Option.map(Option.scala:145)
        at molmed.apps.setupcreator.SetupFileCreator$delayedInit$body.apply(SetupFileCreator.scala:64)
        at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
        at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
        at scala.App$$anonfun$main$1.apply(App.scala:71)
        at scala.App$$anonfun$main$1.apply(App.scala:71)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
        at scala.App$class.main(App.scala:71)
        at molmed.apps.setupcreator.SetupFileCreator$.main(SetupFileCreator.scala:13)
        at molmed.apps.setupcreator.SetupFileCreator.main(SetupFileCreator.scala)

So, like I did in #5, I copied report.tsv one level back and the previous command worked. This is the xml created:

[vezzi@nestor1 G.Grigelioniene_14_01]$ less pipelineSetup.xml 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
    <metadata>
        <name>G.Grigelioniene_14_01_SNPseq</name>
        <sequenceingcenter>NGI</sequenceingcenter>
        <platfrom>Illumina</platfrom>
        <uppmaxprojectid>a2010002</uppmaxprojectid>
        <uppmaxqos></uppmaxqos>
    </metadata>
    <inputs>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/report.tsv</report>
            <samplefolder>
                <name>140528_BC423WACXX</name>
                <path>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX</path>
                <reference>/proj/a2010002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
    </inputs>
</project>

I then run piper:

WholeGenome.sh --xml_input pipelineSetup.xml

but it failed, complaining that globalConfig.sh was not present. I then modified the following files, like I did in #5:

/home/vezzi/Bin/Piper/workflows/WholeGenome.sh
     + source /proj/a2010002/nobackup/tools/piper/globalConfig.sh
     - source globalConfig.sh
/proj/a2010002/nobackup/tools/piper/globalConfig.sh
      - SCRIPTS_DIR="${PWD}/qscripts"
      + SCRIPTS_DIR="/proj/a2010002/nobackup/tools/piper/qscripts"

At this point I rerun the command:

 ~/Bin/Piper/workflows/WholeGenome.sh --xml_input pipelineSetup.xml

Apparently producing the same error:

[vezzi@nestor1 G.Grigelioniene_14_01]$ ~/Bin/Piper/workflows/WholeGenome.sh --xml_input pipelineSetup.xml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apus/h1/vezzi/Bin/Piper/Piper/Piper-v1.2.0-beta3/lib/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apus/h1/vezzi/Bin/Piper/Piper/Piper-v1.2.0-beta3/lib/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
INFO  16:21:31,926 QScriptManager - Compiling 1 QScript 
DEBUG 16:21:31,928 QScriptManager - Compilation directory: /tmp/Q-Classes-6476092599625576315 
WARN  16:21:36,924 QScriptManager - there were 1 feature warning(s); re-run with -feature for details 
WARN  16:21:36,927 QScriptManager - two warnings found 
WARN  16:21:36,928 QScriptManager - Compile succeeded with 2 warnings 
INFO  16:21:37,026 HelpFormatter - ---------------------------------------------------------------------- 
INFO  16:21:37,026 HelpFormatter - Queue v<unknown>, Compiled 2014/07/09 13:51:39 
INFO  16:21:37,026 HelpFormatter - Copyright (c) 2012 The Broad Institute 
INFO  16:21:37,026 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
DEBUG 16:21:37,026 HelpFormatter - Current directory: /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01 
INFO  16:21:37,027 HelpFormatter - Program Args: -S /proj/a2010002/nobackup/tools/piper/qscripts/DNABestPracticeVariantCalling.scala --xml_input pipelineSetup.xml --global_config uppmax_global_config.xml --number_of_threads 8 --scatter_gather 23 -jobRunner Drmaa -jobNative -A  -p node -N 1  --job_walltime 345600 --create_delivery -l DEBUG 
INFO  16:21:37,027 HelpFormatter - Executing as vezzi@nestor1.uppmax.uu.se on Linux 2.6.32-431.20.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  16:21:37,028 HelpFormatter - Date/Time: 2014/07/09 16:21:37 
INFO  16:21:37,028 HelpFormatter - ---------------------------------------------------------------------- 
INFO  16:21:37,028 HelpFormatter - ---------------------------------------------------------------------- 
INFO  16:21:37,037 QCommandLine - Scripting DNABestPracticeVariantCalling 
INFO  16:21:37,175 QCommandLine - Done with errors 
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.sting.utils.exceptions.UserException$CannotExecuteQScript: Unable to execute QScript: DNABestPracticeVariantCalling.script() threw the following exception: java.lang.IndexOutOfBoundsException: 0
        at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:159)
        at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:147)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:147)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
Caused by: java.lang.IndexOutOfBoundsException: 0
        at scala.collection.LinearSeqOptimized$class.apply(LinearSeqOptimized.scala:52)
        at scala.collection.immutable.List.apply(List.scala:84)
        at molmed.qscripts.DNABestPracticeVariantCalling.script(DNABestPracticeVariantCalling.scala:341)
        at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:156)
        ... 10 more
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-0-g72492bb):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Unable to execute QScript: DNABestPracticeVariantCalling.script() threw the following exception: java.lang.IndexOutOfBoundsException: 0
##### ERROR ------------------------------------------------------------------------------------------
INFO  16:21:37,182 QCommandLine - Shutting down jobs. Please wait... 
DEBUG 16:21:37,188 IOUtils - Deleted /tmp/Q-Classes-6476092599625576315 
mv: cannot stat `*.jobreport.*': No such file or directory
[vezzi@nestor1 G.Grigelioniene_14_01]$ 
johandahlberg commented 10 years ago

There are many things going on here. But I think I figured out at least part of it.

This command:

setupFileCreator -o pipelineSetup.xml -p G.Grigelioniene_14_01_SNPseq -s  Illumina -c NGI -a a2010002 -i /proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/ -r /proj/a2010002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta

Needs to be changed to:

setupFileCreator -o pipelineSetup.xml -p G.Grigelioniene_14_01_SNPseq -s  Illumina -c NGI -a a2010002 -i /proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/Sample_<MY_SAMPLE> -r /proj/a2010002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta

Note that -i should point to the "sample folder" not the runfolder:

-i Input path to sample directory. | --input_sample Input path to sample directory.

Do you see what I mean?

@mariogiov sent me the setup-xml he was using which was causing him trouble earlier and that seemed to have the very same problem. The path is not pointing to a sample folder, but to a runfolder.
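
A quick way to catch this before running setupFileCreator is to check the -i path up front. The sketch below is a hypothetical helper, not part of piper; it assumes, based on this thread, that a sample folder is prefixed with Sample_ and that report.tsv sits next to it at the runfolder level.

```python
from pathlib import Path


def check_sample_folder(input_path):
    """Return a list of problems with a path intended for setupFileCreator -i."""
    path = Path(input_path)
    problems = []
    # Assumption from this thread: sample folders are named Sample_<something>.
    if not path.name.startswith("Sample_"):
        problems.append(f"{path.name} is not prefixed with Sample_")
    # Assumption from this thread: report.tsv lives in the runfolder,
    # i.e. in the parent directory of the sample folder.
    if not (path.parent / "report.tsv").is_file():
        problems.append(f"no report.tsv next to {path.name} in {path.parent}")
    return problems


# Example with the runfolder path used above; on a machine without these
# directories both checks are reported as problems.
runfolder = "/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/"
for problem in check_sample_folder(runfolder):
    print("problem:", problem)
```

An empty list means the path looks like a sample folder inside a runfolder that has a report.tsv.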

The problem with the global config is this issue: #2. I'm working on that now and will probably have that fixed before the end of today.

vezzi commented 10 years ago

Ok, this is the new run. Some progress has been made but it still does not start properly:

  1. Remove from .bashrc all previous Piper-related lines.
  2. Delete Piper and ~/Bin/ folders
  3. Close connection on Nestor and Reopen it

From /proj/a2010002/software/ (home directories are assigned only 18G in total!!!!! How is this possible!!!):

git clone https://github.com/NationalGenomicsInfrastructure/piper.git
cd piper
git branch -v
     * master 05ed24d Bumping version to v1.2.0-beta4
./setup.sh 

I modified my .bashrc as specified by the final output of the piper setup:

module load java/sun_jdk1.7.0_25
PATH=$PATH:/proj/a2010002/software/piper_bin/bin 
PATH=$PATH:/proj/a2010002/software/piper_bin//workflows/
export LD_LIBRARY_PATH=/sw/apps/build/slurm-drmaa/default/lib/:$LD_LIBRARY_PATH
export PIPER_GLOB_CONF=/proj/a2010002/software/piper_bin//workflows/globalConfig.sh

I recreated the converted folder structure under:

/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq

and I moved to the analysis folder:

/proj/a2010002/nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01

At this point I ran:

WholeGenome.sh --xml_input pipelineSetup.xml 

and obtained the following error:

[vezzi@nestor1 G.Grigelioniene_14_01]$ WholeGenome.sh --xml_input pipelineSetup.xml 
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta4/lib/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta4/lib/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR stack trace 
org.broadinstitute.sting.utils.exceptions.UserException$CouldNotReadInputFile: Couldn't read file /proj/a2010002/nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01/qscripts/DNABestPracticeVariantCalling.scala because it does not exist.
        at org.broadinstitute.sting.queue.QScriptManager$$anonfun$loadScripts$1.apply(QScriptManager.scala:52)
        at org.broadinstitute.sting.queue.QScriptManager$$anonfun$loadScripts$1.apply(QScriptManager.scala:52)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at org.broadinstitute.sting.queue.QScriptManager.loadScripts(QScriptManager.scala:51)
        at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager$lzycompute(QCommandLine.scala:95)
        at org.broadinstitute.sting.queue.QCommandLine.org$broadinstitute$sting$queue$QCommandLine$$qScriptPluginManager(QCommandLine.scala:93)
        at org.broadinstitute.sting.queue.QCommandLine.getArgumentSources(QCommandLine.scala:227)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:205)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
##### ERROR ------------------------------------------------------------------------------------------
##### ERROR A GATK RUNTIME ERROR has occurred (version 3.1-0-g72492bb):
##### ERROR
##### ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
##### ERROR If not, please post the error message, with stack trace, to the GATK forum.
##### ERROR Visit our website and forum for extensive documentation and answers to 
##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
##### ERROR
##### ERROR MESSAGE: Couldn't read file /proj/a2010002/nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01/qscripts/DNABestPracticeVariantCalling.scala because it does not exist.
##### ERROR ------------------------------------------------------------------------------------------
INFO  09:23:43,860 QCommandLine - Shutting down jobs. Please wait... 
DEBUG 09:23:43,867 IOUtils - Deleted /tmp/Q-Classes-1547885021545136127 
mv: cannot stat `*.jobreport.*': No such file or directory

So in

/proj/a2010002/software/piper_bin//workflows/globalConfig.sh

I made the following modification:

- SCRIPTS_DIR="${PWD}/qscripts"
+ SCRIPTS_DIR="/proj/a2010002/software/piper/qscripts/"
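
The reason the original setting fails can be sketched as follows. This is only an illustration of the path logic, with hypothetical helper names; the paths are the ones from this thread.

```python
from pathlib import PurePosixPath

QSCRIPT = "DNABestPracticeVariantCalling.scala"


def scripts_dir_from_cwd(cwd):
    """Mimic SCRIPTS_DIR="${PWD}/qscripts": the result depends on where the command is run."""
    return PurePosixPath(cwd) / "qscripts"


def scripts_dir_from_install(install_root):
    """Resolve qscripts relative to the piper checkout, independent of the working directory."""
    return PurePosixPath(install_root) / "qscripts"


analysis_dir = "/proj/a2010002/nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01"
install_root = "/proj/a2010002/software/piper"

# Run from the analysis folder, the PWD-based lookup points at a file that does not exist.
print(scripts_dir_from_cwd(analysis_dir) / QSCRIPT)
# Anchoring on the checkout gives the location where the qscript actually is.
print(scripts_dir_from_install(install_root) / QSCRIPT)
```

The first path is exactly the one reported in the error above, which is why hard-coding the checkout location makes the qscript discoverable.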

I reran it and got the following:

WholeGenome.sh --xml_input pipelineSetup.xml
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta4/lib/GenomeAnalysisTK.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/apus/v1/a2010002/software/piper_bin/Piper/Piper-v1.2.0-beta4/lib/Queue.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
INFO  09:34:14,849 QScriptManager - Compiling 1 QScript 
DEBUG 09:34:14,851 QScriptManager - Compilation directory: /tmp/Q-Classes-7636663011626373424 
WARN  09:34:19,929 QScriptManager - there were 1 feature warning(s); re-run with -feature for details 
WARN  09:34:19,933 QScriptManager - two warnings found 
WARN  09:34:19,933 QScriptManager - Compile succeeded with 2 warnings 
INFO  09:34:20,027 HelpFormatter - ---------------------------------------------------------------------- 
INFO  09:34:20,028 HelpFormatter - Queue v<unknown>, Compiled 2014/07/10 07:03:31 
INFO  09:34:20,028 HelpFormatter - Copyright (c) 2012 The Broad Institute 
INFO  09:34:20,028 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
DEBUG 09:34:20,028 HelpFormatter - Current directory: /apus/v1/a2010002_nobackup/vezzi/ANALYSIS/G.Grigelioniene_14_01 
INFO  09:34:20,028 HelpFormatter - Program Args: -S /proj/a2010002/software/piper/qscripts//DNABestPracticeVariantCalling.scala --xml_input pipelineSetup.xml --global_config uppmax_global_config.xml --number_of_threads 8 --scatter_gather 23 -jobRunner Drmaa -jobNative -A  -p node -N 1  --job_walltime 345600 --create_delivery -l DEBUG 
INFO  09:34:20,029 HelpFormatter - Executing as vezzi@nestor1.uppmax.uu.se on Linux 2.6.32-431.20.3.el6.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.7.0_25-b15. 
INFO  09:34:20,029 HelpFormatter - Date/Time: 2014/07/10 09:34:20 
INFO  09:34:20,029 HelpFormatter - ---------------------------------------------------------------------- 
INFO  09:34:20,029 HelpFormatter - ---------------------------------------------------------------------- 
INFO  09:34:20,039 QCommandLine - Scripting DNABestPracticeVariantCalling 
INFO  09:34:20,168 QCommandLine - Done with errors 
Exception in thread "main" java.lang.AssertionError: assertion failed: Sample folders must be prefixed with Sample_
        at scala.Predef$.assert(Predef.scala:179)
        at molmed.queue.setup.SetupXMLReader$$anonfun$getSamples$1.apply(SetupXMLReader.scala:120)
        at molmed.queue.setup.SetupXMLReader$$anonfun$getSamples$1.apply(SetupXMLReader.scala:119)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at molmed.queue.setup.SetupXMLReader.getSamples(SetupXMLReader.scala:119)
        at molmed.qscripts.DNABestPracticeVariantCalling.script(DNABestPracticeVariantCalling.scala:339)
        at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:156)
        at org.broadinstitute.sting.queue.QCommandLine$$anonfun$execute$5.apply(QCommandLine.scala:147)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:147)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
        at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
        at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:62)
        at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
INFO  09:34:20,172 QCommandLine - Shutting down jobs. Please wait... 
DEBUG 09:34:20,178 IOUtils - Deleted /tmp/Q-Classes-7636663011626373424 
mv: cannot stat `*.jobreport.*': No such file or directory
[vezzi@nestor1 G.Grigelioniene_14_01]$ ls /proj/a2010002/software/piper/qscripts//DNABestPracticeVariantCalling.scala
/proj/a2010002/software/piper/qscripts//DNABestPracticeVariantCalling.scala

Apparently it complains that the sample folder does not start with "Sample_"; however, this is what the data structure looks like:

[vezzi@nestor1 G.Grigelioniene_14_01]$ tree /proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/
/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/
`-- 140528_BC423WACXX
    |-- report.tsv
    `-- Sample_P1142_101
        |-- P1142_101_AAAAAA_L001_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L001_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L002_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L002_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L003_R1_001.fastq.gz
        |-- P1142_101_AAAAAA_L003_R2_001.fastq.gz
        |-- P1142_101_AAAAAA_L004_R1_001.fastq.gz
        `-- P1142_101_AAAAAA_L004_R2_001.fastq.gz

2 directories, 9 files

and here is the xml file

[vezzi@nestor1 G.Grigelioniene_14_01]$ less pipelineSetup.xml 
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<project xmlns="setup.xml.molmed">
    <metadata>
        <name>G.Grigelioniene_14_01_SNPseq</name>
        <sequenceingcenter>NGI</sequenceingcenter>
        <platfrom>Illumina</platfrom>
        <uppmaxprojectid>a2010002</uppmaxprojectid>
        <uppmaxqos></uppmaxqos>
    </metadata>
    <inputs>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/report.tsv</report>
            <samplefolder>
                <name>P1142_101</name>
                <path>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/Sample_P1142_101</path>
                <reference>/proj/a2010002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta</reference>
            </samplefolder>
        </runfolder>
    </inputs>
</project>

Hope everything is clear.
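
To see at a glance which field of the setup XML carries the Sample_ prefix, the sample folders can be listed with a few lines of Python. This is a minimal sketch that only reads the XML shown above (trimmed to the relevant part); it does not reproduce piper's own check, and `sample_folders` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# The setup XML uses this default namespace (see the <project> element above).
NS = {"s": "setup.xml.molmed"}

SETUP_XML = """
<project xmlns="setup.xml.molmed">
    <inputs>
        <runfolder>
            <report>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/report.tsv</report>
            <samplefolder>
                <name>P1142_101</name>
                <path>/proj/a2010002/nobackup/vezzi/DATA/G.Grigelioniene_14_01_SNPseq/140528_BC423WACXX/Sample_P1142_101</path>
            </samplefolder>
        </runfolder>
    </inputs>
</project>
"""


def sample_folders(xml_text):
    """Return (name, last path component) for every <samplefolder> element."""
    root = ET.fromstring(xml_text)
    result = []
    for folder in root.findall("./s:inputs/s:runfolder/s:samplefolder", NS):
        name = folder.findtext("s:name", namespaces=NS)
        path = folder.findtext("s:path", namespaces=NS)
        result.append((name, PurePosixPath(path).name))
    return result


for name, folder_name in sample_folders(SETUP_XML):
    print(name, folder_name, folder_name.startswith("Sample_"))
```

Here the folder on disk has the prefix while the name element does not, so the input data itself matches the expected layout.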

mariogiov commented 10 years ago

Hi guys,

You've probably already noticed the issue I opened regarding the directory structure, but I think at least some of these troubles we're having are related: see issue https://github.com/NationalGenomicsInfrastructure/piper/issues/6

Updated to correct my startlingly bad grammar. Please excuse this as I've been cutting back on coffee lately.

johandahlberg commented 10 years ago

@vezzi the problem with the qscript paths is something that I missed. I'll fix that as quickly as possible. (Like right now)

Regarding the second problem, where the assertion fails: this seems to be something that I broke before going home yesterday. Apparently when you have unit tests you might want to run them before actually pushing your changes - I had several angry emails from Travis showing that I'd broken the build. So I'll fix that up as well.

Sorry for all the problems with this right now.

mariogiov commented 10 years ago

No worries J-Man! It's surely easier on our end finding the bugs than it is on your end fixing them but it's always nice to get them squashed anyway

vezzi commented 10 years ago

Yep no worries I am really enjoying my new life as a Beta-Tester of Beta-versions :-D

johandahlberg commented 10 years ago

And once again I've bumped the version and piper v1.2.0-beta5 is ready for your testing pleasure. Check it out and see if you can get it working.

vezzi commented 10 years ago

OK, ready for another run!!!! I will again follow the procedure I described this morning.

johandahlberg commented 10 years ago

@vezzi Do that, but there should be no need to change the SCRIPTS_DIR. That stuff should work out of the box now.

vezzi commented 10 years ago

OK.. I am really close now, I only have to change the globalconfig accordingly!!!!

If I understood correctly, the reference genome specified in the xml file must point to the place where both the fasta file and the bwa index are located, right @johandahlberg?

I did this (before, my xml was pointing to the GATK bundle) but the following error pops up:

WholeGenome.sh --xml_input pipelineSetup.xml
....
INFO  15:28:39,232 QCommandLine - Scripting DNABestPracticeVariantCalling 
INFO  15:28:39,453 QCommandLine - Done with errors 
Exception in thread "main" java.lang.AssertionError: Couldn't find resource: /proj/a2009002/piper_references/gatk_bundle/2.8/b37/hapmap_3.3.b37.vcf This is needed for variant recalibrations.

the problem is that I cannot find in

/proj/a2010002/software/piper_bin/workflows/globalConfig.sh

a reference to ´a2009002´

but I can find one in

/proj/a2010002/software/piper_bin/workflows/WholeGenome.sh
  + #SBATCH -A a2010002
   - #SBATCH -A a2009002

but nothing changed..... then my old friend grep told me that a lot of references to a2009002 were in

/proj/a2010002/software/piper_bin/workflows/uppmax_global_config.xml

which does make sense!!!!
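
The same search can be sketched in Python for anyone who wants to script it. The helper `find_references` is hypothetical; it simply reports, for each file directly under a directory, the line numbers that mention a given project id.

```python
from pathlib import Path


def find_references(directory, needle):
    """Map each file directly under directory to the line numbers containing needle."""
    hits = {}
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going if a file is not valid text.
        lines = path.read_text(errors="replace").splitlines()
        numbers = [i for i, line in enumerate(lines, start=1) if needle in line]
        if numbers:
            hits[path.name] = numbers
    return hits


if __name__ == "__main__":
    # Directory and project id taken from this thread.
    workflows = Path("/proj/a2010002/software/piper_bin/workflows")
    if workflows.is_dir():
        for name, numbers in find_references(workflows, "a2009002").items():
            print(name, numbers)
```

Files that do not contain the id are left out of the result, so the config file that still points at the other project stands out immediately.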

Done... and now the .fai index is missing!!! hahahhahha

At this point I ask the expert: @johandahlberg what do you suggest?

johandahlberg commented 10 years ago

You should be able to use the following reference (on Nestor):

/proj/a2009002/piper_references/gatk_bundle/2.8/b37/human_g1k_v37.fasta

All the other files in uppmax_global_config.xml should point to files which are accessible to you. If you can't access the files tell me and I'll make sure to fix any permission issues.

vezzi commented 10 years ago

In

/proj/a2010002/software/piper_bin/workflows/globalConfig.sh

I had to replace the pointers to b2013064 with a2009002.

I ran

WholeGenome.sh --xml_input pipelineSetup.xml

and the dry run finished successfully

I have now submitted the

WholeGenome.sh --xml_input pipelineSetup.xml --run

command and it is running. Now I will start the testing on @mariogiov's part.

Great job Johan, you can close the majority of the issues that are open.

johandahlberg commented 10 years ago

Great! I'll go through and close the related issues today. :smile_cat: