gkno / gkno_launcher

The gkno launcher for executing tools or pipelines
MIT License

Not passing optional argument --fastq2 to pipeline fastq-bam breaks #7

Closed: CarlosBorroto closed this issue 10 years ago

CarlosBorroto commented 11 years ago

Hi,

I found that '--fastq2 (-q2)' is listed under 'Optional pipeline specific arguments' for the pipeline 'fastq-bam'. However, if this argument is not provided, the execution breaks.

$ ~/src/gkno_launcher/gkno pipe fastq-bam --hash-size 10 --fasta ~/src/gkno_launcher/resources/tutorial/current/test_genome.fa --fastq ~/src/gkno_launcher/resources/tutorial/current/simulated_reads_1.fq --ann-se ~/src/gkno_launcher/resources/tutorial/current/se.100.005.ann --ann-pe ~/src/gkno_launcher/resources/tutorial/current/pe.100.01.ann --known-sites ~/src/gkno_launcher/resources/tutorial/current/test_genome.dbSNP.snps.sites.vcf

===============================
  Boston College gkno package

  version: 0.89
  date:    June 2013
===============================

Checking tool configuration files...
     bamleftalign.json...done.
     vcflib.json...done.
     mason.json...done.
     michigan-bam-utilities.json...done.
     samtools.json...done.
     freebayes.json...done.
     454-tools.json...done.
     sequence-index-1000g.json...done.
     mosaik.json...done.
     tabix.json...done.
     concatenate-files.json...done.
     bgzip.json...done.
     bamtools.json...done.
     premo.json...done.
     md5.json...done.
     picard.json...done.
     generate-file-list.json...done.
     ogap.json...done.
     gzip.json...done.
     merge-vcf-files.json...done.
     gatk.json...done.
     tangram.json...done.

Checking pipeline configuration file...done.
Reading in command line arguments...done.

Workflow:
     build-reference (mosaik-build-reference):                           Build the Mosaik reference
     build-jump-database (mosaik-jump):                                  Generate the jump database for
                                                                         a Mosaik reference
     index-fasta (samtools-index-fasta):                                 Generate an index for a
                                                                         reference fasta file.
     create-sequence-dictionary (picard-create-sequence-dictionary):     Generate a dictionary
                                                                         containing all of the sequences
                                                                         in the input reference fasta.
     generate-mosaik-parameters (premo):                                 Determine MosaikAligner
                                                                         parameters based on read and
                                                                         fragment length
     build-read-archive (mosaik-build-fastq):                            Build the Mosaik read archive
     align (mosaik-aligner):                                             Pairwise alignment of a read
                                                                         archive
     sort-primary-bam (bamtools-sort):                                   Sort a BAM file
     sort-multiple-bam (bamtools-sort):                                  Sort a BAM file
     index-primary-bam (bamtools-index):                                 Index a BAM file
     count-covariates (gatk-count-covariates):                           Count covariates
     recalibrate-bq (gatk-recalibrate-bq):                               Recalibrate base qualities
     mark-duplicates (picard-mark-duplicates):                           Mark duplicate reads.
     filter-bam (bamtools-filter):                                       Filter a BAM file on many
                                                                         parameters or combinations of
                                                                         parameters.
     realign-gaps (ogap):                                                Realigns alignments optimized
                                                                         to open gaps in low-entropy
                                                                         sequence.
     left-align-indels (bamleftalign):                                   Left-aligns and merges the
                                                                         insertions and deletions in all
                                                                         alignments in stdin.  Iterates
                                                                         until each alignment is stable
                                                                         through a left-realignment step.
     index-final-bam (bamtools-index):                                   Index a BAM file

Assigning command line arguments to tasks...done.
Checking the command line arguments...done.
Checking instance information...done.
Checking multiple runs information...done.
Traceback (most recent call last):
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 467, in <module>
    main()
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 335, in main
    pl.toolLinkage(task, tool, tl.argumentInformation[tool], make.arguments, iLoop.usingInternalLoop, iLoop.tasks, iLoop.numberOfIterations, verbose)
  File "/home/cborroto/src/gkno_launcher/src/gkno/pipelines.py", line 888, in toolLinkage
    for value in arguments[currentTargetTask][0][currentTargetArgument]: self.values[0].append(value)
KeyError: u'-fq2'

Best, Carlos
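The traceback fails on a plain dictionary lookup in toolLinkage (pipelines.py, line 888): the linked argument '-fq2' was never set because --fastq2 was omitted. The minimal sketch below reproduces that failure mode in isolation. The nested arguments[task][0][argument] layout is inferred from the failing line, and the task name 'build-read-archive' and the file values are illustrative assumptions, not gkno's actual data. A defensive .get() only avoids the crash; for a paired-end-only pipeline, reporting the missing argument up front is the better fix.

```python
# Assumed layout, inferred from pipelines.py line 888:
#   arguments[task][iteration][argument] -> list of values
arguments = {
    "build-read-archive": [                  # hypothetical target task
        {"-fq": ["simulated_reads_1.fq"]},   # "-fq2" absent: --fastq2 was omitted
    ],
}


def linked_values(arguments, task, argument):
    """Return a copy of the values for argument on task, or [] if never set."""
    return list(arguments[task][0].get(argument, []))


# Indexing directly, as the failing line does, raises KeyError.
try:
    for value in arguments["build-read-archive"][0]["-fq2"]:
        print(value)
except KeyError as err:
    print("KeyError:", err)  # KeyError: '-fq2'

print(linked_values(arguments, "build-read-archive", "-fq2"))  # []
```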

AlistairNWard commented 11 years ago

Hi Carlos,

This is true, and I need to handle it better. I intend this pipeline to be for paired-end reads only, with a second pipeline to be provided for single-end reads. The reason --fastq2 is listed as optional is that it is passed to MOSAIK, for which providing a second fastq file is optional. I need to add an option to the pipeline configuration file that sets whether a parameter is required, overriding the setting for the individual tool. I will get this sorted asap.
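A pipeline-level override of a tool's 'required' flag, as described above, could work roughly like the following sketch. The flag values and the dictionary layout are hypothetical, chosen only to illustrate the merge rule (a pipeline setting, when present, wins over the tool default); they are not gkno's configuration format.

```python
from typing import Dict, Optional


def effective_required(tool_required: Dict[str, bool],
                       pipeline_overrides: Dict[str, Optional[bool]]) -> Dict[str, bool]:
    """Merge per-tool 'required' flags with pipeline-level overrides.

    An override that is present and not None replaces the tool default;
    None means "no opinion, keep the tool's setting".
    """
    merged = dict(tool_required)
    for argument, required in pipeline_overrides.items():
        if required is not None:
            merged[argument] = required
    return merged


# Hypothetical tool defaults: the second fastq is optional for the aligner...
mosaik_required = {"--fastq": True, "--fastq2": False}
# ...but a paired-end-only pipeline such as fastq-bam promotes it.
fastq_bam_overrides = {"--fastq2": True}

print(effective_required(mosaik_required, fastq_bam_overrides))
# {'--fastq': True, '--fastq2': True}
```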

By the way, thank you so much for all of this continued feedback. It is incredibly helpful to us.

Al

CarlosBorroto commented 11 years ago

My pleasure; it's not like I'm not getting anything in return! I like the simplicity of gkno. I love that I can read the code and understand most of it. While the documentation is not fully complete (I'm dying to learn what internal loops are for), it is impressive that so much effort was put into getting it up and running from the beginning.

AlistairNWard commented 11 years ago

Hey Carlos,

I'm continually updating gkno with your help. The pipelines now have required parameters set within their configuration files, so you should find that fastq-bam now terminates with an error when you fail to include --fastq2. fastq-tangram also makes you include the -sref parameter, so hopefully there will be fewer problems of this kind.
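The check that makes fastq-bam stop when --fastq2 is absent can be sketched as follows, assuming the merged required flags are available as a simple dictionary. The function names, flag values and error text are hypothetical; gkno's actual messages and configuration format may differ.

```python
import sys
from typing import Dict, Iterable, List


def missing_required(required: Dict[str, bool], supplied: Iterable[str]) -> List[str]:
    """List required arguments that were not supplied on the command line."""
    given = set(supplied)
    return sorted(arg for arg, is_required in required.items()
                  if is_required and arg not in given)


def check_arguments(pipeline: str, required: Dict[str, bool], supplied: Iterable[str]) -> None:
    """Stop before any task runs if a required argument is missing."""
    missing = missing_required(required, supplied)
    if missing:
        # sys.exit with a string prints it to stderr and exits with status 1.
        sys.exit(f"ERROR: pipeline '{pipeline}' requires: {', '.join(missing)}")


# Hypothetical flags for fastq-bam once --fastq2 is marked required.
fastq_bam_required = {"--fasta": True, "--fastq": True, "--fastq2": True, "--hash-size": False}
# Arguments used in the command from the original report (no --fastq2).
supplied = ["--hash-size", "--fasta", "--fastq", "--ann-se", "--ann-pe", "--known-sites"]
print(missing_required(fastq_bam_required, supplied))  # ['--fastq2']
```

Calling check_arguments("fastq-bam", fastq_bam_required, supplied) would then stop with a clear message instead of reaching the KeyError deep inside the task linkage.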

As for internal loops (I'll get them into the documentation today), here is a brief description. Take the fastq-vcf pipeline as an example. The first tasks in the pipeline build the reference files and must be performed before any alignments are done. Once these are complete, though, the alignment steps are independent for each pair of fastq files that you provide and can be performed in parallel.

Once the pipeline reaches the 'generate-mosaik-parameters' task, all of the tasks in the 'internal loop' section are required for each pair of fastq files, and they will be run in parallel (the number of parallel jobs can be set with the --number-jobs (-nj) parameter). The first task in the full pipeline that is not in the internal loop is 'filter-bam'. This task requires all of the alignment steps to be complete before starting: it waits until all of the fastq files have been aligned and processed, and then accepts as input all of the BAM files produced. The pipeline then continues to completion using all of the BAM files.

The idea is to maximise processing power and minimise run time when you are dealing with multiple sets of input fastq files, while avoiding having to split the pipeline into a reference-processing pipeline, an alignment pipeline that you run multiple times, and a final post-processing pipeline that uses the outputs of all of the alignment steps. Does this make sense? I'll try to do a better job of explaining this in the documentation.
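The scheduling of an internal loop can be modelled with a small Python sketch: the reference tasks run once, the per-pair tasks run in parallel with a worker limit standing in for --number-jobs, and filter-bam starts only after every pair has produced a BAM file. All function names are hypothetical stand-ins that return strings rather than running real tools; this illustrates the ordering described above, not gkno's implementation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def build_reference(fasta: str) -> str:
    """Stand-in for the reference tasks (build-reference, index-fasta, ...)."""
    return fasta + ".ref"


def align_pair(reference: str, pair: Tuple[str, str]) -> str:
    """Stand-in for the internal-loop tasks run once per fastq pair."""
    fq1, _fq2 = pair
    return fq1.rsplit(".", 1)[0] + ".bam"


def filter_bam(bams: List[str]) -> str:
    """Stand-in for filter-bam, the first task after the loop; takes every BAM."""
    return "filter-bam(" + ", ".join(bams) + ")"


def run_pipeline(fasta: str, pairs: List[Tuple[str, str]], number_jobs: int = 2) -> str:
    reference = build_reference(fasta)  # runs once, before any alignment
    # Internal loop: at most number_jobs pairs are processed at a time.
    with ThreadPoolExecutor(max_workers=number_jobs) as pool:
        # list() blocks until every pair is done: the barrier before filter-bam.
        bams = list(pool.map(lambda pair: align_pair(reference, pair), pairs))
    return filter_bam(bams)


pairs = [("sample_a_1.fq", "sample_a_2.fq"), ("sample_b_1.fq", "sample_b_2.fq")]
print(run_pipeline("test_genome.fa", pairs, number_jobs=2))
# filter-bam(sample_a_1.bam, sample_b_1.bam)
```

Because Executor.map returns results in input order and list() waits for every job, that single call plays the role of the wait-for-all step that filter-bam relies on.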

Thanks again for all your help,

Al
