CarlosBorroto closed this issue 10 years ago.
Hi Carlos,
This is true, and I need to handle it better. I intend this pipeline to be for paired-end reads only, with a second pipeline to be provided for single-end reads. The argument is listed as optional because it links to MOSAIK, for which providing a second fastq file is optional. I need to add an option to the pipeline configuration file that marks a parameter as required or not, overriding the setting for the individual tool. I will get this sorted ASAP.
By the way, thank you so much for all of this continued feedback. Your help is incredibly useful to us.
Al
On Wed, Jun 26, 2013 at 4:03 PM, Carlos Borroto notifications@github.com wrote:
Hi,
I found that '--fastq2 (-q2)' is labeled as 'Optional pipeline specific arguments' in pipeline 'fastq-bam'. However, if this argument is not provided, the execution breaks.
$ ~/src/gkno_launcher/gkno pipe fastq-bam --hash-size 10 --fasta ~/src/gkno_launcher/resources/tutorial/current/test_genome.fa --fastq ~/src/gkno_launcher/resources/tutorial/current/simulated_reads_1.fq --ann-se ~/src/gkno_launcher/resources/tutorial/current/se.100.005.ann --ann-pe ~/src/gkno_launcher/resources/tutorial/current/pe.100.01.ann --known-sites ~/src/gkno_launcher/resources/tutorial/current/test_genome.dbSNP.snps.sites.vcf
Boston College gkno package
version: 0.89
date: June 2013
Checking tool configuration files...
  bamleftalign.json...done.
  vcflib.json...done.
  mason.json...done.
  michigan-bam-utilities.json...done.
  samtools.json...done.
  freebayes.json...done.
  454-tools.json...done.
  sequence-index-1000g.json...done.
  mosaik.json...done.
  tabix.json...done.
  concatenate-files.json...done.
  bgzip.json...done.
  bamtools.json...done.
  premo.json...done.
  md5.json...done.
  picard.json...done.
  generate-file-list.json...done.
  ogap.json...done.
  gzip.json...done.
  merge-vcf-files.json...done.
  gatk.json...done.
  tangram.json...done.
Checking pipeline configuration file...done.
Reading in command line arguments...done.
Workflow:
  build-reference (mosaik-build-reference): Build the Mosaik reference
  build-jump-database (mosaik-jump): Generate the jump database for a Mosaik reference
  index-fasta (samtools-index-fasta): Generate an index for a reference fasta file.
  create-sequence-dictionary (picard-create-sequence-dictionary): Generate a dictionary containing all of the sequences in the input reference fasta.
  generate-mosaik-parameters (premo): Determine MosaikAligner parameters based on read and fragment length
  build-read-archive (mosaik-build-fastq): Build the Mosaik read archive
  align (mosaik-aligner): Pairwise alignment of a read archive
  sort-primary-bam (bamtools-sort): Sort a BAM file
  sort-multiple-bam (bamtools-sort): Sort a BAM file
  index-primary-bam (bamtools-index): Index a BAM file
  count-covariates (gatk-count-covariates): Count covariates
  recalibrate-bq (gatk-recalibrate-bq): Recalibrate base qualities
  mark-duplicates (picard-mark-duplicates): Mark duplicate reads.
  filter-bam (bamtools-filter): Filter a BAM file on many parameters or combinations of parameters.
  realign-gaps (ogap): Realigns alignments optimized to open gaps in low-entropy sequence.
  left-align-indels (bamleftalign): Left-aligns and merges the insertions and deletions in all alignments in stdin. Iterates until each alignment is stable through a left-realignment step.
  index-final-bam (bamtools-index): Index a BAM file
Assigning command line arguments to tasks...done.
Checking the command line arguments...done.
Checking instance information...done.
Checking multiple runs information...done.
Traceback (most recent call last):
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 467, in <module>
    main()
  File "/home/cborroto/src/gkno_launcher/src/gkno/gkno.py", line 335, in main
    pl.toolLinkage(task, tool, tl.argumentInformation[tool], make.arguments, iLoop.usingInternalLoop, iLoop.tasks, iLoop.numberOfIterations, verbose)
  File "/home/cborroto/src/gkno_launcher/src/gkno/pipelines.py", line 888, in toolLinkage
    for value in arguments[currentTargetTask][0][currentTargetArgument]: self.values[0].append(value)
KeyError: u'-fq2'
Best,
Carlos
Reply to this email directly or view it on GitHub: https://github.com/gkno/gkno_launcher/issues/7
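The traceback points to a plain dictionary lookup: when a linked argument such as '-fq2' was never supplied, `arguments[task][0][argument]` has no entry to return. The sketch below is a hypothetical, simplified reconstruction of that pattern, not gkno code. The data layout mirrors the indexing shown in the traceback, but the functions `link_values_unsafe` and `link_values_safe` and the sample values are illustrative assumptions. It contrasts the failing lookup with a guarded one that treats a missing optional argument as having no values.

```python
# Hypothetical, simplified model of linking one task's argument values to another.
# arguments[task][0][argument] mirrors the indexing seen in the traceback; the task
# name, argument names and file name below are illustrative only.

def link_values_unsafe(arguments, target_task, target_argument):
    """Copy linked values; raises KeyError if the argument was never supplied."""
    values = []
    for value in arguments[target_task][0][target_argument]:
        values.append(value)
    return values


def link_values_safe(arguments, target_task, target_argument):
    """Copy linked values, treating an absent optional argument as 'no values'."""
    return list(arguments.get(target_task, [{}])[0].get(target_argument, []))


# Only the first fastq file was given, so '-fq2' is absent for this task.
arguments = {"build-read-archive": [{"-fq": ["simulated_reads_1.fq"]}]}

try:
    link_values_unsafe(arguments, "build-read-archive", "-fq2")
except KeyError as err:
    print("KeyError:", err)  # KeyError: '-fq2'

print(link_values_safe(arguments, "build-read-archive", "-fq2"))  # []
```

Silently skipping the value is only appropriate when the downstream task really can run without it. For a paired-end-only pipeline, failing early with a clear "missing argument" message is the safer choice.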
My pleasure; it is not as if I'm not getting anything in return! I like the simplicity of gkno. I love that I can read the code and understand most of it. While the documentation is not fully complete (I'm dying to learn what internal loops are for), it is impressive that so much effort was put into getting it up and running from the beginning.
Hey Carlos,
I am continually updating gkno with your help. The pipelines now have required parameters set within their configuration files, so fastq-bam will now terminate with an error if you do not include --fastq2. fastq-tangram also requires the -sref parameter, so hopefully there will be fewer problems with this.
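Letting a pipeline-level setting override a tool-level one amounts to merging two sets of argument definitions, with the pipeline taking precedence, and then checking the command line against the merged result before any task runs. Below is a minimal Python sketch of that idea. It assumes a hypothetical dictionary layout with a `required` key; this is not the actual gkno configuration format, and the function names are illustrative.

```python
# Hypothetical sketch: merge tool-level argument definitions with pipeline-level
# overrides, then check required arguments before anything is executed.
# The layout and the "required" key are assumptions, not gkno's real JSON schema.

tool_arguments = {
    "--fastq": {"required": True},
    "--fastq2": {"required": False},  # optional for the aligner on its own
}

pipeline_overrides = {
    "--fastq2": {"required": True},  # this pipeline is paired-end only
}


def effective_arguments(tool_args, overrides):
    """Return argument definitions with pipeline settings taking precedence."""
    merged = {name: dict(spec) for name, spec in tool_args.items()}  # copy, don't mutate
    for name, spec in overrides.items():
        merged.setdefault(name, {}).update(spec)
    return merged


def missing_required(definitions, supplied):
    """List required arguments that were not given on the command line."""
    return sorted(name for name, spec in definitions.items()
                  if spec.get("required") and name not in supplied)


definitions = effective_arguments(tool_arguments, pipeline_overrides)
missing = missing_required(definitions, supplied={"--fastq"})
if missing:
    print("ERROR: missing required pipeline argument(s): " + ", ".join(missing))
```

Running this prints an error naming --fastq2, while the tool-level definition stays optional, so a different pipeline reusing the same tool could still leave it out.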
As for internal loops (I'll get them into the documentation today), here is a brief description. Take the fastq-vcf pipeline as an example. The first tasks in the pipeline build the reference files and must be performed before any alignments are done. Once these are complete, though, the alignment steps are independent for each pair of fastq files that you provide and can be performed in parallel. When the pipeline reaches the 'generate-mosaik-parameters' task, all of the tasks in the 'internal loop' section are run for each pair of fastq files, and these runs are done in parallel (the number of parallel jobs can be set with the --number-jobs (-nj) parameter).
The first task in the full pipeline that is not in the internal loop is 'filter-bam'. This task requires all of the alignment steps to be complete before starting: it waits until all of the fastq files have been aligned and processed, and then accepts all of the resulting BAM files as input. The pipeline then continues to completion using all of the BAM files.
The idea is to maximise use of the available processing power and minimise run time when you are dealing with multiple sets of input fastq files, while avoiding splitting the workflow into a reference-processing pipeline, an alignment pipeline that you run multiple times, and finally a post-processing pipeline that uses the outputs of all of the alignment steps. Does this make sense? I'll try to do a better job of explaining this in the documentation.
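The internal-loop behaviour described above is essentially a fan-out/fan-in pattern. As a rough illustration only (gkno's actual scheduling mechanism is not shown here), the Python sketch below runs a reference-preparation step once, processes each fastq pair in parallel with a worker cap playing the role of --number-jobs, and calls the 'filter-bam' stand-in only after every pair has finished. All function names are hypothetical placeholders that just build file names.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical fan-out/fan-in sketch of an "internal loop". The task bodies are
# placeholders that only build file names; they do not run any real tools.


def prepare_reference(fasta):
    # Stands in for the reference tasks (build-reference ... create-sequence-dictionary).
    return fasta + ".mosaik"


def align_pair(reference, pair):
    # Stands in for the per-pair tasks, from generate-mosaik-parameters up to
    # the task just before filter-bam.
    fastq1, _fastq2 = pair
    return fastq1.rsplit("_1.fq", 1)[0] + ".bam"


def filter_bams(bams):
    # Stands in for filter-bam, which consumes every BAM produced by the loop.
    return sorted(bams)


def run_pipeline(fasta, fastq_pairs, number_jobs=2):
    reference = prepare_reference(fasta)  # runs once, before any alignment
    with ThreadPoolExecutor(max_workers=number_jobs) as pool:
        # list() collects every result, so filter-bam only starts after all pairs finish.
        bams = list(pool.map(lambda pair: align_pair(reference, pair), fastq_pairs))
    return filter_bams(bams)


pairs = [("sampleA_1.fq", "sampleA_2.fq"), ("sampleB_1.fq", "sampleB_2.fq")]
print(run_pipeline("test_genome.fa", pairs, number_jobs=2))  # ['sampleA.bam', 'sampleB.bam']
```

Threads fit this sketch because real tasks would be external processes that Python merely waits on; for CPU-bound work done in Python itself, a `ProcessPoolExecutor` would be the usual choice.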
Thanks again for all your help,
Al
On Thu, Jun 27, 2013 at 10:36 AM, Carlos Borroto notifications@github.com wrote:
My pleasure; it is not as if I'm not getting anything in return! I like the simplicity of gkno. I love that I can read the code and understand most of it. While the documentation is not fully complete (I'm dying to learn what internal loops are for), it is impressive that so much effort was put into getting it up and running from the beginning.