Closed okbys closed 10 years ago
I'm trying to recreate this problem now. I will post comments and resolution asap.
On Fri, Jun 6, 2014 at 3:37 AM, okbys notifications@github.com wrote:
Hello,
I'm trying to analyze mobile element insertion sites with the gkno fastq-tangram pipeline, but the sort-bam task failed because its input file does not exist.
MosaikAligner appends '.bam' to the filename specified with the '--out' parameter.
- stdout/stderr of gkno pipe fastq-tangram.
$ ./gkno pipe fastq-tangram --input-path ./resources/tutorial/current --output-path ./test_output --fasta-reference chr20_fragment.fa --mobile-element-fasta mobile_element_sequences.fa --merged-reference-fasta chr20_fragment_moblist.fa --ann-paired-end pe.100.01.ann --ann-single-end se.100.005.ann --fastq mutated_genome_1.fq --fastq2 mutated_genome_2.fq --tangram-directory tangram-files --histogram-file tangram-files/hist.dat --library-file tangram-files/lib_table.dat --sequencing-technology illumina --bam-list bam_list.txt --bam mutated_genome.bam --vcf mutated_genome.vcf --processors 1 --hash-size 10 --special-reference-hashes 10 --special-reference-prefix chr20_fragment_moblist_10 --region 20
Boston College gkno package
version: 1.20.1
date: June 2014
git commit: 39bf3e44b2bffd3e51b6284f27abef4739b4f1ee
Reading in command line arguments...done.
Checking instance information...done.
Assigning command line arguments to graph nodes...done.
Checking for commands to execute at command line...done.
Workflow:
build-tangram-reference (tangram-index): Create an indexed reference file including the mobile elements
merge-fasta (concatenate-files): Join multiple files
build-reference (mosaik-build-reference): Build the Mosaik reference
build-jump-database (mosaik-jump): Generate the jump database for a Mosaik reference
create-sequence-dictionary (picard-create-sequence-dictionary): Generate a dictionary containing all of the sequences in the input reference fasta.
index-fasta (samtools-index-fasta): Generate an index for a reference fasta file.
generate-mosaik-parameters (premo): Determine MosaikAligner parameters based on read and fragment length
build-read-archive (mosaik-build-fastq): Build the Mosaik read archive
align (mosaik-aligner-special): Pairwise alignment of a read archive with additional 'special' reference sequences. The special sequences must all have a common prefix and alignment to them will be shown in the ZA tags. No primary alignments to the 'special' sequences will occur.
sort-bam (bamtools-sort): Sort a BAM file
mark-duplicates (dedup): Mark duplicate reads in a BAM file (University of Michigan).
index-duplicate-marked (bamtools-index): Index a BAM file.
generate-bam-list (generate-file-list): Generate a text file containing a list of files
scan-bam-files (tangram-scan): Generate a histogram of the fragment length distributions of the input libraries.
detect-mei (tangram-detect): Detect and genotype structural variation events.
index-bam (bamtools-index): Index a BAM file.
Logging gkno usage with ID: pipes/fastq-tangram...done.
Executing makefile: make -j 1 --file fastq-tangram.make...
Executing task: build-tangram-reference...completed successfully.
Executing task: merge-fasta...completed successfully.
Executing task: build-reference...completed successfully.
Executing task: build-jump-database...completed successfully.
make: Warning: File `test_output/chr20_fragment_moblist_10_positions.jmp' has modification time 0.0084 s in the future
Executing task: create-sequence-dictionary...completed successfully.
Executing task: index-fasta...completed successfully.
Executing task: generate-mosaik-parameters...completed successfully.
Executing task: build-read-archive...completed successfully.
Executing task: align...completed successfully.
Executing task: sort-bam...make: *** [/path/to/gkno/gkno_launcher-1.20.1-g39bf3e44b2/test_output/mutated_genome_sorted.bam] Error 1
.failed
gkno failed to complete successfully. Please check the output files to identify the cause of the problem.
TERMINATED: Errors found in running gkno. See specific error messages above for resolution.
- generated files
$ ls -1 test_output/
chr20_fragment_moblist_10_keys.jmp
chr20_fragment_moblist_10_meta.jmp
chr20_fragment_moblist_10_positions.jmp
chr20_fragment_moblist.dat
chr20_fragment_moblist.dict
chr20_fragment_moblist.fa
chr20_fragment_moblist.fa.fai
fastq-tangram_mosaikParameters.json
fastq-tangram.stderr
fastq-tangram.stdout
mutated_genome.bam.bam
mutated_genome.bam.stat
tangram-reference.dat
$
There is 'mutated_genome.bam.bam' instead of 'mutated_genome.bam'.
— Reply to this email directly or view it on GitHub https://github.com/gkno/gkno_launcher/issues/15.
Ok, I'm going to leave this error open, since there are a couple of problems that need to be ironed out. In the meantime, I would like to offer some comments.
gkno pipe fastq-tangram -is test -sh 10 -p 1
This will do exactly the same as what you were trying to get with your command line, but with a lot less effort (and this also executes successfully).
Thank you for pointing out this problem; it has exposed bugs that I need to fix. In the meantime, you should be able to run this pipeline successfully by dropping the --bam and --output-path arguments, or by using the command line above for the test you were attempting.
I will work on the bugs and close this issue when they are fixed.
The bug with the filename has been fixed. Note that this argument is a prefix (I have updated the pipeline help to reflect this), so in your original command line, you would need to set '--bam mutated_genome'. Recommended usage is still to omit this argument and let gkno construct the value itself.
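As a rough illustration of the prefix behaviour described above (this is not gkno's or MosaikAligner's actual code), the sketch below assumes the aligner always appends `.bam` to whatever value it receives. The helper names `to_prefix` and `aligner_output` are hypothetical.

```python
def to_prefix(name: str, suffix: str = ".bam") -> str:
    """Strip one trailing suffix so the value can be used as a prefix."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def aligner_output(prefix: str, suffix: str = ".bam") -> str:
    """Model of an aligner that always appends the suffix (an assumption)."""
    return prefix + suffix


# Passing a full filename reproduces the doubled extension from the report.
print(aligner_output("mutated_genome.bam"))             # mutated_genome.bam.bam
# Passing the prefix gives the filename that the next task looks for.
print(aligner_output(to_prefix("mutated_genome.bam")))  # mutated_genome.bam
```

Normalising the value once, before it reaches the aligner, means both '--bam mutated_genome' and '--bam mutated_genome.bam' would end up with the same output filename.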
I am guessing that you went through the list of required arguments and gave a value to them all? I need to modify the way that this is handled. All of the 'required arguments' are indeed required by the pipeline, but it isn't required that the user gives values on the command line. I'll modify the way the help is shown, so that only arguments whose entry by the user is required are listed as 'required arguments'. It is generally only required that you specify the input files. Any file created by a task in the pipeline will be given a filename by gkno.
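To make the distinction concrete, here is a minimal sketch that separates the arguments a user has to supply from those the pipeline can name itself because a task creates the file. The `Argument` class and the `created_by_task` flags are hypothetical and chosen for illustration only; they do not reflect gkno's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Argument:
    name: str
    created_by_task: bool  # True if some task in the pipeline writes this file


def user_required(arguments):
    """Return only the arguments the user has to supply on the command line."""
    return [a.name for a in arguments if not a.created_by_task]


# Illustrative subset of the pipeline arguments; the flags are assumptions.
ARGUMENTS = [
    Argument("--fastq", created_by_task=False),
    Argument("--fastq2", created_by_task=False),
    Argument("--bam", created_by_task=True),
    Argument("--vcf", created_by_task=True),
]

print(user_required(ARGUMENTS))  # ['--fastq', '--fastq2']
```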
Ok, the help messages for tools and pipelines have been changed (version 1.25.3). You should now see that --bam is not a required argument for the fastq-tangram pipeline.
I succeeded in running fastq-tangram.
However, the recommended command failed at the tangram_detect step because the input filenames were wrong. On my computer, tangram_scan generated two files named "lib_table.dat" and "hist.dat", but the parameters generated by gkno were "library.dat" and "histogram.dat".
$./gkno pipe fastq-tangram -is test -sh 10 -p 1
======================================================
Boston College gkno package
version: 1.25.5
date: June 2014
git commit: 571dd41854fd871f208d9fdab61cddba558c2d35
======================================================
Reading in command line arguments...done.
Checking instance information...done.
Assigning command line arguments to graph nodes...done.
Checking for commands to execute at command line...done.
Workflow:
build-tangram-reference (tangram-index): Create an indexed reference
file including the mobile
elements
merge-fasta (concatenate-files): Join multiple files
build-reference (mosaik-build-reference): Build the Mosaik reference
build-jump-database (mosaik-jump): Generate the jump database for
a Mosaik reference
index-fasta (samtools-index-fasta): Generate an index for a
reference fasta file.
create-sequence-dictionary (picard-create-sequence-dictionary): Generate a dictionary
containing all of the sequences
in the input reference fasta.
generate-mosaik-parameters (premo): Determine MosaikAligner
parameters based on read and
fragment length
build-read-archive (mosaik-build-fastq): Build the Mosaik read archive
align (mosaik-aligner-special): Pairwise alignment of a read
archive with additional
'special' reference sequences.
The special sequences must all
have a common prefix and
alignment to them will be shown
in the ZA tags. No primary
alignments to the 'special'
sequences will occur.
sort-bam (bamtools-sort): Sort a BAM file
mark-duplicates (dedup): Mark duplicate reads in a BAM
file (University of Michigan).
index-duplicate-marked (bamtools-index): Index a BAM file.
generate-bam-list (generate-file-list): Generate a text file containing
a list of files
scan-bam-files (tangram-scan): Generate a histogram of the
fragment length distributions
of the input libraries.
detect-mei (tangram-detect): Detect and genotype structural
variation events.
Logging gkno usage with ID: pipes/fastq-tangram...done.
Executing makefile: make -j 1 --file fastq-tangram.make...
Executing task: build-tangram-reference...completed successfully.
Executing task: merge-fasta...completed successfully.
Executing task: build-reference...completed successfully.
Executing task: build-jump-database...completed successfully.
Executing task: index-fasta...completed successfully.
Executing task: create-sequence-dictionary...completed successfully.
Executing task: generate-mosaik-parameters...completed successfully.
Executing task: build-read-archive...completed successfully.
Executing task: align...completed successfully.
Executing task: sort-bam...completed successfully.
Executing task: mark-duplicates...completed successfully.
Executing task: index-duplicate-marked...completed successfully.
Executing task: generate-bam-list...completed successfully.
Executing task: scan-bam-files...completed successfully.
Executing task: scan-bam-files...make[1]: *** [/path/to/tangram-files/library.dat] Error 1
make: *** [/path/to/tangram-files/histogram.dat] Error 2
gkno failed to complete successfully. Please check the output files to identify the cause of the problem.
================================================================================================
TERMINATED: Errors found in running gkno. See specific error messages above for resolution.
================================================================================================
.failed
$ ls -1 tangram_files/
hist.dat
lib_table.dat
$
$./gkno pipe fastq-tangram -is test -sh 10 -p 1 -ht tangram_files/hist.dat -l tangram_files/lib_table.dat
======================================================
Boston College gkno package
version: 1.25.5
date: June 2014
git commit: 571dd41854fd871f208d9fdab61cddba558c2d35
======================================================
Reading in command line arguments...done.
Checking instance information...done.
Assigning command line arguments to graph nodes...done.
Checking for commands to execute at command line...done.
Workflow:
build-tangram-reference (tangram-index): Create an indexed reference
file including the mobile
elements
merge-fasta (concatenate-files): Join multiple files
build-reference (mosaik-build-reference): Build the Mosaik reference
build-jump-database (mosaik-jump): Generate the jump database for
a Mosaik reference
index-fasta (samtools-index-fasta): Generate an index for a
reference fasta file.
create-sequence-dictionary (picard-create-sequence-dictionary): Generate a dictionary
containing all of the sequences
in the input reference fasta.
generate-mosaik-parameters (premo): Determine MosaikAligner
parameters based on read and
fragment length
build-read-archive (mosaik-build-fastq): Build the Mosaik read archive
align (mosaik-aligner-special): Pairwise alignment of a read
archive with additional
'special' reference sequences.
The special sequences must all
have a common prefix and
alignment to them will be shown
in the ZA tags. No primary
alignments to the 'special'
sequences will occur.
sort-bam (bamtools-sort): Sort a BAM file
mark-duplicates (dedup): Mark duplicate reads in a BAM
file (University of Michigan).
index-duplicate-marked (bamtools-index): Index a BAM file.
generate-bam-list (generate-file-list): Generate a text file containing
a list of files
scan-bam-files (tangram-scan): Generate a histogram of the
fragment length distributions
of the input libraries.
detect-mei (tangram-detect): Detect and genotype structural
variation events.
Logging gkno usage with ID: pipes/fastq-tangram...done.
Executing makefile: make -j 1 --file fastq-tangram.make...
Executing task: build-tangram-reference...completed successfully.
Executing task: merge-fasta...completed successfully.
Executing task: build-reference...completed successfully.
Executing task: build-jump-database...completed successfully.
Executing task: index-fasta...completed successfully.
Executing task: create-sequence-dictionary...completed successfully.
Executing task: generate-mosaik-parameters...completed successfully.
Executing task: build-read-archive...completed successfully.
Executing task: align...completed successfully.
Executing task: sort-bam...completed successfully.
Executing task: mark-duplicates...completed successfully.
Executing task: index-duplicate-marked...completed successfully.
Executing task: generate-bam-list...completed successfully.
Executing task: scan-bam-files...completed successfully.
Executing task: detect-mei...completed successfully.
$
Thanks!
Sorry about this - I hadn't pushed a modification to one of the tool configuration files that was setting the names of the library and histogram files incorrectly. It was fixed in my system so I wasn't seeing the problem. I just pushed the modification, so version 1.26.0 should work fine (I pulled a fresh clone of my own and tested and it worked fine).
Again, thank you for sending these reports. This is extremely helpful to us.
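A filename mismatch like the one reported here can be caught by comparing the names a downstream task expects with the files the upstream task actually wrote. The sketch below is only an illustration; the helper `missing_outputs` is hypothetical and not part of gkno.

```python
import os
import tempfile


def missing_outputs(directory, expected):
    """Return the expected filenames that are not present in directory."""
    present = set(os.listdir(directory))
    return sorted(name for name in expected if name not in present)


# Recreate the situation from the report in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("hist.dat", "lib_table.dat"):
        open(os.path.join(tmp, name), "w").close()

    # Names the pipeline passed to the detection step in version 1.25.5.
    print(missing_outputs(tmp, ["library.dat", "histogram.dat"]))
    # ['histogram.dat', 'library.dat']

    # Names the scan step actually produced.
    print(missing_outputs(tmp, ["lib_table.dat", "hist.dat"]))
    # []
```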
On Wed, Jun 11, 2014 at 2:24 AM, okbys notifications@github.com wrote:
I succeeded in running fastq-tangram.
However, the recommended command failed at the tangram_detect step because the input filenames were wrong. On my computer, tangram_scan generated two files named "lib_table.dat" and "hist.dat", but the parameters generated by gkno were "library.dat" and "histogram.dat".
I confirmed it. Thanks!
A new tutorial (http://blog.gkno.me/post/88719010295/call-meis-on-multiple-fastq-files-v-1-27-1) explains how to parallelise alignment of multiple fastq files and also parallelise detection of MEIs across multiple genomic regions.