Closed dcopetti closed 6 years ago
Hi Dario,
I agree that it is probably not tigmint that caused the crash, since it was just bwa mem running at that stage. Using additional threads would speed up the bwa mem
alignment step.
Splitting the alignment stage into multiple parallel jobs is certainly an option, and one that I have done before when dealing with a very large dataset.
You could:
bwa mem
command for each partitionsamtools merge -tBX
to merge together the separate BAM filesbwa mem
alignment step. You just have to ensure that the BX-sorted, merged BAM file is named correctly for tigmint to recognize that this file exists and can be used for subsequent steps: Ex. myassembly.myreads.sortbx.bam
, where myassembly.fa is the draft assembly file, and myreads.fq.gz is the chromium reads file. You can use the -n
option with the tigmint makefile to do a 'dry run' of the commands to double check that the pipeline starts up again after the alignment step as desired.Hope that helps! Lauren
Hi Lauren,
Thanks for the suggestion, I will go for it. I am new to using 10X data, and I am not sure about the structure of the fq.gz file in the input (I produced it with longranger basic). If it is a normal fastq, I am thinking just to split it in blocks, or if the barcoding information is creating issues, is there a specific way to do it, like with Longranger? Thanks,
Dario
Hi Dario,
Yes, you can partition the fq.gz (post-Longranger basic) file the same as you would any other interleaved fastq file. The only special thing about the longranger basic output as opposed to another fastq file is the chromium barcode information encoded in the read header.
Ex:
[lcoombe@lcoombe01 outs]$ gunzip -c barcoded.fastq.gz |head -n 10000 |tail -n 8
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
TTATGTGGCAAAACCCAGAAAGATCCATCATGAATCCAAGATACTTTCAGCAAAAAGTTATACCAAAATAAATAAAATAAAATTGAAATAATGCTTAGCTGATCCCAAGTCAAGATTTACGTTTGTAT
+
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFKKKKKKKKKKKKKKKKKKK
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
GCCACGGTGCCCAGCTCACCTACTGGTTTTAAAGAGGGAACTCTGGGGGCAGTGTGGAGGGAGGGACCTACTGCCATAGAATACAAACGTAAATCTTGACTTGGGATCAGCTAAGCATTATTTCAATTTTATTTTATTTATTTTGGTATAA
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKFAFK
So as long as your partitioning approach keeps the R1/R2 pairs together, and keeps the "BX:Z" comment in the read header, you will be fine! Running bwa mem
with the -pC
option as you have in your above command indicates an interleaved reads file, and will put the barcode information in the BX tag of the resulting SAM file, which tigmint will then use in downstream steps.
Hope that helps -- let me know if you have any other questions! Lauren
@dcopetti - I'll close this for now, but feel free to re-open if you have further questions!
Hello, I have the large bam file (with the sorting/tag/index issues of case #811 above) and I tried to run tigmint (span branch). After commenting the commands in lines 142-154 in a copy of the script, I get this:
~/bin/tigmint/tigmint-make3 tigmint draft=genome reads=reads
make: Nothing to be done for 'tigmint'.
~/bin/tigmint/tigmint-make3 --dry-run
echo 'Tigmint: Correct misassemblies using linked reads'
echo 'Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...'
echo 'Example: tigmint-make tigmint draft=myassembly reads=myreads'
echo 'For more information see https://bcgsc.github.io/tigmint/'
I am not sure how else I should compose the command to use the bam as an input instead. Thanks, Dario
Hi Dario,
First off, you can just use the master branch -- the logic in the tigmint-span branch used to look for spanning molecules to identify misassemblies has been incorporated into the master branch.
Also, no need to comment out commands to ensure that the Makefile starts at the step that you want. You just need to ensure that your BAM file is named myassembly.myreads.sortbx.bam
(using notation from the help page above), as expected by tigmint. You can use the --dry-run
option with the full tigmint command to ensure that the pipeline is starting where you want it to.
It appears that tigmint detects that the target files of the pipeline are already present in your directory. Could you note the other files in your working directory, as well as what you named the BAM file?
Hi,
I installed the newest version, but it is giving me an error, same as the older installation (both attempts are here below:
[copettid@kp141-242 tigmint]$ tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'. Stop.
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'. Stop.
[copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'. Stop.
Also adding --dry-run gives me the same error. The folder looks like this:
-rwx------ 1 copettid mpb 4.3G Feb 20 08:44 Rabiosa_genome.fa
-rw-r--r-- 1 copettid mpb 4.3G Feb 21 10:00 Rabiosa_genome.fa.bwt
-rw-r--r-- 1 copettid mpb 1.1G Feb 21 10:01 Rabiosa_genome.fa.pac
-rw-r--r-- 1 copettid mpb 8.9M Feb 21 10:01 Rabiosa_genome.fa.ann
-rw-r--r-- 1 copettid mpb 2.6M Feb 21 10:01 Rabiosa_genome.fa.amb
-rw-r--r-- 1 copettid mpb 2.2G Feb 21 10:22 Rabiosa_genome.fa.sa
lrwxrwxrwx 1 copettid mpb 103 Mar 2 08:53 input_fastqs -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/fastqs/'
-rw-r--r-- 1 copettid mpb 344 Mar 5 10:08 transfer
lrwxrwxrwx 1 copettid mpb 144 Mar 5 10:11 reads.fq.gz -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/longranger_basic_output/4606/outs/barcoded.fq.gz'
-rwx------ 1 copettid mpb 180G Mar 17 10:09 Rabiosa_genome.reads.sortbx.bam
-rw-r--r-- 1 copettid mpb 8.4M Mar 28 09:39 Rabiosa_genome.fa.fai
I am not sure what else I should do. Thanks,
Dario
Looks like you misspelled draft
here?
[copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads
Dang! That is stupid.
With the --dry-run looks like this:
tigmint-make tigmint draft=Rabiosa_genome reads=reads --dry-run
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
awk 'NR>1 { print $1"\t"$2-1"\t"$3-1"\tReads="$7",Size="$4",Mapq="$8",AS="$9",NM="$10",BX="$5",MI="$6"\t"$7 }' Rabiosa_genome.reads.as0.65.nm5.molecule.tsv \
| sort -k1,1 -k2,2n -k3,3n >Rabiosa_genome.reads.as0.65.nm5.molecule.bed
awk '$3 - $2 >= 2000' Rabiosa_genome.reads.as0.65.nm5.molecule.bed >Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
/home/copettid/bin/tigmint/bin/tigmint-cut -p8 -w1000 -n20 -t0 -o Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.fa Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
ln -sf Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.tigmint.fa
Then I run it and I get this error:
tigmint-make tigmint draft=Rabiosa_genome reads=reads
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed
make: *** [Rabiosa_genome.reads.as0.65.nm5.molecule.tsv] Error 1
so I updated pysam:
conda install pysam
Fetching package metadata .................
Solving package specifications: .
Package plan for installation in environment /home/copettid/miniconda2:
The following NEW packages will be INSTALLED:
bcftools: 1.7-0 bioconda
The following packages will be UPDATED:
pysam: 0.8.4-py27_0 bioconda --> 0.14.1-py27_htslib1.7_0 bioconda
[...]
move to a new terminal and I get this:
tigmint-make tigmint draft=Rabiosa_genome reads=reads
make: *** No rule to make target 'reads.fq.gz', needed by 'Rabiosa_genome.reads.sortbx.bam'. Stop.
as if it is not recognizing the bam file?
Just checking -- are you in the same working directory as before? It looks like it can't see the file named reads.fq.gz
?
You are right, the links were not activated. But the error with pysam persists:
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed
should I maybe redownlaod tigmint after having updated pysam?
I just downloaded again tigmint, same error
Can you double check that the python3 that you want to use is being found (ie. in your PATH)?
Ex: when you use which python3
, you see the path to your desired miniconda installation?
Nope, it is the one in /usr/bin/python3
.
I will install it tomorrow with my colleague,
thanks for the help!
No problem! Let me know if you encounter any other issues.
Hi,
After installing python3.5 and updating the dependencies, I am now able to run tigmint as
tigmint tigmint draft=Rabiosa_genome reads=reads span=4 nm=4 dist=20000 t=8
Regarding the parameters, are the default already stringent? My goal is to break scaffolds with low (or questionable) 10x support.
In parallel, I am now running longranger wgs to visualize the linked reads around my putative junctions. Will ~12x coverage be enough? Or else I can use all the reads (~25x), but it will take longer.
Thanks for the support, very nice of you!
Hello,
I was running tigmint-span and the sever crashed (I am pretty sure it is not due to tigmint, unless it required a huge amount of memory all of a sudden). I have now a bunch of *.bam.tmp.0628.bam files - I guess it died during the sorting of this command:
bwa mem -t$t -pC $(draft).fa $< | samtools view -h -F4 | samtools sort -@$t -tBX -o $@
I am thinking of two alternative options: