write temporary bam files

dcopetti commented 6 years ago

Hello,

I was running tigmint-span and the sever crashed (I am pretty sure it is not due to tigmint, unless it required a huge amount of memory all of a sudden). I have now a bunch of *.bam.tmp.0628.bam files - I guess it died during the sorting of this command: bwa mem -t$t -pC $(draft).fa $< | samtools view -h -F4 | samtools sort -@$t -tBX -o $@ I am thinking of two alternative options:

to split the above command in separate steps and have a file for each (at least until the following step is completed with success);
to feed tigmint with a bam file produced outside of its pipeline, and run just the actual tigmint algorithm. Would it be possible to implement any of the options, or any other that will help deal with long computation time? Thanks, Dario

lcoombe commented 6 years ago

Hi Dario,

I agree that it is probably not tigmint that caused the crash, since it was just bwa mem running at that stage. Using additional threads would speed up the bwa mem alignment step. Splitting the alignment stage into multiple parallel jobs is certainly an option, and one that I have done before when dealing with a very large dataset.

You could:

split your chromium read file into partitions
run the above bwa mem command for each partition
run samtools merge -tBX to merge together the separate BAM files
Then, continue running the tigmint pipeline from after the bwa mem alignment step. You just have to ensure that the BX-sorted, merged BAM file is named correctly for tigmint to recognize that this file exists and can be used for subsequent steps: Ex. myassembly.myreads.sortbx.bam, where myassembly.fa is the draft assembly file, and myreads.fq.gz is the chromium reads file. You can use the -n option with the tigmint makefile to do a 'dry run' of the commands to double check that the pipeline starts up again after the alignment step as desired.

Hope that helps! Lauren

dcopetti commented 6 years ago

Hi Lauren,

Thanks for the suggestion, I will go for it. I am new to using 10X data, and I am not sure about the structure of the fq.gz file in the input (I produced it with longranger basic). If it is a normal fastq, I am thinking just to split it in blocks, or if the barcoding information is creating issues, is there a specific way to do it, like with Longranger? Thanks,

Dario

lcoombe commented 6 years ago

Hi Dario,

Yes, you can partition the fq.gz (post-Longranger basic) file the same as you would any other interleaved fastq file. The only special thing about the longranger basic output as opposed to another fastq file is the chromium barcode information encoded in the read header.

Ex:

[lcoombe@lcoombe01 outs]$ gunzip -c barcoded.fastq.gz |head -n 10000 |tail -n 8
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
TTATGTGGCAAAACCCAGAAAGATCCATCATGAATCCAAGATACTTTCAGCAAAAAGTTATACCAAAATAAATAAAATAAAATTGAAATAATGCTTAGCTGATCCCAAGTCAAGATTTACGTTTGTAT
+
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFFKKKKKKKKKKKKKKKKKKK
@E00247:267:HMVT3CCXX:6:1103:27265:17922 BX:Z:AAACACCAGACAATAC-1
GCCACGGTGCCCAGCTCACCTACTGGTTTTAAAGAGGGAACTCTGGGGGCAGTGTGGAGGGAGGGACCTACTGCCATAGAATACAAACGTAAATCTTGACTTGGGATCAGCTAAGCATTATTTCAATTTTATTTTATTTATTTTGGTATAA
+
AAFFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKFKKFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKAKKFAFK

So as long as your partitioning approach keeps the R1/R2 pairs together, and keeps the "BX:Z" comment in the read header, you will be fine! Running bwa mem with the -pC option as you have in your above command indicates an interleaved reads file, and will put the barcode information in the BX tag of the resulting SAM file, which tigmint will then use in downstream steps.

Hope that helps -- let me know if you have any other questions! Lauren

lcoombe commented 6 years ago

@dcopetti - I'll close this for now, but feel free to re-open if you have further questions!

dcopetti commented 6 years ago

Hello, I have the large bam file (with the sorting/tag/index issues of case #811 above) and I tried to run tigmint (span branch). After commenting the commands in lines 142-154 in a copy of the script, I get this:

~/bin/tigmint/tigmint-make3 tigmint draft=genome reads=reads
make: Nothing to be done for 'tigmint'.

~/bin/tigmint/tigmint-make3 --dry-run
echo 'Tigmint: Correct misassemblies using linked reads'
echo 'Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...'
echo 'Example: tigmint-make tigmint draft=myassembly reads=myreads'
echo 'For more information see https://bcgsc.github.io/tigmint/'

I am not sure how else I should compose the command to use the bam as an input instead. Thanks, Dario

lcoombe commented 6 years ago

Hi Dario,

First off, you can just use the master branch -- the logic in the tigmint-span branch used to look for spanning molecules to identify misassemblies has been incorporated into the master branch.

Also, no need to comment out commands to ensure that the Makefile starts at the step that you want. You just need to ensure that your BAM file is named myassembly.myreads.sortbx.bam (using notation from the help page above), as expected by tigmint. You can use the --dry-run option with the full tigmint command to ensure that the pipeline is starting where you want it to.

It appears that tigmint detects that the target files of the pipeline are already present in your directory. Could you note the other files in your working directory, as well as what you named the BAM file?

dcopetti commented 6 years ago

Hi,

I installed the newest version, but it is giving me an error, same as the older installation (both attempts are here below:

[copettid@kp141-242 tigmint]$ tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make
Tigmint: Correct misassemblies using linked reads
Usage: tigmint-make [COMMAND]... [PARAMETER=VALUE]...
Example: tigmint-make tigmint draft=myassembly reads=myreads
For more information see https://bcgsc.github.io/tigmint/
[copettid@kp141-242 tigmint]$ ~/bin/old_tigmint/tigmint-make tigmint
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.
[copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads
make: *** No rule to make target 'draft.reads.as0.65.nm5.molecule.tsv', needed by 'tigmint'.  Stop.

Also adding --dry-run gives me the same error. The folder looks like this:

-rwx------ 1 copettid mpb 4.3G Feb 20 08:44 Rabiosa_genome.fa
-rw-r--r-- 1 copettid mpb 4.3G Feb 21 10:00 Rabiosa_genome.fa.bwt
-rw-r--r-- 1 copettid mpb 1.1G Feb 21 10:01 Rabiosa_genome.fa.pac
-rw-r--r-- 1 copettid mpb 8.9M Feb 21 10:01 Rabiosa_genome.fa.ann
-rw-r--r-- 1 copettid mpb 2.6M Feb 21 10:01 Rabiosa_genome.fa.amb
-rw-r--r-- 1 copettid mpb 2.2G Feb 21 10:22 Rabiosa_genome.fa.sa
lrwxrwxrwx 1 copettid mpb  103 Mar  2 08:53 input_fastqs -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/fastqs/'
-rw-r--r-- 1 copettid mpb  344 Mar  5 10:08 transfer
lrwxrwxrwx 1 copettid mpb  144 Mar  5 10:11 reads.fq.gz -> '/home/copettid/public/Molecular Plant Breeding Group/Dario/Lolium/rabiosa_raw_data/10x_raw_data/longranger_basic_output/4606/outs/barcoded.fq.gz'
-rwx------ 1 copettid mpb 180G Mar 17 10:09 Rabiosa_genome.reads.sortbx.bam
-rw-r--r-- 1 copettid mpb 8.4M Mar 28 09:39 Rabiosa_genome.fa.fai

I am not sure what else I should do. Thanks,

Dario

lcoombe commented 6 years ago

Looks like you misspelled draft here? [copettid@kp141-242 tigmint]$ tigmint-make tigmint raft=Rabiosa_genome reads=reads

dcopetti commented 6 years ago

Dang! That is stupid.

With the --dry-run looks like this:

tigmint-make tigmint draft=Rabiosa_genome reads=reads --dry-run
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
awk 'NR>1 { print $1"\t"$2-1"\t"$3-1"\tReads="$7",Size="$4",Mapq="$8",AS="$9",NM="$10",BX="$5",MI="$6"\t"$7 }' Rabiosa_genome.reads.as0.65.nm5.molecule.tsv \
        | sort -k1,1 -k2,2n -k3,3n >Rabiosa_genome.reads.as0.65.nm5.molecule.bed
awk '$3 - $2 >= 2000' Rabiosa_genome.reads.as0.65.nm5.molecule.bed >Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
/home/copettid/bin/tigmint/bin/tigmint-cut -p8 -w1000 -n20 -t0 -o Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.fa Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.bed
ln -sf Rabiosa_genome.reads.as0.65.nm5.molecule.size2000.trim0.window1000.span20.breaktigs.fa Rabiosa_genome.tigmint.fa

Then I run it and I get this error:

tigmint-make tigmint draft=Rabiosa_genome reads=reads
/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
  File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
    import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed
make: *** [Rabiosa_genome.reads.as0.65.nm5.molecule.tsv] Error 1

so I updated pysam:

conda install pysam
Fetching package metadata .................
Solving package specifications: .
Package plan for installation in environment /home/copettid/miniconda2:
The following NEW packages will be INSTALLED:
    bcftools: 1.7-0        bioconda
The following packages will be UPDATED:
    pysam:    0.8.4-py27_0 bioconda --> 0.14.1-py27_htslib1.7_0 bioconda
[...]

move to a new terminal and I get this:

tigmint-make tigmint draft=Rabiosa_genome reads=reads
make: *** No rule to make target 'reads.fq.gz', needed by 'Rabiosa_genome.reads.sortbx.bam'.  Stop.

as if it is not recognizing the bam file?

lcoombe commented 6 years ago

Just checking -- are you in the same working directory as before? It looks like it can't see the file named reads.fq.gz?

dcopetti commented 6 years ago

You are right, the links were not activated. But the error with pysam persists:

/home/copettid/bin/tigmint/bin/tigmint-molecule -b Rabiosa_genome.reads.sortbx.bam -o Rabiosa_genome.reads.as0.65.nm5.molecule.tsv -a 0.65 -n 5 -q 0 -d 50000
Traceback (most recent call last):
  File "/home/copettid/bin/tigmint/bin/tigmint-molecule", line 16, in <module>
    import pysam
ImportError: No module named 'pysam'
/home/copettid/bin/tigmint/bin/tigmint-make:174: recipe for target 'Rabiosa_genome.reads.as0.65.nm5.molecule.tsv' failed

should I maybe redownlaod tigmint after having updated pysam?

dcopetti commented 6 years ago

I just downloaded again tigmint, same error

lcoombe commented 6 years ago

Can you double check that the python3 that you want to use is being found (ie. in your PATH)? Ex: when you use which python3, you see the path to your desired miniconda installation?

dcopetti commented 6 years ago

Nope, it is the one in /usr/bin/python3. I will install it tomorrow with my colleague, thanks for the help!

lcoombe commented 6 years ago

No problem! Let me know if you encounter any other issues.

dcopetti commented 6 years ago

Hi, After installing python3.5 and updating the dependencies, I am now able to run tigmint as tigmint tigmint draft=Rabiosa_genome reads=reads span=4 nm=4 dist=20000 t=8 Regarding the parameters, are the default already stringent? My goal is to break scaffolds with low (or questionable) 10x support. In parallel, I am now running longranger wgs to visualize the linked reads around my putative junctions. Will ~12x coverage be enough? Or else I can use all the reads (~25x), but it will take longer. Thanks for the support, very nice of you!

bcgsc / tigmint

write temporary bam files #8