bcgsc / tigmint

⛓ Correct misassemblies using linked AND long reads
https://bcgsc.github.io/tigmint/
GNU General Public License v3.0

tigmint-molecule silent #32

Closed francicco closed 4 years ago

francicco commented 4 years ago

Hi @lcoombe,

I changed clusters and had to reinstall the software. Now I'm testing whether the new environment works. What I did was map my linked reads to the assembly using:

bwa mem -t$THREADS -C $ASSEMBLY.fasta -p $READS | samtools sort -@$THREADS -tBX -o $ASSEMBLY.LinkedReads.sortbx.bam

The first possible problem is that bwa mem doesn't recognise the paired reads, and I think it maps them as single-end:

###### Mon Oct 14 14:54:33 BST 2019: Mapping linked-reads with BWA onto AvanCR.Tigmint.test
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 722022 sequences (100000047 bp)...
[M::process] 722022 single-end sequences; 0 paired-end sequences

My reads headers are like this:

@A00618:19:HHCTMDMXX:2:1445:7048:35712/1_AAAAAAAAAAAAAGGA
TAAAGAAAATTGGAGGGTACGGTATCAATCTCGTTAGACTTTTAAGATTTATAGGAAAAGAATTGAAGAAGTTGAAGATAAATTAGGAAAAGGACCTGTTATAGGACATTGAAAGGGTATTAGATCG
+
FFFF,,,F,FFF,,,,FF,F:,:F:F,:,,,,FF,F,,,F:FF,,F,,,,FFFF,,,:F:F:FFF::FF:F:::FF,:FFFFFFFF,:,:,:F:,,,F,F,FFFF:FF,FFF,F,FFF:FF:FFFF:
@A00618:19:HHCTMDMXX:2:1445:7048:35712/2_AAAAAAAAAAAAAGGA
AATACAATTTAAATTAACTATAACAATTCCAATTCCTAATATATAGAAAACTTCATCAATTAAAAAACTATAAAACAAAAAATTCAAAAAAAAATTATAAACAACAACAAAATTTTCTATCGATCATATCCTTTTAATAAATTTAAAACA
+
,,F:F,,:,FF::,,::,,:FF:F,:F,F,FF:,::,FF,F,FF,,,,:F,::::,,::,:,,:,,FFF::::FF,F:F:::F,:,F,:,F,FFF,:FF:,,,::FF,F,FF,FFFF,,::F,F,F:,F,:FF,,,,F,,,,:,F,,F,,
@A00618:19:HHCTMDMXX:2:2459:29776:36526/1_AAAAAAAAAAAAAGGA
GAAAAGAAATAAGTTGGGTTTGATTATTTTATTTTTTGATTTTTGTTTATTATATGGTTATGGTTAAATTATTTTTTTTAATTTTTATTTTTTTATTTGTAAAAGAAAATATTTTTTGATATTATGT
+
FFFFFFFF:FF:FF:FFF:F:FF:FFFFFFFF,FF,F,FF::FFFF,FFFFFFF:FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFF:FFFFFF:FF:F:FF:,::F,FF
@A00618:19:HHCTMDMXX:2:2459:29776:36526/2_AAAAAAAAAAAAAGGA
CCTAAAAAAATAACTAAAAAATCTAAAAATAATATATATTTTCTATTATAAAATTCTACGTAAAATAACAACTACTAATATAACCTCTTTATAATTTTCAATACACTTCAACAAACATTACCCCATTCAATCCTCACAACAACCCTACAT
+
:FFFFFFFFFFFFFFFFFFFFF,F,FF,FFFFF:F:FFFFFFFF,FFFFF:F:FFFF:FFF:FF:F:FFFFF,FFF:FFFFF:FFFFFF,F:F,::FFFFFFF,FF,F:,FFFFFFFFFFFFFFFFFFF:FF,FFFFF,F:FFFFF,FF:
@A00618:19:HHCTMDMXX:2:2266:17074:12493/1_AAAAAAAAAAAAATTC
ATTGTTATTATAATAGTATAAAATAATAATATTAATAGTCAAATATTTTATAATAAAATAAACAATGAATTATTAGTGAAAAAAAAAATATAATTGTTATTAAATTAAAATAAAAAATATATTAAAA
+
FFFF:,:FFFF:,FFF,:F,F,F:F,,:FF,FFF:F,FF,F,FFFFFFFF,,F,FF,,,,FF,:,FFF:FF,F,:,FF,F,FFF,F,FFFFFF,F,FF:,FFF,FF:FF::,FFFF,:F:::,FFF,
@A00618:19:HHCTMDMXX:2:2266:17074:12493/2_AAAAAAAAAAAAATTC
CTCTTTAAAAAATTTTCTATCTACAACCATAAAAATTCCAAAATTAATCAATATAACTACTATTTATATAAACAAATCTTTCTAACGCTAAATAATCACTTCACACACTAATCACACTACACACCACACTAATATCACATATTATAAAAC
+
F:F,::FF:F,FF:,F,,FF,FF,,FFF:,FF:FF,,F:F,FFFFFFF,,:,F,:F:,F:FFFFFFFFFF,,FFF,F,F:F:,FFF:,:,,,,,FFF:F:FF,FFF,F,F,,,,FF:,,:,FF,FFF:F:F,FFFF:::FFF,FF::,,:

Maybe the /1 and /2 suffixes are the problem.

But I also tried:

bwa mem -t$THREADS -C $ASSEMBLY.fasta $READ1 $READ2 | samtools sort -@$THREADS -tBX -o $ASSEMBLY.LinkedReads.sortbx.bam

and the mapping seems fine.

After that I execute:

tigmint-molecule $ASSEMBLY.LinkedReads.sortbx.bam | sort -k1,1 -k2,2n -k3,3n

but everything is silent: there is no output at all. What am I doing wrong?

lcoombe commented 4 years ago

Hi @francicco,

Sorry for the delay in getting back to you -- I've been on vacation for the past couple of weeks.

First of all, tigmint expects the barcode to be in the BX:Z tag of your reads -- it looks like you have your reads formatted how older versions of ARCS required the barcode? If you use the reads output from longranger basic, then the barcode will be in the tag where it is expected.
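One way to confirm whether an alignment file actually carries barcodes is to count the records that have a BX:Z optional field. The sketch below is only an illustration, not part of tigmint: the function name `count_bx_tagged` is hypothetical, and it assumes the input is plain SAM text (for example, lines produced by `samtools view`).

```python
def count_bx_tagged(sam_lines):
    """Return (tagged, total) for SAM alignment records.

    A record counts as tagged when one of its optional fields
    (column 12 onwards) starts with "BX:Z:".
    """
    tagged = 0
    total = 0
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with "@".
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        total += 1
        if any(field.startswith("BX:Z:") for field in fields[11:]):
            tagged += 1
    return tagged, total


# Two made-up records: one with a BX:Z tag, one without.
example = [
    "@HD\tVN:1.6\n",
    "read1\t0\tctg1\t1\t60\t4M\t*\t0\t0\tACGT\tFFFF\tBX:Z:AAAAAAAAAAAAAGGA\n",
    "read2\t0\tctg1\t9\t60\t4M\t*\t0\t0\tACGT\tFFFF\n",
]
tagged, total = count_bx_tagged(example)
print(f"{tagged} of {total} records carry a BX:Z tag")
```

If the tagged count is zero on a real file, the barcodes never reached the BX:Z tag, which would be consistent with tigmint-molecule producing no output.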

In addition, as you suspected, I think the naming of your reads is preventing the aligner from recognizing that they are paired-end with -p. The /1 and /2 suffixes are fine when they are at the end of the read header, but the _<barcode> addition is interfering. I bet that when you specify the R1/R2 files separately, it doesn't have to do the smart pairing logic, thus it works better.
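The effect of the trailing barcode on pairing can be illustrated with a simplified model. This is an assumption about the general mechanism (strip a /1 or /2 only when it ends the name, then compare the two names), not bwa's actual code, and the function names `pairing_key` and `looks_paired` are hypothetical.

```python
import re


def pairing_key(read_name):
    """Drop a /1 or /2 suffix, but only when it is the very end of the name."""
    return re.sub(r"/[12]$", "", read_name)


def looks_paired(name1, name2):
    """Treat two adjacent reads as mates when their keys are identical."""
    return pairing_key(name1) == pairing_key(name2)


# Read names from this thread: the barcode follows the mate suffix,
# so the suffix is not stripped and the two names never match.
with_barcode = looks_paired(
    "A00618:19:HHCTMDMXX:2:1445:7048:35712/1_AAAAAAAAAAAAAGGA",
    "A00618:19:HHCTMDMXX:2:1445:7048:35712/2_AAAAAAAAAAAAAGGA",
)

# The same names with the suffix at the end of the name.
suffix_last = looks_paired(
    "A00618:19:HHCTMDMXX:2:1445:7048:35712/1",
    "A00618:19:HHCTMDMXX:2:1445:7048:35712/2",
)
print(with_barcode, suffix_last)
```

Under this model the first pair is rejected and the second is accepted, which matches the "722022 single-end sequences; 0 paired-end sequences" line in the log above.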

Basically, I think if you use the interleaved output file from longranger basic with the barcode in the BX:Z tag, that should solve your issues.
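If rerunning longranger basic is not convenient, the existing headers could in principle be rewritten so that the barcode becomes a BX:Z comment, which bwa mem -C then copies into the SAM record. The following is a minimal sketch under stated assumptions: headers look exactly like the ones posted in this thread (`@<name>/<1|2>_<barcode>`), records are four lines each, and the helper names `move_barcode_to_bx` and `rewrite_fastq` are hypothetical. Whether tigmint needs anything beyond the BX:Z tag itself is not stated in this thread.

```python
import re

# Matches headers such as "@NAME/1_AAAAAAAAAAAAAGGA" (format assumed from this thread).
HEADER_RE = re.compile(r"^@(?P<name>\S+?)(?P<mate>/[12])_(?P<barcode>[ACGTN]+)$")


def move_barcode_to_bx(header):
    """Rewrite '@name/1_BARCODE' as '@name/1 BX:Z:BARCODE'.

    Headers that do not match the assumed format are returned unchanged.
    """
    header = header.rstrip("\n")
    match = HEADER_RE.match(header)
    if match is None:
        return header
    return "@{name}{mate} BX:Z:{barcode}".format(**match.groupdict())


def rewrite_fastq(lines):
    """Yield FASTQ lines with only the header (every 4th line) rewritten.

    Quality strings may also start with '@', so lines are selected by
    position within the 4-line record rather than by their first character.
    """
    for index, line in enumerate(lines):
        line = line.rstrip("\n")
        yield move_barcode_to_bx(line) if index % 4 == 0 else line


record = [
    "@A00618:19:HHCTMDMXX:2:1445:7048:35712/1_AAAAAAAAAAAAAGGA\n",
    "TAAAGAAAATTGG\n",
    "+\n",
    "@FFF,,,F,FFF,\n",
]
for out_line in rewrite_fastq(record):
    print(out_line)
```

With the barcode moved into the comment, the /1 and /2 suffixes end the read name again, so the same file should also be usable as interleaved input with -p.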

Hope that helps! Lauren

francicco commented 4 years ago

Hi Lauren,

Thanks a lot, I solved this some time ago! :) I hope you enjoyed your vacation. Best, F

lcoombe commented 4 years ago

Glad to hear you got it sorted out! :)