bcgsc / tigmint

⛓ Correct misassemblies using linked AND long reads
https://bcgsc.github.io/tigmint/
GNU General Public License v3.0

tigmint-molecule #29

Closed francicco closed 5 years ago

francicco commented 5 years ago

Hi,

I'm testing tigmint following your instructions. So far I have run the following commands:

bwa index $ASSEMBLY.fasta
bwa mem -t$THREADS -C $ASSEMBLY.fasta $R1 $R2 | samtools sort -@8 -tBX -o $ALIGNMENT.bam
tigmint-molecule $ALIGNMENT.sam | sort -k1,1 -k2,2n -k3,3n > $ALIGNMENT.bed

I also tried converting the BAM file to SAM and running tigmint-molecule again. Each time it gave me a syntax error:

File "/home/fc464/software/tigmint/bin/tigmint-molecule", line 46
sep="\t", file=file)
       ^
SyntaxError: invalid syntax

This is how my FASTQ files are formatted:

@D00352:461:CCYW4ANXX:2:1101:16223:19699_AAAAAAAAACCAGAAA
TAACTAAGAATTCGAAAGAAGATTCGAACTCGCGCCTCCTGAATACCGTCCGGGCGCTCTCACCACTAAGCCATGCGTTCTACTACAAGCTGCGTCGAAATT
+
BFB<<FF<F/F///<FF/FF/<<<//</</<B<<B/BF<F<F/<<FF////F///7<BF</<B7BF//<BFF<B//7B<B/7B/B/BB/B/BF/////B7F<
@D00352:461:CCYW4ANXX:3:1205:1151:93622_AAAAAAAAAGTACCAA
GAAAAACAAAAAAAAAACGAACTACTTTTATCAGATAACATTGTTCTTTGAGCACATTTACAAAATAGCGATTTCATTTCAAAAAAACTAATAATTCATTGT
+
BFBF/FFFFBFFFFFFF<//<B/FF/<F/F<////BFFFFB//</<FFFF<<BFBFFFBFFFFFFFFFB//B/7/FB/<//7BFBBBB7FB<FFFFB7B/77
@D00352:461:CCYW4ANXX:4:2307:1605:2891_AAAAAAAACAATTCCA
ATAGAGGATGAAAGTGGCAGTTCACGTGGCGAAGCCGCGAGCGGGTGGCTAGTAAACAATAAGCGAGTTATCTCACCAGGTAAAATGTTGAAACATGATTCA
+
FFFFFFFFFFFFFFFBFBF<<F/FFB/F/FFFFFFFBFFFFFFF<<BBFFFFFFFFFFFFFFFFFBFBFFFFFFBFFFFFFFFFFFBFFBFFFFFBB/FFFF

This is the same format in which ARCS takes the reads.

What am I doing wrong? Is there a problem with my pysam?

Thanks F

lcoombe commented 5 years ago

Hi @francicco,

For Tigmint, the barcodes are expected to be in the BX:Z: tag of the read headers, in the format that longranger basic produces. If you want to use the tigmint-make Makefile, the reads also need to be in a single, interleaved file (again, the default output from longranger basic). I recommend running the pipeline using the Makefile rather than running each command separately, which can be a bit more error-prone. Just FYI: ARCS can now also use the barcode information from the BX:Z: tags of the read alignments. Take a look at the arcs-make Makefile for more details, or feel free to ask follow-up questions in that repository.
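To illustrate the difference between the two header layouts, here is a minimal Python sketch that rewrites a read name carrying the barcode as a trailing _BARCODE suffix (as in the FASTQ above) into the BX:Z: form. The function name underscore_to_bx is hypothetical, and the sketch assumes the barcode is the last underscore-separated field and contains only A/C/G/T/N; neither is part of Tigmint.

```python
import re

# Assumption: the barcode is the final "_"-separated field of the read name
# and consists only of A/C/G/T/N characters.
_SUFFIX = re.compile(r"^(@\S+)_([ACGTN]+)$")


def underscore_to_bx(header):
    """Rewrite '@name_BARCODE' as '@name BX:Z:BARCODE'.

    Headers that do not match the assumed layout are returned unchanged
    (apart from stripping a trailing newline).
    """
    header = header.rstrip("\n")
    match = _SUFFIX.match(header)
    if match is None:
        return header
    name, barcode = match.groups()
    return "%s BX:Z:%s" % (name, barcode)
```

It should be applied only to the header line of each four-line FASTQ record, since quality lines can also start with @.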

As for the error that you're seeing -- what version of python are you using? What version of tigmint are you using?

Lauren

francicco commented 5 years ago

My version of Python is 2.7.5. The version of tigmint is the latest, I guess; I cloned it on Saturday. This is what I get when I convert the reads from barcoded_unaligned.bam. Is that correct?

@A00618:19:HHCTMDMXX:2:1351:5520:13401/1       RX:Z:AAAAAAAAAAAAAGGA
AGAGTGGGTAAGATTTATTTTTAAAAAGTATTTATATAGTTTTTGTGAGAAATTTTTTAGTAGTTTTTAGGTTTGGGATGAGATGAGTGAAGATGAGAAGAATAGGATAATATTTAGGTATATAAGA
+
FFFF,FFFF,FF:FFF:FFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFF:FFFF:FFFFFFFFFFFFFF,,F,FF:::FFFFFFF,FFFFF::F:FFFFFFFFF,F,F:FF,FFFFF:F

Thanks F

lcoombe commented 5 years ago

Hi @francicco,

Tigmint requires Python 3, so make sure a Python 3 version is on your PATH (with the required modules installed).

As for the reads, do you still have the output from longranger basic? If so, using that interleaved file would be the easiest for you. If not, the barcode needs to be in a BX:Z: tag of the read header (not an RX:Z: tag). For example:
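The SyntaxError reported above is consistent with Python 2 parsing a print(..., sep=..., file=...) call as a print statement. As a rough sketch (not something Tigmint ships), a check like the following, written so that it also parses under Python 2, would report the interpreter version up front; the helper name is_python3 is hypothetical.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or newer."""
    return version_info[0] >= 3


# Stop early with a readable message instead of a SyntaxError later on.
if not is_python3():
    sys.exit("Python 3 is required, found %d.%d" % tuple(sys.version_info[:2]))
```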

@E00247:267:HMVT3CCXX:1:1120:14945:59200 BX:Z:AAACACCAGACAATAC-1
CAAGGCATTCTGGGCTCAGGCATTCTTGTGGTAGGCATTCTACCACAAGGCATTCTGGGCTCAGGCATTCTTGTGGTAGGCATGCTGCCACAAGGCATTCAGGACTCAGGCATTCTTGTGGTAGGTAG
+
KKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK7A<<AFFKA,77<F<KF<F<7<,7<F7,,,AK<K7KK<A,AF,AFA,FF,A<,7,,,,,7A<A,,,,,,,A,,,<AFAFAFKAFAKA,,<7A,,
@E00247:267:HMVT3CCXX:1:1120:14945:59200 BX:Z:AAACACCAGACAATAC-1
ACCACAAGAATGCCTGAGCCCAGAATGCCTTGTGGTAGAATGTCTACCAGAAGATAGATTGGGAGAACGACGCGTTGGGGTAGAGTGTAGACCGCGGTGATCGCCGGATCATGAACGAATTGGGGTAGAGTGTAGACCGGGGTGGTCGGCG
+
AAAFFKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKKK<KA,7,7F<,,,,,,,,A,,,A,,,,,,,,,,A(,F,A,A,,AKKFKKK,,<,7,,(77,,A,AAAA((,,,,,,A,,,,,,,,AA,AFKFF7,,,,,,,((,,<(,,,(,<

Hope that helps! Lauren

francicco commented 5 years ago

From longranger basic I get a BAM file. Is there a quick way to convert it into a FASTQ?

Thanks D

francicco commented 5 years ago

What about the number at the end of the barcode (BX:Z:AAACACCAGACAATAC-1)? I don't have it. F

lcoombe commented 5 years ago

Oh I see -- So you used the --bam option with longranger basic then? We output a fastq file here.

Do you have abyss installed? You can use abyss-tofastq --bx to convert the BAM to a fastq file with the BX tags in the header if so. Otherwise I think that samtools fastq -TBX should work as well.

The number at the end of the barcode is the 'read group'. It can be used to distinguish different chromium libraries.
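As a small illustration of that layout, the sketch below splits a BX value into the barcode sequence and the optional trailing number. The function name split_barcode is hypothetical, and the sketch assumes the suffix, when present, follows the first hyphen.

```python
def split_barcode(bx_value):
    """Split 'BARCODE-N' into (barcode, suffix); suffix is None if absent."""
    barcode, sep, suffix = bx_value.partition("-")
    return barcode, (suffix if sep else None)


# Barcode with the trailing number, as in the example header above.
with_suffix = split_barcode("AAACACCAGACAATAC-1")
# Barcode without it, as in a single-library data set.
without_suffix = split_barcode("AAAAAAAAAAAAAGGA")
```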

francicco commented 5 years ago

I'm using samtools, but for some reason my BAM has the other tag (RX). I'll pipe the output through a sed substitution. I have a single library, so I guess I can skip the number, right?
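For reference, the same substitution can be sketched in Python. This is only an illustration of the renaming described here, with a hypothetical function name rx_to_bx; it assumes the tag appears in the header as whitespace followed by RX:Z: and the barcode. Renaming the tag does not correct the barcode, it only relabels the raw sequence.

```python
import re

# Assumption: the tag is separated from the read name by whitespace.
_RX_TAG = re.compile(r"\s+RX:Z:(\S+)")


def rx_to_bx(header):
    """Relabel the first RX:Z: tag in a FASTQ header as BX:Z:."""
    return _RX_TAG.sub(r" BX:Z:\1", header.rstrip("\n"), count=1)
```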

Very helpful, thanks F

lcoombe commented 5 years ago

Hi @francicco,

Sounds good! Interesting that you don't have both the RX and BX tags. One is the raw barcode, while the other is the processed (corrected) barcode, and the few Chromium BAMs I've looked through do have both of those. This page lists all the tags you should be seeing: https://support.10xgenomics.com/genome-exome/software/pipelines/latest/output/bam. It is recommended that you use the BX tag (which has error correction and was checked against the barcode whitelist) for your analysis.

As for the number, yes you are OK without the read group number if you only have one library.

Glad I can help! Lauren

francicco commented 5 years ago

Hi Lauren,

Using tigmint-make, I get this error at the end of the alignment:

[gzclose] buffer error
[bam_sort_core] merging from 32 files and 32 in-memory blocks...
make: *** [draft.reads.sortbx.bam] Error 1
make: *** Deleting file `draft.reads.sortbx.bam'

Now I'm trying to run each command separately.

F

francicco commented 5 years ago

Ok, works!!!

lcoombe commented 5 years ago

Glad you got it working!