Closed cappadoc closed 1 year ago
Hi! Do you have an example of what the output of samtools fasta -@ 2 -n NA12878.rmdup.bam
looks like?
It looks like the following:
>A00217:76:HFLT3DSXX:4:2608:14073:14810 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATATCGGAAGAGCTTCGTGTAGGGAAAGAGTGTTAATTAGTGTGTAGTTATCGGATTTCGAAATATAAATAAAAAAATTAAATAAAAGAGAAAAAGTAATTATA >A00296:43:HCLHLDSXX:3:1140:23457:10520 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTAAGGGTTAGGGTTAGGGTTATCGGAAGAGAACACGTAAGAACTCCAGTAAGTTGCCTGGATCGGGTAAGCGGTCTGGTGCGGGTGAGGGGTGGGGGT >A00217:77:HFJWFDSXX:3:1215:18774:33411 GGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATAGGGATATCGGAAGAGAAAACTAAAGAAATCCAGTAACTTTACTAGATATCGTAATGCGTCTTGTGCTTGTAATGGGTGGGGGGGGGGGGGG >A00217:77:HFJWFDSXX:3:1358:27046:24471 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGATATCGGAAGAGAAAACGACAGAAATCCAGTCAAATTACTAGATCGCGTAAGACGACTTCTGCTAGAAAAGGGTGGGGGGGGGGGGGGGGGG >A00217:77:HFJWFDSXX:4:2669:5828:29371 AGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTAGGGTTATCGGAAGAGAAATCTTAAGAGAAACTGTCAGGGTACTAGATAGCGTAAGGCTTCTTGTGAGTGAT >A00217:77:HFJWFDSXX:3:2201:16233:34992 GGGTTAGGGTTAGGGGTAGGGTTAGGGTTAGGGTTATCGGAAGTGATCACGTCAGAAGTACAGTAACTTTGCTTGTTCGGGTTAGGGGTCTTGTAGGGTAAGGGTAGGGGGTGGGGTCGGGGTAGGGTGGGGGGTGGGGGTGGGGGTGGG . . The bam has 675776582 reads in total. I got a successful result when using the sample bam file (HG00514.filtered.reads.bam) contained in the danbing-tk package.
The reads look fine to me at first glance. Just checking -- is this sample paired-end sequencing? The read names of a read pair have to be the same for danbing-tk to map paired-end reads.
Yes, this data is paired-end reads with the same name for each pair, and the bam is sorted by coordinate. For example, reads of a pair are output as follows:
>A00296:43:HCLHLDSXX:3:2254:22779:3521 CTTGCTTACTGTATAGTGGTGGCACGCCGCCTGCTGGCAGCTAGGGACATTGCAGGGTCCTCTTGCTCAAGGTGTAGTGGCAGCACGCCCACCTGCTGGCAGCTGGGGACACTGCCGGGCCCTCTTGCTCCAACAGTACTGGCGGATTAT . . >A00296:43:HCLHLDSXX:3:2254:22779:3521 CTCCAGGGCTTCACCTGTTAGACTGGCAAAAATCCAAAAGTAAACACTTTGTGGAGAAACAGGCATTCCTAGACATTGCTGGTGGGATACAGAACAGTACAATTCTGATGGTAATCAGTTCACAAATTAAACATATTTATTTTTTACTTT
Hi, sorry for the huge delay on following up. Have you figured out the reason why the mapping is not working? Let me know if I can help with debugging. Feel free to drop a test bam file here. -Tony
I reinstalled the latest version and run the job. The NA12878.tr.kmers and NA12878.aln.gz files seemed to be normally created. Thank you for your help.
I conducted danbing-tk align with my short read-based bam file, which had been aligned against the GRCh37 reference using bwa mem. The command, in the working directory containing the presence of the RPGG pan files, completed without any errors, but the generated kmers output file (NA12878.tr.kmers) was empty and the file size of aln.gz file was only 1.3 MB. Is there any solution for this trouble?
My command and output on the console are the follows:
samtools fasta -@ 2 -n NA12878.rmdup.bam | danbing-tk -gc 80 -ae -kf 4 1 -cth 45 -o NA12878 -k 21 -qs pan -fai /dev/stdin -p 6 | gzip > NA12878.aln.gz
use baitDB: 0 extract fasta: 0 interleaved: 1 sim mode: 0 trim mode: 0 augmentation mode: 0 graph threading mode: 1 output alignment: 1 output successfully aligned reads only: 1 k: 21
of subsampled kmers in pre-filtering: 4
minimal # of matches in pre-filtering: 1 Cthreshold: 45 Rthreshold: 0.5 threading Cthreshold: 80 Running both step1 (kmer-based filtering) and step2 (threading) fastx: /dev/stdin query: pan.(tr/ntr).kmers
total number of loci in pan.tr.kmers: 80518 deserializing kmerDBi.umap deserializing kmerDBi.vv deserializing graph.umap reading kmers from pan.tr.kmers deserialized graph/index and read tr.kmers in 127 sec.
unique kmers in kmerDBi: 147568763
creating data for each process... threads created Buffered reading 300000 300000 0 Buffered reading 300000 600000 0 Buffered reading 300000 900000 0 Buffered reading 300000 1200000 0 Buffered reading 300000 1500000 0 Buffered reading 300000 1800000 0
Thank you for your help.