PacificBiosciences / pbbioconda

PacBio Secondary Analysis Tools on Bioconda. Contains list of PacBio packages available via conda.
BSD 3-Clause Clear License

[pbccs/pbtk] Simulating low coverage hifi read for pipeline validation #714

Closed lok27395 closed 1 month ago

lok27395 commented 1 month ago

Hi all,

I am currently simulating low-coverage HiFi reads to test my pipeline's performance on low-coverage samples.

The steps I used:
pbsim3 - simulate WGS reads at 5X coverage with 10 passes, producing raw .sam from .fasta
samtools - convert .sam to .bam
pbtk - merge multiple .bam files into one
pbccs - convert .bam into (ccs).bam
pbtk - extract HiFi reads from (ccs).bam

However, when I call variants using pbsv on both ccs.bam and hifi.bam, a warning about detected CCS input is shown for both .bam files. Is there anything I have done wrong? (The .ccs and .hifi files seem to be the same size, though.)

Code used:
pbsim --strategy wgs --method errhmm --errhmm ~/miniconda3/pkgs/pbsim3-3.0.4-h4ac6f70_0/data/ERRHMM-SEQUEL.model --depth 5 --genome ~/HG002_BCM/hg002v1.1.fasta --pass-num 10
ccs 5X10P.bam 5X10P.ccs.bam
extracthifi 5X10P.ccs.bam 5X10P.hifi.bam
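
One possible reading of the identical file sizes, offered as an assumption and not a confirmed diagnosis: if the ccs step already kept only reads at or above the HiFi accuracy cutoff, the later extracthifi step would have nothing left to remove. The sketch below illustrates that logic on made-up records. The read names, the rq values, the `extract_hifi` helper, and the 0.99 (Q20) cutoff are all illustrative assumptions, and the code does not read real BAM files or call any PacBio tool.

```python
# Minimal sketch: filtering reads by predicted accuracy (rq).
# Everything here is hypothetical and for illustration only.

HIFI_MIN_RQ = 0.99  # assumed HiFi cutoff, equivalent to Q20


def extract_hifi(reads, min_rq=HIFI_MIN_RQ):
    """Return only the (name, rq) records whose rq meets the cutoff."""
    return [(name, rq) for name, rq in reads if rq >= min_rq]


# Case 1: the input was already filtered at the same cutoff upstream.
filtered_input = [
    ("sim/1/ccs", 0.999),
    ("sim/2/ccs", 0.995),
    ("sim/3/ccs", 0.990),
]

# Case 2: the input still contains lower-accuracy reads.
unfiltered_input = filtered_input + [
    ("sim/4/ccs", 0.950),
    ("sim/5/ccs", 0.800),
]

# Filtering an already-filtered set changes nothing.
same_after_filter = extract_hifi(filtered_input) == filtered_input

# Filtering an unfiltered set drops the low-accuracy reads.
kept_from_unfiltered = len(extract_hifi(unfiltered_input))

print("already-filtered input unchanged:", same_after_filter)
print("kept from unfiltered input:", kept_from_unfiltered, "of", len(unfiltered_input))
```

If this reading applies, comparing read counts and per-read rq values between 5X10P.ccs.bam and 5X10P.hifi.bam would show whether the two files really hold the same reads.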

armintoepfer commented 1 month ago

I really do not recommend using any simulator nowadays. There are plenty of HiFi datasets available for testing: https://www.pacb.com/connect/datasets/