BAM input for PacBio data

ZekunYin / RabbitQC

The new version is available at https://github.com/RabbitBio/RabbitQCPlus

GNU General Public License v3.0


BAM input for PacBio data #4

Open splaisan opened 3 years ago

splaisan commented 3 years ago

Hi, PacBio data is saved in BAM format, not FastQ. Is BAM supported, and if not, could it please be considered? Converting hundreds of GB of BAM files to FastQ takes time and space, and running directly on the PB-BAM would be really nice. Please note that PacBio has both raw-BAM without arbitrary quality scores and CCS-BAM (HiFi, with quality scores). Thanks in advance.

ZekunYin commented 3 years ago

We are working on it now! At the very least, we are going to support raw-BAM files. I am still discussing with our team whether we can support CCS-BAM. Currently, I'm not sure whether we should support CCS-BAM in RabbitQC or write a new tool for it. Either way, we will release a new version soon. Thanks for your advice. Best, Zekun

ZekunYin commented 3 years ago

Hi, we are now working on CCS (HiFi) reads saved in BAM format. However, I'm not sure whether you are interested in quality control for the raw Sequel II data (see this paper) or for the polished data (PacBio provides a tool called ccs to generate the HiFi reads). I hope you can provide more details about your needs. Best, Zekun

splaisan commented 3 years ago

Hi, we do not yet have the Sequel 2, so I cannot comment on that one. For Sequel 1, I agree that polymerase reads and subreads are not that interesting to QC since their qualities are not real. Still, the size distribution of the raw polymerase reads and of the subreads, as well as the distribution of subread counts per polymerase read, would be really nice to have if this is something you can implement. Thanks for the tool.
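For illustration only, the statistics requested above could be sketched in Python as follows. This is not RabbitQC code. It assumes subread names follow the PacBio convention `movie/holeNumber/qStart_qEnd`, groups subreads by ZMW (movie plus hole number), and approximates the polymerase read length as the span from the smallest qStart to the largest qEnd, which ignores any bases outside the reported subreads. The function names and the `bin_size` parameter are hypothetical, and the optional BAM reader assumes the third-party pysam package is installed.

```python
from collections import Counter


def parse_subread_name(name):
    """Split 'movie/hole/qStart_qEnd' into (zmw_id, q_start, q_end)."""
    movie, hole, span = name.split("/")
    q_start, q_end = (int(x) for x in span.split("_"))
    return f"{movie}/{hole}", q_start, q_end


def summarize_subreads(names):
    """Collect subread lengths, subread counts per ZMW, and approximate spans."""
    subread_lengths = []
    counts = Counter()   # zmw_id -> number of subreads
    spans = {}           # zmw_id -> (min q_start, max q_end)
    for name in names:
        zmw, q_start, q_end = parse_subread_name(name)
        subread_lengths.append(q_end - q_start)
        counts[zmw] += 1
        lo, hi = spans.get(zmw, (q_start, q_end))
        spans[zmw] = (min(lo, q_start), max(hi, q_end))
    # Approximation (assumption): polymerase read length ~ covered span.
    polymerase_lengths = {zmw: hi - lo for zmw, (lo, hi) in spans.items()}
    return {
        "subread_lengths": subread_lengths,
        "subreads_per_zmw": dict(counts),
        "polymerase_lengths": polymerase_lengths,
    }


def length_histogram(lengths, bin_size=1000):
    """Count lengths per bin; each key is the lower edge of its bin."""
    return Counter((length // bin_size) * bin_size for length in lengths)


def subread_names_from_bam(path):
    """Yield read names from an unaligned BAM (requires pysam, an assumption)."""
    import pysam  # imported lazily so the rest works without it

    # check_sq=False lets pysam open BAM files that have no reference sequences.
    with pysam.AlignmentFile(path, "rb", check_sq=False) as bam:
        for read in bam.fetch(until_eof=True):
            yield read.query_name
```

With a real file, one might call `summarize_subreads(subread_names_from_bam("subreads.bam"))` (the file name is a placeholder) and pass the resulting length lists to `length_histogram` for plotting. Streaming names this way avoids the intermediate FastQ conversion, although memory use still grows with the number of ZMWs.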

ZekunYin commented 3 years ago

Got it. I think plotting the length distribution is not that hard. I will see what we can do. Besides, we plan to implement a new tool for the preprocessing or quality control of BAM-based files (both aligned and unaligned). In any case, we will try to implement the length distribution first. I will let you know when the beta version is ready.