Closed msnoon closed 1 year ago
Hi @msnoon:
It looks like you are using SequelTools on CCS reads. Unfortunately, it works only on subreads or CLR, not on CCS. It also works on the scraps files for either subreads or CLR.
Thanks,
We don't have CLR or scraps files; all we got is a CCS file. Do you know if there are any tools that can take CCS as input? Or how do I get CLR files?
@msnoon: IMHO, there is no need to QC the CCS reads. They are already processed, meaning that reads whose base quality was poor or did not meet certain standards were excluded when the CCS reads were generated.
If you want to calculate some stats regarding the length distribution and/or total bases etc., you could use seqkit stats once you convert your CCS reads to fasta/fastq format (using samtools fasta):
samtools fasta --threads 16 input_CCS.bam > output.fasta
seqkit stats *.fasta -a
Example output:
file               format  type  num_seqs    sum_len  min_len  avg_len  max_len   Q1   Q2   Q3  sum_gap  N50  Q20(%)  Q30(%)
hairpin.fa.gz      FASTA   RNA     28,645  2,949,871       39      103    2,354   76   91  111        0  101       0       0
mature.fa.gz       FASTA   RNA     35,828    781,222       15     21.8       34   21   22   22        0   22       0       0
Illimina1.8.fq.gz  FASTQ   DNA     10,000  1,500,000      150      150      150  150  150  150        0  150   96.16   89.71
reads_1.fq.gz      FASTQ   DNA      2,500    567,516      226      227      229  227  227  227        0  227   91.24   86.62
reads_2.fq.gz      FASTQ   DNA      2,500    560,002      223      224      225  224  224  224        0  224   91.06   87.66
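If seqkit is not available, the basic length columns can be approximated with a few lines of Python. This is only a rough sketch, not how seqkit computes its numbers: the function names (read_lengths, n50, length_stats) are made up for illustration, it assumes an uncompressed FASTA file, and N50 follows the usual definition (the read length at which the longest reads together cover at least half of all bases), which may differ from seqkit in edge cases.

```python
from statistics import mean


def read_lengths(fasta_lines):
    """Yield the sequence length of each record in an iterable of FASTA lines."""
    length = None  # None until the first '>' header is seen
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif line and length is not None:
            length += len(line)
    if length is not None:
        yield length


def n50(lengths):
    """Return the length L such that reads of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for n in sorted(lengths, reverse=True):
        running += n
        if running * 2 >= total:
            return n
    return 0


def length_stats(fasta_lines):
    """Summarise read lengths using names similar to the seqkit stats columns."""
    lengths = list(read_lengths(fasta_lines))
    if not lengths:
        return {"num_seqs": 0, "sum_len": 0, "min_len": 0,
                "avg_len": 0.0, "max_len": 0, "N50": 0}
    return {
        "num_seqs": len(lengths),
        "sum_len": sum(lengths),
        "min_len": min(lengths),
        "avg_len": mean(lengths),
        "max_len": max(lengths),
        "N50": n50(lengths),
    }
```

To try it on the file produced by the samtools command, open output.fasta and pass the file handle to length_stats, then print the returned dictionary.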
Hope this helps!
Thanks,
Thank you, Arun!!
Hi David, I am getting a similar error, and I am sure all required tools are on the path. Could you help resolve this?
SequelTools.sh -t Q -v -u subFiles.txt
Beginning quality control function
Running in NO_SCRAPS mode
Extracting data from .bam files
Data extraction was sucessful
Beginning calculation of read length statistics
Traceback (most recent call last):
  File "/hdd_scratch1/msn/tools/SequelTools/Scripts/generateReadLenStats_noScraps.py", line 94, in
    start = int(coord.split("")[0]); stop = int(coord.split("")[1])
ValueError: invalid literal for int() with base 10: 'ccs'
ERROR: Calculation of read length statistics failed!
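The ValueError is consistent with how PacBio names reads: subread names end in a start_stop coordinate pair (movie/zmw/start_stop), while CCS read names end in the literal string ccs (movie/zmw/ccs), so converting that last field to an integer fails. The sketch below illustrates the difference. It is not the SequelTools code: parse_read_name is a hypothetical helper, the "_" separator is an assumption (the separator is not visible in the traceback as pasted), and the movie name in the usage note is invented.

```python
def parse_read_name(name):
    """Split a PacBio read name into (movie, zmw, start, stop).

    start and stop are None for CCS reads, whose third field is 'ccs'
    instead of a 'start_stop' coordinate pair.
    """
    parts = name.split("/")
    if len(parts) < 3:
        raise ValueError(f"unexpected read name: {name!r}")
    movie, zmw, coord = parts[0], parts[1], parts[2]
    if coord == "ccs":
        # CCS reads carry no subread coordinates, so there is nothing to convert.
        return movie, int(zmw), None, None
    # Assumed separator: subread coordinates written as 'start_stop'.
    start, stop = coord.split("_")
    return movie, int(zmw), int(start), int(stop)


def is_ccs(name):
    """Return True when the read name looks like a CCS read."""
    return parse_read_name(name)[2] is None
```

For example, parse_read_name("m54006_170101_000000/4391137/0_11239") returns the coordinates 0 and 11239, while the same name ending in /ccs returns None for both, which is the case the read length script does not handle.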