ISUgenomics / SequelTools

GNU General Public License v3.0

QC analysis no scrap mode failed #17

Open JiaweiShen1116 opened 2 years ago

JiaweiShen1116 commented 2 years ago

Hi,

I am currently using SequelTools to run some QC analysis on PacBio subread sequence data. However, it reports the following error message:

Beginning quality control function

Running in NO_SCRAPS mode
[E::bgzf_uncompress] Inflate operation failed: 3
[E::bgzf_read] Read block operation failed with error 1 after 0 of 4 bytes
[main_samview] truncated file.
SequelTools.sh: line 676: generateReadLenStats_noScraps.py: command not found
ERROR: Calculation of read length statistics failed!

I have seen other people report similar issues with the Python script not being found, and I have already added python to my $PATH variable. So I am wondering what could be causing this error message. Thank you very much.

Best, Jiawei Shen
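
The "command not found" line in the log is the shell reporting that it could not resolve `generateReadLenStats_noScraps.py` as a command, which is a separate question from whether the `python` interpreter is on `$PATH`. A minimal sketch for checking this is below. It assumes the script is expected to be reachable through `PATH` and marked executable; the function name `locate_script` and its `search_path` parameter are illustrative and not part of SequelTools.

```python
import shutil


def locate_script(name="generateReadLenStats_noScraps.py", search_path=None):
    """Return the full path the shell would run for `name`, or None.

    shutil.which only returns files that exist on the search path AND are
    executable, which mirrors how the shell resolves a bare command name.
    `search_path` defaults to the PATH environment variable.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    found = locate_script()
    if found is None:
        print("script is not on PATH, or is not executable")
    else:
        print("shell would run:", found)
```

If this prints the first message, the directory holding the SequelTools scripts is probably missing from `PATH`, or the scripts lack the executable bit.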

aseetharam commented 2 years ago

Hi @JiaweiShen1116! From the message, it looks like your bam file is truncated (or corrupted). Can you check the last line of your bam file to see whether it has all fields (and looks complete)? You can also run samtools quickcheck on it.

Thanks,
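
One quick way to test for truncation without decoding the whole file is to look for the end-of-file marker. The SAM/BAM specification defines a fixed 28-byte empty BGZF block that a complete BAM file ends with, and its presence is one of the things `samtools quickcheck` looks for. The sketch below only detects a missing tail. It will not catch corruption in the middle of the file, which the "Inflate operation failed" message may indicate. The function name `has_bgzf_eof` is illustrative.

```python
import os

# The 28-byte empty BGZF block that terminates a complete BAM file
# (given in the SAM/BAM specification as the end-of-file marker).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as fh:
        # Jump to the last 28 bytes and compare them with the marker.
        fh.seek(-len(BGZF_EOF), os.SEEK_END)
        return fh.read() == BGZF_EOF


if __name__ == "__main__":
    import sys

    for bam in sys.argv[1:]:
        status = "EOF marker present" if has_bgzf_eof(bam) else "EOF marker MISSING"
        print(f"{bam}: {status}")
```

A file that fails this check was most likely cut short during copy or download, so re-transferring it and comparing checksums would be the next step.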

JiaweiShen1116 commented 2 years ago

Hi, I ran samtools quickcheck on my PacBio subread bam files and it reported that none of them have targets in the header. I think this could be because they are still unaligned, or could there be other reasons? Is there anything I can do about it? Thank you very much.

Jiawei Shen
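
An unaligned BAM has no reference sequences in its header, so a "no targets" complaint from `samtools quickcheck` is expected for raw subreads and does not by itself indicate damage. To my knowledge, recent samtools releases accept a `-u` option on `quickcheck` that skips the targets requirement for unmapped input; this is worth confirming against the usage text of the installed version. The sketch below wraps that call. The helper names `quickcheck_command` and `run_quickcheck` are illustrative.

```python
import shutil
import subprocess


def quickcheck_command(bam_paths, unaligned=True):
    """Build the samtools quickcheck command line as a list of arguments."""
    cmd = ["samtools", "quickcheck", "-v"]  # -v: print names of failing files
    if unaligned:
        # Assumption: the installed samtools supports -u (unmapped input,
        # do not require targets in the header).
        cmd.append("-u")
    return cmd + list(bam_paths)


def run_quickcheck(bam_paths, unaligned=True):
    """Run quickcheck and return (all_ok, combined_output)."""
    if shutil.which("samtools") is None:
        raise RuntimeError("samtools was not found on PATH")
    result = subprocess.run(
        quickcheck_command(bam_paths, unaligned),
        capture_output=True,
        text=True,
    )
    # quickcheck exits with status 0 only when every file passes.
    return result.returncode == 0, result.stdout + result.stderr
```

If the files pass with `-u` but SequelTools still reports a truncated file, the damage may lie in a compressed block in the middle of the file rather than in the header or the end-of-file marker.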

aseetharam commented 2 years ago

@JiaweiShen1116: are you sure you are using the subreads bam file as produced by the Sequel? If you have converted fastq files to bam, SequelTools will not work, as it relies on specific fields to compute stats. It does not need aligned bam files for computing stats.
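
One way to tell an instrument-produced subreads BAM from a converted one is to inspect the header. The sketch below assumes the PacBio BAM convention in which each `@RG` line carries `PL:PACBIO` and a `DS` field of semicolon-separated `KEY=VALUE` pairs including `READTYPE=SUBREAD`. It is not a description of the checks SequelTools itself performs. It takes header text as produced by `samtools view -H file.bam`; the function name and the sample header values are hypothetical.

```python
def subread_read_groups(header_text):
    """Return the IDs of @RG lines that look like PacBio subread read groups."""
    ids = []
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        fields = dict(
            f.split(":", 1) for f in line.split("\t")[1:] if ":" in f
        )
        # In PacBio BAMs the DS value is itself a ;-separated KEY=VALUE list.
        description = dict(
            kv.split("=", 1) for kv in fields.get("DS", "").split(";") if "=" in kv
        )
        if fields.get("PL") == "PACBIO" and description.get("READTYPE") == "SUBREAD":
            ids.append(fields.get("ID"))
    return ids


if __name__ == "__main__":
    # Hypothetical header text, for illustration only.
    example = (
        "@HD\tVN:1.5\tSO:unknown\tpb:3.0.1\n"
        "@RG\tID:abc123\tPL:PACBIO\tDS:READTYPE=SUBREAD;BINDINGKIT=x\tPU:movie1\n"
    )
    print(subread_read_groups(example))
```

An empty result suggests the file did not come straight from the instrument, for example because it was rebuilt from fastq, in which case the fields SequelTools needs would be absent.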