PacificBiosciences / kineticsTools

Tools for detecting DNA modifications from single molecule, real-time sequencing data

ipdSummary command stops working if --identify m5C_TET is included in command, and does not give any m5C info either #97

Open Marjan-Hosseini opened 8 months ago

Marjan-Hosseini commented 8 months ago

I have installed the latest SMRT Tools (ipdSummary version 3.0). My goal is to extract ipdRatio values and modified bases. In the output of ipdSummary --help I see this option:

--identify IDENTIFY   Specific modifications to identify (comma-separated list). Current options are m6A, m4C, m5C_TET. Using --control overrides this option. (default: m6A,m4C)

Accordingly, I am running this command:

ipdSummary sorted_chr21_000.bam --reference $REF_path/$REF_FILE_NAME.fasta --referenceWindow 20:1-200000 --identify m6A,m4C,m5C_TET -j 20 --methylFraction --outfile $save_path/$chrom/$name

My BAM file is sorted and relatively small (<200 MB), but this command never finishes and does not throw any errors; it simply stops making progress, at a different stage of the run each time. I tried different thread counts and smaller files, but the command still hangs. However, once I changed 'm5C_TET' to 'm5C', it ran without problems and very quickly on the same BAM file, but the output contained no information on 'm5C' modifications. I tried the new command on many other BAM files, with the same result.
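For anyone trying to reproduce the hang described above, it can help to tell "stalled" apart from "slow" by running the command under a hard time limit. The following is only a minimal sketch using Python's standard `subprocess` module; the helper name `run_with_timeout` and the time limit are hypothetical choices, not part of SMRT Tools.

```python
import subprocess


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) and report whether it finished in time.

    Returns ("finished", returncode) or ("timed_out", None).
    Hypothetical helper for illustration; not part of SMRT Tools.
    """
    try:
        proc = subprocess.run(
            cmd, timeout=timeout_s, capture_output=True, text=True
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return ("timed_out", None)
    return ("finished", proc.returncode)


# Example call (commented out; paths are placeholders, flags are the ones
# quoted in this thread):
# run_with_timeout(
#     ["ipdSummary", "sorted_chr21_000.bam",
#      "--reference", "ref.fasta",
#      "--identify", "m6A,m4C,m5C_TET"],
#     timeout_s=3600,
# )
```

If the same input completes with `--identify m6A,m4C` but times out once `m5C_TET` is added, that isolates the option as the trigger.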

My question is the following: How can I get at least 'm5C' modifications? Should I change my SMRT Tools version?
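To check which modification types actually appear in a result, one option is to tally the feature-type column of the output. This is a sketch under the assumption that the modification calls were written as a GFF file, where the third tab-separated column is the feature type; the function name `count_feature_types` and the sample lines are made up for illustration and are not real ipdSummary output.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count values in column 3 (feature type) of GFF-formatted lines.

    Hypothetical helper; assumes tab-separated GFF with '#' header lines.
    """
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        fields = line.split("\t")
        if len(fields) >= 3:
            counts[fields[2]] += 1
    return counts


# Made-up sample lines, for illustration only.
sample = [
    "##gff-version 3",
    "chr21\tkinModCall\tm6A\t100\t100\t35\t+\t.\tcoverage=40",
    "chr21\tkinModCall\tm4C\t250\t250\t28\t-\t.\tcoverage=38",
    "chr21\tkinModCall\tm6A\t900\t900\t41\t+\t.\tcoverage=45",
]
counts = count_feature_types(sample)
print(dict(counts))
```

With a real file, pass an open file handle instead of the sample list, for example `count_feature_types(open(path))`. A type that is absent from the tally was not reported in that file.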

natechols commented 8 months ago

I think that was an experimental option that was never fully working or validated, and it no longer works. We probably need to update the help text. Use the separate tool jasmine if you want to do m5C calling.