ablab / IsoQuant

Transcript discovery and quantification with long RNA reads (Nanopores and PacBio)
https://ablab.github.io/IsoQuant/

isoquant log file #235

Closed KMbio43 closed 2 weeks ago

KMbio43 commented 3 weeks ago

Hello, thanks for developing this tool; it has been really useful and much easier to get up and running than the other tools we tried! My input file, pacbio ccs.fastq, has 2332878 reads. When I run isoquant.py, the output log file looks like this:

2024-09-05 15:14:05,795 - INFO - Multimappers resolved
2024-09-05 15:14:05,802 - INFO - Alignments collected, overall alignment statistics:
2024-09-05 15:14:05,802 - INFO - primary: 2179504
2024-09-05 15:14:05,802 - INFO - secondary: 1578870
2024-09-05 15:14:05,802 - INFO - supplementary: 38101
2024-09-05 15:14:05,802 - INFO - unaligned: 153374

and another section:

2024-09-05 15:16:43,753 - INFO - Transcript model statistics
2024-09-05 15:16:43,753 - INFO - known: 14694
2024-09-05 15:16:43,753 - INFO - novel_in_catalog: 3280
2024-09-05 15:16:43,753 - INFO - novel_not_in_catalog: 5238
2024-09-05 15:17:15,144 - INFO - Gene counts are stored in ./isoquantResult/OUT/OUT.gene_counts.tsv
2024-09-05 15:17:15,144 - INFO - Transcript counts are stored in ./isoquantResult/OUT/OUT.transcript_counts.tsv
2024-09-05 15:17:15,144 - INFO - Read assignments are stored in ./isoquantResult/OUT/OUT.read_assignments.tsv.gz
2024-09-05 15:17:15,144 - INFO - Read assignment statistics
2024-09-05 15:17:15,144 - INFO - ambiguous: 378303
2024-09-05 15:17:15,144 - INFO - inconsistent: 150158
2024-09-05 15:17:15,144 - INFO - inconsistent_ambiguous: 79555
2024-09-05 15:17:15,144 - INFO - inconsistent_non_intronic: 462985
2024-09-05 15:17:15,144 - INFO - intergenic: 5921
2024-09-05 15:17:15,144 - INFO - noninformative: 17741
2024-09-05 15:17:15,144 - INFO - unique: 1051702
2024-09-05 15:17:15,145 - INFO - unique_minor_difference: 159796
2024-09-05 15:17:15,404 - INFO - Processed experiment OUT
2024-09-05 15:17:15,405 - INFO - Processed 1 experiment
2024-09-05 15:17:15,405 - INFO - === IsoQuant pipeline finished ===

I would like to ask how to judge the quality of my data from the log file, for example the alignment rate, or how to calculate the fraction of reads mapped to the genome. Thank you for your help.

Kai

KMbio43 commented 3 weeks ago

Moreover, I looked at the result file and found that the sum of the gene read counts was 2418793, which is larger than the 2332878 reads in the ccs.fastq file I used as input. May I ask why? Thank you!

andrewprzh commented 2 weeks ago

Dear @Mikai10043

Thanks for the feedback!

Your data looks good: you have 2179504 primary alignments, which is quite a high percentage (about 93% of the 2332878 input reads), and the majority of reads are assigned as unique, unique_minor_difference or inconsistent_non_intronic (meaning only terminal exons differ, e.g. alternative polyA/TSS).

The sum looks odd, considering there are 2052786 usable assignments in total (excluding noninformative, intergenic, inconsistent and inconsistent_ambiguous). I'll double-check the algorithm and see whether I can reproduce it on my data.
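
For reference, the figures quoted in this reply can be reproduced from the log with a few lines of Python. This is only a sketch: the numbers are copied from the log in the first comment, and the variable names are made up for illustration, not taken from IsoQuant itself.

```python
# Numbers copied from the IsoQuant log posted in this thread.
input_reads = 2332878
alignments = {"primary": 2179504, "unaligned": 153374}
assignments = {
    "ambiguous": 378303,
    "inconsistent": 150158,
    "inconsistent_ambiguous": 79555,
    "inconsistent_non_intronic": 462985,
    "intergenic": 5921,
    "noninformative": 17741,
    "unique": 1051702,
    "unique_minor_difference": 159796,
}

# Fraction of input reads that have a primary alignment.
primary_rate = alignments["primary"] / input_reads

# Usable assignments: everything except the four excluded categories.
excluded = {"noninformative", "intergenic", "inconsistent", "inconsistent_ambiguous"}
usable = sum(n for name, n in assignments.items() if name not in excluded)

print(f"primary alignment rate: {primary_rate:.1%}")
print(f"usable assignments: {usable}")
```

With these inputs the script reports a primary alignment rate of 93.4% and 2052786 usable assignments. Note that primary plus unaligned (2179504 + 153374) adds up exactly to the 2332878 input reads.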

Best, Andrey

KMbio43 commented 2 weeks ago

Thank you very much for your reply. I apologize for my question about the sum of the gene read counts being greater than the number of input reads. I hadn't noticed that the last three lines of the gene counts file (OUT.gene_counts.tsv) are:

ambiguous 229438
no_feature 23662
__not_aligned 153374

I mistakenly included them when I added up the counts. I'm sorry for the trouble, and thank you very much for taking the time to help me out!
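
In case it helps anyone else, here is a rough Python sketch of how the gene rows could be summed separately from those summary rows. It assumes a two-column, tab-separated file (feature ID, count) with an optional header line starting with `#`; that layout, the function name `split_gene_counts`, and the example gene rows are assumptions for illustration, not something confirmed from the IsoQuant documentation.

```python
import csv
import io

# Summary row names as quoted in this thread. Leading underscores are stripped
# before comparison, so "__not_aligned" and "not_aligned" both match.
SUMMARY_ROWS = {"ambiguous", "no_feature", "not_aligned"}


def split_gene_counts(handle):
    """Return (sum over gene rows, dict of summary rows) for a two-column TSV."""
    gene_total = 0.0
    summary = {}
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank lines and header/comment lines
        try:
            count = float(row[1])
        except ValueError:
            continue  # skip a non-numeric header row
        key = row[0].lstrip("_")
        if key in SUMMARY_ROWS:
            summary[key] = count
        else:
            gene_total += count
    return gene_total, summary


# Hypothetical gene rows (made-up IDs and counts), followed by the three
# summary rows reported in this thread.
example = io.StringIO(
    "#feature_id\tcount\n"
    "geneA\t10\n"
    "geneB\t5\n"
    "ambiguous\t229438\n"
    "no_feature\t23662\n"
    "__not_aligned\t153374\n"
)
gene_total, summary = split_gene_counts(example)
print(gene_total, summary)

# With the numbers from this thread: the naive total minus the summary rows.
corrected_total = 2418793 - (229438 + 23662 + 153374)
print(corrected_total)
```

For a real run, the same function could be called on an open file handle, e.g. `open("OUT.gene_counts.tsv", newline="")`. With the numbers above, the corrected gene-level total is 2012319, which is below the 2332878 input reads.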

andrewprzh commented 2 weeks ago

I myself forgot about those :)