qsonehara opened 1 week ago
After some inspection, I noticed that I get different data in the linear-format output files every time I rerun IsoQuant. It seems that the second column (group_id) is shuffled.
I was wondering whether line 250 of long_read_counter.py,

    for i, g in enumerate(read_groups):

might need to be

    for i, g in enumerate(self.ordered_groups):

The current line seems to break the correspondence between read group strings and numeric IDs, and eventually the linear output formatting at line 393.
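To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not IsoQuant's code: `assign_numeric_ids`, `decode_ids`, and the group names are hypothetical, and it assumes that `read_groups` is an unordered collection such as a set, while `ordered_groups` is the list later used to turn numeric IDs back into group names.

```python
from typing import Dict, Iterable, List


def assign_numeric_ids(groups: Iterable[str]) -> Dict[str, int]:
    """Map each group string to its position in the given iteration order."""
    return {g: i for i, g in enumerate(groups)}


def decode_ids(group_numeric_ids: Dict[str, int], ordered_groups: List[str]) -> Dict[str, str]:
    """Map each group string to the name recovered by indexing ordered_groups with its id."""
    return {g: ordered_groups[i] for g, i in group_numeric_ids.items()}


ordered_groups = ["CELL_A", "CELL_B", "CELL_C", "CELL_D"]
read_groups = set(ordered_groups)  # hypothetical: same names, no guaranteed order

# Mismatched pattern: ids follow the set's iteration order, which for strings
# can change between interpreter runs because of hash randomization.
mismatched = decode_ids(assign_numeric_ids(read_groups), ordered_groups)

# Consistent pattern: ids follow the same list that is used to decode them.
consistent = decode_ids(assign_numeric_ids(ordered_groups), ordered_groups)

print(all(g == name for g, name in consistent.items()))  # True
# Number of groups written under the wrong name; can be 0 or more and may differ between runs.
print(sum(g != name for g, name in mismatched.items()))
```

Deriving both the IDs and the decoding from one ordered list is what keeps them in sync; under the same assumption, fixing the order of `read_groups` (for example by sorting) would also remove the run-to-run variation.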
Thanks!
Dear @qsonehara
Apologies for the delay; I've been away for a while.
Thanks a lot for the investigation. Yes, you are absolutely right: that causes an inconsistency between self.ordered_groups and self.group_numeric_ids.
Thankfully, this bug was introduced rather recently and only affects the linear format. Anyway, apologies for not noticing it, and huge respect for finding it. I will make a bug-fix release ASAP.
Best,
Andrey
Hi, thank you for developing this tool.
I would like to ask about the interpretation of the read assignments file. I'm running IsoQuant version 3.6.1 on single-cell data with the --no_model_construction option. The gene expression output file (OUT.gene_grouped_counts_linear.tsv) had several cells with low read counts. For example, picking a cell with the barcode CAGCAGCGTTGGGACA reveals only one line, with a count of 1.00:

    ENST00000549920.6 CAGCAGCGTTGGGACA 1.00
Meanwhile, when I looked into the read assignments file (OUT.read_assignments.tsv.gz), this cell had a total of 696 assignments, including 251 unique ones:

     54 ambiguous
     52 inconsistent
     94 inconsistent_ambiguous
    177 inconsistent_non_intronic
     13 noninformative
    251 unique
     55 unique_minor_difference
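A per-cell tally like the one above can be reproduced with a short script along these lines. This is a minimal Python sketch under stated assumptions, not an IsoQuant utility: the column layout (read ID in column 0, assignment type in column 5, `#`-prefixed header lines), the function name `tally_assignments`, and the `read_ids` filter (reads you have already attributed to the barcode by other means) are all hypothetical and should be checked against the actual file header. It counts both lines and distinct reads per assignment type, because a read listed on more than one line would otherwise be counted more than once.

```python
import gzip
import os
import tempfile
from collections import Counter
from typing import Dict, Optional, Set, Tuple


def tally_assignments(path: str, type_col: int = 5,
                      read_ids: Optional[Set[str]] = None) -> Tuple[Counter, Dict[str, int]]:
    """Count lines and distinct reads per assignment type in a gzipped TSV.

    Assumed layout (hypothetical; verify against your file's header):
    tab-separated, read id in column 0, assignment type in column `type_col`,
    and header/comment lines starting with '#'.
    """
    lines_per_type: Counter = Counter()
    reads_per_type: Dict[str, Set[str]] = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header/comment and blank lines
            fields = line.rstrip("\n").split("\t")
            read_id, a_type = fields[0], fields[type_col]
            if read_ids is not None and read_id not in read_ids:
                continue  # keep only reads attributed to the cell of interest
            lines_per_type[a_type] += 1
            reads_per_type.setdefault(a_type, set()).add(read_id)
    return lines_per_type, {t: len(r) for t, r in reads_per_type.items()}


# Toy demonstration on a made-up file (not real IsoQuant output).
toy_path = os.path.join(tempfile.mkdtemp(), "toy_read_assignments.tsv.gz")
toy_rows = [
    "#read_id\tchr\tstrand\tisoform_id\tgene_id\tassignment_type",
    "r1\tchr1\t+\tT1\tG1\tunique",
    "r2\tchr1\t+\tT1\tG1\tambiguous",
    "r2\tchr1\t+\tT2\tG1\tambiguous",
    "r3\tchr1\t+\tT3\tG2\tunique",
]
with gzip.open(toy_path, "wt") as out:
    out.write("\n".join(toy_rows) + "\n")

lines, reads = tally_assignments(toy_path)
print(lines)  # Counter({'unique': 2, 'ambiguous': 2})
print(reads)  # {'unique': 2, 'ambiguous': 1}
```

Comparing the two tallies shows whether a category's line count is inflated by reads that appear on several lines.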
Representative lines:
How do I interpret this result?
Thanks!