output explanation - Githubissues

Akanksha2511 commented 2 years ago

Hi, thanks for developing LIQA.

I was wondering if there is documentation that explains the output column names like ReadPerGene_corrected and infor_ratio.

Can I use relativeAbundance values directly as TPMs?

Thanks, Akanksha

huyustats commented 2 years ago

@Akanksha2511 Hi, thanks for your interest in using LIQA. ReadPerGene_corrected is the estimated number of reads assigned to the isoform. TPM can be derived from ReadPerGene_corrected. We will update the documentation to explain the outputs. Thanks!

Akanksha2511 commented 2 years ago

Hi, thanks. Could you please elaborate on how TPM can be derived from the ReadPerGene_corrected values? Sorry, I am new to this.

nhartwic commented 2 years ago

Just bumping this. It would be really good to have some description in the docs on what the values in the quantification output are.

EDIT: for those finding this issue in the meantime, my current best guess at the meaning of the columns is

GeneName : the gene id for a gene
IsoformName : the transcript ID for a transcript
ReadPerGene_corrected : absolute count estimate associated with this isoform of this gene
relativeAbundance : the portion of reads associated with this isoform divided by the portion of reads associated with this gene
infor_ratio : the portion of reads that are associated with this gene that provide information on relative transcript abundance

I don't know if this is accurate though.
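If that reading of the columns is right, relativeAbundance should be reproducible from ReadPerGene_corrected by dividing each isoform's value by the total for its gene. Here is a minimal sketch of that check. The tuple layout and the function name `relative_abundance` are my own for illustration, not part of LIQA, and the numbers are made up.

```python
from collections import defaultdict

def relative_abundance(rows):
    """rows: list of (gene, isoform, read_per_gene_corrected) tuples.

    Returns {(gene, isoform): share of the gene's total estimated reads}.
    """
    # First pass: total estimated reads per gene.
    gene_totals = defaultdict(float)
    for gene, _isoform, rpg in rows:
        gene_totals[gene] += rpg

    # Second pass: each isoform's share of its gene's total.
    shares = {}
    for gene, isoform, rpg in rows:
        total = gene_totals[gene]
        shares[(gene, isoform)] = rpg / total if total > 0 else 0.0
    return shares

# Hypothetical values, not real LIQA output.
rows = [
    ("geneA", "txA1", 30.0),
    ("geneA", "txA2", 10.0),
    ("geneB", "txB1", 5.0),
]
ra = relative_abundance(rows)
```

With these made-up rows, the two isoforms of geneA get shares of 0.75 and 0.25, and the single isoform of geneB gets 1.0. Comparing such shares against the relativeAbundance column of a real output file would be one way to test the guess.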

doshirLV commented 1 year ago

Another bump for this WGLab developers,

A section in the documentation explaining the quantification output, in detail, would help tremendously:

* **ReadPerGene_corrected** represents how many reads were assigned to an isoform while accounting for read bias. And that these are the read number which can be used to calculate TPM. But how does it differ from actual raw read count? Typically read counts should not have values less than 1 unless it is 0.

* _If I wanted to show how many transcripts were detected, would I list all those transcripts that have a value greater than 0? What about those that have a very small value greater than 0 but very far away from 1 such as "0.000321" or "0.000636"? How would I be able to tell if an isoform is actually detected? Is it better to use Read per gene or relative abundance for this? And is there a suggested cutoff value to use?_

* _Can you provide an example of how to calculate TPM from this value? Especially since the paper mentions you use RPG 10K for quantification of isoform level which may change the math compared to conventional TPM calculation._

* **Relative abundance** is the proportion of reads pertaining to a specific isoform compared to all the reads of the transcripts from the same gene. Please correct me if I am wrong.

* **Infor_ratio** is unclear and I have no idea how to understand it. I know that the value is the same for every gene so it must refer to something that is global for all transcripts from a particular gene.

Thank you for the clarification, Raj

huyustats commented 1 year ago

> Another bump for this WGLab developers,
>
> A section in the documentation explaining the quantification output, in detail, would help tremendously:
>
> * **ReadPerGene_corrected** represents how many reads were assigned to an isoform while accounting for read bias. And that these are the read number which can be used to calculate TPM. But how does it differ from actual raw read count? Typically read counts should not have values less than 1 unless it is 0.
>
> * _If I wanted to show how many transcripts were detected, would I list all those transcripts that have a value greater than 0? What about those that have a very small value greater than 0 but very far away from 1 such as "0.000321" or "0.000636"? How would I be able to tell if an isoform is actually detected? Is it better to use Read per gene or relative abundance for this? And is there a suggested cutoff value to use?_
>
> * _Can you provide an example of how to calculate TPM from this value? Especially since the paper mentions you use RPG 10K for quantification of isoform level which may change the math compared to conventional TPM calculation._
>
> * **Relative abundance** is the proportion of reads pertaining to a specific isoform compared to all the reads of the transcripts from the same gene. Please correct me if I am wrong.
>
> * **Infor_ratio** is unclear and I have no idea how to understand it. I know that the value is the same for every gene so it must refer to something that is global for all transcripts from a particular gene.
>
> Thank you for the clarification, Raj

Hi Raj,

* **ReadPerGene_corrected**: RPG_corrected ranges from 0 to infinity because it is estimated with the EM algorithm rather than taken from raw counts. This is what "corrected" stands for.

* **Detection**: We are currently testing the sensitivity of the novel isoform detection part, which is not included in the paper. Yes, please use RPG to compare expression levels across genes.

* **TPM**: Please refer to https://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/. We replace the number of reads with RPG.

* **Relative abundance**: Yes, that is correct.

* **Infor_ratio**: The ratio of uniquely isoform-mapping reads (reads that were certainly generated from one specific isoform and are incompatible with the other isoforms of the gene) to all reads from the gene.

Thanks
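For readers who want the TPM step spelled out, here is a minimal sketch of the conventional TPM recipe from the linked post, with ReadPerGene_corrected used in place of the raw read count. This is an illustration under assumptions, not LIQA code. The function name `tpm_from_rpg` is made up, the example values are invented, and transcript lengths are assumed to come from your annotation, since the output columns listed in this thread do not include a length. The thread also does not say whether "RPG 10K" changes this calculation.

```python
def tpm_from_rpg(isoforms):
    """isoforms: list of (isoform_id, rpg, length_bp) tuples.

    rpg is the ReadPerGene_corrected value; length_bp is the transcript
    length in bases (assumed to be taken from the annotation).
    Returns {isoform_id: TPM}.
    """
    # Step 1: reads per kilobase, with RPG standing in for the read count.
    rpk = {iso: rpg / (length_bp / 1000.0) for iso, rpg, length_bp in isoforms}

    # Step 2: "per million" scaling factor over all isoforms in the sample.
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {iso: 0.0 for iso in rpk}

    # Step 3: divide each RPK value by the scaling factor.
    return {iso: value / scale for iso, value in rpk.items()}

# Hypothetical values, not real LIQA output.
example = [("tx1", 20.0, 2000), ("tx2", 10.0, 1000), ("tx3", 0.5, 500)]
tpm = tpm_from_rpg(example)
```

In this made-up example, tx1 and tx2 end up with the same TPM because their RPG-per-kilobase values are equal, and the TPM values sum to one million across the sample.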

WGLab / LIQA

output explanation #11
