MartinFXP / DawnRank

driver gene identification algorithm DawnRank 2014

Expression data format #5

Open RhysGillman opened 2 years ago

RhysGillman commented 2 years ago

Hi, thank you for developing this great tool! I was wondering if you could provide details on the type of data used for the expression matrices. It appears to be log-transformed; does the algorithm expect TPM, raw counts, or some other quantification? Thanks again.

MartinFXP commented 2 years ago

Hi,

I have to emphasize that I am not the developer/author of this package, and I am not actively maintaining it either. I just sped up the package, mostly by allowing it to run tasks in parallel, plus some other minor code optimizations. The references to the actual authors are at the bottom of the README, under References.

I am not 100% certain about the input since it's been a while, but I think the tool leaves it up to the user how to normalize the data before getting the differential expression from DawnNormalize. However, the input must definitely be log-transformed, because the differential expression is computed by subtraction.
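To illustrate why the log matters: on the log scale, subtracting a normal reference from a tumor value gives a log fold change, whereas subtracting unlogged values gives an absolute difference that grows with expression level. Below is a minimal Python sketch, assuming a genes × samples layout and using the mean of the normal samples as the per-gene reference. The function name `log_differential_expression` and the pseudocount of 1 are hypothetical choices for illustration; this is not the actual DawnNormalize implementation.

```python
import numpy as np

def log_differential_expression(tumor, normal, pseudocount=1.0):
    """Hypothetical sketch (not DawnNormalize): per-gene log2 difference
    between each tumor sample and the mean of the normal samples.

    tumor, normal: genes x samples arrays of unlogged, normalized values.
    """
    tumor = np.asarray(tumor, dtype=float)
    normal = np.asarray(normal, dtype=float)
    # Log-transform first so that subtraction corresponds to a fold change;
    # the pseudocount avoids log2(0) for unexpressed genes.
    log_tumor = np.log2(tumor + pseudocount)
    log_normal = np.log2(normal + pseudocount)
    # Assumed reference level per gene: mean over the normal samples (columns).
    reference = log_normal.mean(axis=1)
    # Broadcast the reference across all tumor samples.
    return log_tumor - reference[:, np.newaxis]

# Gene 1: tumor 7 vs normal 1 -> log2(8) - log2(2) = 2
# Gene 2: tumor 0 vs normal 3 -> log2(1) - log2(4) = -2
print(log_differential_expression([[7.0], [0.0]], [[1.0, 1.0], [3.0, 3.0]]))
```

If your matrices are already logged, you would skip the `np.log2` step and subtract directly.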

In general, log-transformed TPM or even RPKM (FPKM) should be fine, although normalizing by gene length is usually not necessary for differential expression, because it cancels out. Raw counts are usually not recommended.
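The cancellation follows from comparing values of the same gene: dividing both values by the same length L gives log(x/L) - log(y/L) = log x - log y. A small Python check, using made-up values and a hypothetical helper `log2_fold_change`, is below. It only covers the per-gene length factor and ignores the per-sample rescaling that TPM additionally applies.

```python
import math

def log2_fold_change(a, b, length=1.0):
    """Hypothetical helper: log2 difference between two expression values of
    the same gene, optionally after dividing both by the gene length."""
    return math.log2(a / length) - math.log2(b / length)

# Made-up values for one gene in two samples, and an assumed length in kb.
without_length = log2_fold_change(40.0, 10.0)
with_length = log2_fold_change(40.0, 10.0, length=2.5)

# Both are 2.0 (up to floating-point rounding): the length factor drops out.
print(without_length, with_length)
```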

Best, Martin