Features shorter than meanFragmentLength dropped silently?

AAlhendi1707 / countToFPKM

Convert Counts to Fragments per Kilobase of Transcript per Million (FPKM)

GNU General Public License v3.0

61 stars 15 forks

Features shorter than meanFragmentLength dropped silently? #3

Closed: jergosh closed this issue 4 years ago

jergosh commented 4 years ago

It appears that all features shorter than the meanFragmentLength (in any column) are silently dropped. Since fpkm() doesn't take feature names, this can make it quite tricky to work out which FPKM values correspond to which feature. I also imagine it will not always be obvious why the output matrix has different dimensions from the input one.

AAlhendi1707 commented 4 years ago

Hi there,

For accurate FPKM quantification of RNA-Seq data, the read counts need to be normalised by the effective length of each feature. To compute the effective length, the meanFragmentLength is subtracted from the feature length. Features shorter than the meanFragmentLength are therefore dropped automatically.
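To make the calculation concrete, here is a rough sketch in Python (not the package's R code). The function name `fpkm_dropping_short` and the toy numbers are hypothetical, and two details are assumptions rather than anything confirmed in this thread: the effective length is taken as exactly feature length minus mean fragment length, and the library size is summed over the retained features only.

```python
def fpkm_dropping_short(counts, feature_lengths, mean_fragment_lengths):
    """Illustrative only: counts is a list of rows (features) x columns (samples)."""
    n_samples = len(mean_fragment_lengths)
    # Effective length per feature and sample: feature length minus mean fragment length.
    eff = [[fl - m for m in mean_fragment_lengths] for fl in feature_lengths]
    # Keep only features whose effective length is positive in every sample.
    keep = [i for i, row in enumerate(eff) if min(row) > 0]
    kept_counts = [counts[i] for i in keep]
    # Assumption: library size is computed from the retained features only.
    lib = [sum(row[j] for row in kept_counts) for j in range(n_samples)]
    # FPKM = count * 1e9 / (effective length * library size).
    return [
        [kept_counts[r][j] * 1e9 / (eff[i][j] * lib[j]) for j in range(n_samples)]
        for r, i in enumerate(keep)
    ]


# Toy input: three features, two samples; the second feature is only 150 bp long.
counts = [[100, 200], [10, 20], [900, 800]]
feature_lengths = [1200, 150, 2200]
mean_fragment_lengths = [200, 180]

result = fpkm_dropping_short(counts, feature_lengths, mean_fragment_lengths)
print(len(counts), "input rows ->", len(result), "output rows")
```

With this toy input the 150 bp feature has a negative effective length in both samples, so the output has one row fewer than the input and carries no indication of which row went missing.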

Also see the Lee et al. 2011 paper for more information on effective-length normalisation.

jergosh commented 4 years ago

I understand why FPKM estimates for these features necessarily have to be 0, but it’s not useful to drop them silently without making it clear what the remaining features are.

As it is now, for anyone using the package, it is necessary to write code to remove features whose length is < meanFragmentLength. It would be more user-friendly to either set FPKM values for these features to 0 or at least return FPKM values for remaining features in a named vector.
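Both suggestions can be sketched roughly as follows, again in Python rather than the package's R. The names `retained_features` and `fpkm_named` are hypothetical and not part of countToFPKM. As an assumption, features whose length equals a mean fragment length are also treated as too short, since their effective length would be 0.

```python
def retained_features(names, feature_lengths, mean_fragment_lengths):
    """Names of features longer than every sample's mean fragment length."""
    longest = max(mean_fragment_lengths)
    return [n for n, fl in zip(names, feature_lengths) if fl > longest]


def fpkm_named(names, counts, feature_lengths, mean_fragment_lengths):
    """Return {feature name: [FPKM per sample]}, with 0.0 for too-short features."""
    kept = set(retained_features(names, feature_lengths, mean_fragment_lengths))
    n_samples = len(mean_fragment_lengths)
    # Assumption: library size is summed over the retained features only.
    lib = [
        sum(row[j] for n, row in zip(names, counts) if n in kept)
        for j in range(n_samples)
    ]
    out = {}
    for n, row, fl in zip(names, counts, feature_lengths):
        if n not in kept:
            # Too short for a positive effective length: report 0 instead of dropping.
            out[n] = [0.0] * n_samples
        else:
            out[n] = [
                row[j] * 1e9 / ((fl - mean_fragment_lengths[j]) * lib[j])
                for j in range(n_samples)
            ]
    return out


names = ["geneA", "geneB", "geneC"]
counts = [[100, 200], [10, 20], [900, 800]]
feature_lengths = [1200, 150, 2200]
mean_fragment_lengths = [200, 180]

kept_names = retained_features(names, feature_lengths, mean_fragment_lengths)
named = fpkm_named(names, counts, feature_lengths, mean_fragment_lengths)
print(kept_names)
print(named["geneB"])
```

Either way the output keeps the feature names, so rows can be matched back to the input even when some features are too short.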