I noticed a lot of variability in the FPKM estimates for some relatively short isoforms when I perturbed the input BAM slightly. (Note: the way I perturbed the BAM somewhat affects the empirical read/fragment length distributions.) I traced the variability to the transcript's `effective_length` and eventually to here.
For the affected gene(s), `effective_length` was calculated as less than 1, which was puzzling, and it varied a lot (in relative terms, not absolute) when the input was perturbed. I think this pull request is the fix: the effective length calculation uses the `pdf` member of the `EmpDist` class, but it should use `npdf` because it is considering a truncated version of the empirical distribution.
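
To illustrate why I think normalization matters, here is a toy sketch in Python (not the actual code in this repository). It assumes the effective length is the expected number of valid start positions, summed over fragment lengths that fit in the transcript. The function name, the `normalize` flag, and the toy distribution are all made up for illustration. Without renormalizing over the truncated range, the probability mass that fits in a short transcript can be tiny, so the result drops below 1 and tracks small changes in the distribution's tail.

```python
def effective_length(pdf, transcript_len, normalize):
    """Expected number of start positions for a fragment in a transcript.

    pdf[l] is the probability of fragment length l (index 0 is unused).
    Hypothetical helper for illustration only; it is not the project's API.
    """
    # Only fragments no longer than the transcript can originate from it.
    max_len = min(transcript_len, len(pdf) - 1)
    mass = sum(pdf[1:max_len + 1])
    if mass == 0.0:
        return 0.0
    total = 0.0
    for frag_len in range(1, max_len + 1):
        p = pdf[frag_len]
        if normalize:
            # Renormalize over the truncated range (the "npdf"-style choice).
            p /= mass
        total += p * (transcript_len - frag_len + 1)
    return total


# Toy empirical distribution: almost all fragments are 200 bp, a few are 100 bp.
pdf = [0.0] * 301
pdf[100] = 0.001
pdf[200] = 0.999

# A slightly perturbed input: the rare short fragments become twice as common.
pdf_perturbed = list(pdf)
pdf_perturbed[100] = 0.002
pdf_perturbed[200] = 0.998

L = 150  # a short transcript; only the 100 bp fragments fit

raw = effective_length(pdf, L, normalize=False)             # about 0.051
norm = effective_length(pdf, L, normalize=True)             # about 51
raw_perturbed = effective_length(pdf_perturbed, L, normalize=False)
norm_perturbed = effective_length(pdf_perturbed, L, normalize=True)

print(raw, raw_perturbed)    # unnormalized: below 1, doubles under perturbation
print(norm, norm_perturbed)  # normalized: unchanged by the perturbation
```

In this toy case the unnormalized value is below 1 and doubles when the tail mass doubles, while the normalized value stays the same. That matches the symptom I saw: small absolute changes but large relative ones.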