Gene end coordinates don't always agree with GTF

Nick-Eagles commented 2 years ago

We currently grab gene end coordinates from featureCounts output, so some rows of rse_gene can disagree with the reference GTF (only in the end coordinate). Instead, all coordinates should be pulled from the GTF.
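
To make the intended check concrete, here is a rough sketch of the logic in plain Python (the actual fix would live in the pipeline's R code). It reads gene coordinates from the `gene` rows of a GTF and lists genes whose end coordinate differs from a second source. The toy GTF, the `other_ends` dict standing in for coordinates taken from featureCounts output, and both function names are made up for illustration.

```python
import re

def gene_coords_from_gtf(gtf_text):
    """Return {gene_id: (seqname, start, end, strand)} from 'gene' rows of a GTF."""
    coords = {}
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "gene":
            continue  # only gene-level records carry the gene interval
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        coords[match.group(1)] = (fields[0], int(fields[3]), int(fields[4]), fields[6])
    return coords

def end_mismatches(gtf_coords, other_ends):
    """Gene IDs whose end coordinate in other_ends differs from the GTF."""
    return sorted(
        gene_id
        for gene_id, end in other_ends.items()
        if gene_id in gtf_coords and gtf_coords[gene_id][2] != end
    )

# Toy GTF (hypothetical data): two genes on chr1.
toy_gtf = "\n".join([
    "\t".join(["chr1", "toy", "gene", "100", "900", ".", "+", ".", 'gene_id "G1";']),
    "\t".join(["chr1", "toy", "gene", "1000", "1600", ".", "-", ".", 'gene_id "G2";']),
])

gtf_coords = gene_coords_from_gtf(toy_gtf)

# Hypothetical end coordinates from another source; G2 is deliberately off.
other_ends = {"G1": 900, "G2": 1500}
mismatched = end_mismatches(gtf_coords, other_ends)
print(mismatched)
```

Run against the real rse_gene and GTF, a check along these lines would show how many rows are affected.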

gpertea commented 2 years ago

Gene length is also taken from featureCounts' output, but it doesn't seem to have been affected by the featureCounts upgrade from v1.5 to v2.0. That's good news, because for most downstream analyses length is more relevant than the end coordinate.

I guess when the annotation files are built, an .rda file with the GRanges for genes and exons could be prepared and saved, so it could be used later by create_count_objects.R. Length could be added to mcols(), since for genes it's not simply the width of the GRanges interval but the sum of the non-overlapping exon regions across all transcripts of that gene. A recipe for that can be found here: https://www.biostars.org/p/83901/
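
As a rough illustration of what that length means (plain Python showing the logic only, not the Bioconductor recipe from the link), the sketch below merges the overlapping exon intervals of each gene and sums the merged widths. It assumes 1-based inclusive GTF coordinates and that each gene_id sits on a single chromosome and strand. The toy exons and the function name are hypothetical.

```python
import re
from collections import defaultdict

def union_exon_lengths(gtf_text):
    """Return {gene_id: total bases covered by at least one exon of the gene}."""
    exons = defaultdict(list)
    for line in gtf_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'gene_id "([^"]+)"', fields[8])
        if match is None:
            continue
        exons[match.group(1)].append((int(fields[3]), int(fields[4])))

    lengths = {}
    for gene_id, intervals in exons.items():
        intervals.sort()
        total = 0
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end + 1:
                # Overlapping or directly adjacent: extend the current block.
                cur_end = max(cur_end, end)
            else:
                # Gap: close the current block (inclusive width) and start a new one.
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
        lengths[gene_id] = total
    return lengths

def exon_row(start, end, gene_id, transcript_id):
    """Build one tab-separated toy GTF exon line (hypothetical data)."""
    attrs = 'gene_id "%s"; transcript_id "%s";' % (gene_id, transcript_id)
    return "\t".join(["chr1", "toy", "exon", str(start), str(end), ".", "+", ".", attrs])

# Gene G1 with two transcripts whose exons partly overlap.
toy_gtf = "\n".join([
    exon_row(100, 200, "G1", "T1"),
    exon_row(300, 400, "G1", "T1"),
    exon_row(150, 250, "G1", "T2"),
    exon_row(300, 350, "G1", "T2"),
])

lengths = union_exon_lengths(toy_gtf)
print(lengths)
```

In this toy case the merged exon blocks are 100-250 and 300-400, so the length is 151 + 101 = 252, while the width of the gene interval 100-400 would be 301.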

Nick-Eagles commented 2 years ago

We also read exon coordinates from featureCounts output; these should come from the GTF as well.

LieberInstitute / SPEAQeasy, issue #88