bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

N/A DTU inflating TRUE DTU #36

Closed fruce-ki closed 6 years ago

fruce-ki commented 7 years ago

While I'd like to keep NA values (i.e. ineligible features) separate from genuine FALSE values (tested and did not pass), this causes considerable typing overhead when subsetting the tables by DTU, and it is an easy and recurrent source of errors in the analysis of the results. Subsetting a table by a vector (logical or keys) always also returns lines that are NA for the criterion. To subset only the rows with TRUE values, the criterion must include !is.na(). This behaviour affects all the logical fields, not just DTU.
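To make the pitfall concrete, here is a minimal sketch using a base R data.frame. The table and its values are made up for illustration and are not actual RATs output; only the column names `elig` and `DTU` mirror the flags discussed in this issue.

```r
# Toy results table; values are invented for illustration only.
res <- data.frame(
  id   = c("t1", "t2", "t3", "t4"),
  elig = c(TRUE, TRUE, FALSE, FALSE),
  DTU  = c(TRUE, FALSE, NA, NA),
  stringsAsFactors = FALSE
)

# Naive subsetting: every NA in the criterion yields an all-NA row,
# so the result has more rows than there are TRUE values.
naive <- res[res$DTU, ]
n_naive <- nrow(naive)

# Guarded subsetting: the criterion explicitly excludes NA.
safe <- res[!is.na(res$DTU) & res$DTU, ]
n_safe <- nrow(safe)
```

Counting rows of `naive` would overstate the number of DTU calls, whereas `safe` holds only the row that is genuinely TRUE.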

Based on how many times I've been caught out by this bug/feature of tables/R, I think RATs should do a final pass over the results and replace all NA values in the flag fields with explicit FALSE. In a sense, items not eligible for testing are automatically not DTU, so this is not too much of a stretch, and it should make downstream analysis safer and easier.
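A rough sketch of what such a final pass could look like, assuming the results sit in a plain data.frame and that every logical column is a flag field. This is not the actual RATs implementation; the table, the `pval` column and the variable names are hypothetical.

```r
# Toy results table; values and the pval column are invented for illustration.
results <- data.frame(
  id   = c("t1", "t2", "t3"),
  elig = c(TRUE, TRUE, FALSE),
  DTU  = c(TRUE, FALSE, NA),
  pval = c(0.01, 0.40, NA),
  stringsAsFactors = FALSE
)

# Assumption: all logical columns are flag fields.
flag_cols <- names(results)[vapply(results, is.logical, logical(1))]

# Replace NA with explicit FALSE in the flag columns only;
# non-logical columns such as pval keep their NA values.
for (col in flag_cols) {
  results[[col]][is.na(results[[col]])] <- FALSE
}

# Subsetting by the flag no longer needs an is.na() guard.
dtu_ids <- results$id[results$DTU]
```

Restricting the pass to the flag columns keeps NA in the test statistics, so untested entries can still be told apart from tested ones by looking at `elig` or the NA test results.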

fruce-ki commented 6 years ago

All flag values are now either TRUE or FALSE. DTU==FALSE now includes the cases where elig==FALSE, which are not tested and have NA test results. This is not too much of a stretch: transcripts whose abundance is too low (including those not expressed) would not be considered reliable DTU even if tested, while genes with just one known transcript isoform certainly cannot show DTU.

This makes subsetting of the results simpler and more intuitive and prevents sneaky errors.