bartongroup / RATS

Relative Abundance of Transcripts: An R package for the detection of Differential Transcript isoform Usage.
MIT License

The other half of the picture is not shown #72

Open Tang-pro opened 3 months ago

Tang-pro commented 3 months ago

Hi, @foreveremain @jamesabbott

diu <- call_DTU(annot = GbGTF, count_data_A = fiber10dpa, count_data_B = fiber15dpa, name_A = "Fiber10DPA", name_B = "Fiber15DPA", scaling = 1, verbose = TRUE)

pdf("MSTRG.40436_diu.pdf", width = 10, height = 12)
plot_gene(diu, "MSTRG.40436", style = "bycondition")
dev.off()

Warning: Removed 192 rows containing non-finite outside the scale range (stat_boxplot()).
Warning: Removed 96 rows containing missing values or values outside the scale range (geom_path()).
Warning: Removed 192 rows containing missing values or values outside the scale range (geom_point()).

[Image: MSTRG 40436_diu_00]

May I ask what went wrong? Why is fiber15DPA not displayed at all?

fruce-ki commented 3 months ago

Hello @Tang-pro,

as the warnings say, there are non-finite values (NA, NaN, Inf) which cannot be plotted. You should look into your expression data and your diu object and inspect the values available for this gene in this condition. Maybe the gene is not expressed, or not detected at your sequencing depth.
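
One way to do that inspection, as a minimal base R sketch: count the missing or non-finite entries in every column of a table. The helper `count_nonfinite()` and the toy table `abund` are hypothetical stand-ins, not part of RATs. With real data one would pass a count table such as `fiber15dpa`, or one of the tables stored in the diu object.

```r
# Hypothetical helper: per column, count values that cannot be plotted.
# Numeric columns: NA, NaN, Inf and -Inf. Other columns: NA only.
count_nonfinite <- function(tbl) {
  df <- as.data.frame(tbl)  # also accepts a data.table
  vapply(df, function(x) {
    if (is.numeric(x)) sum(!is.finite(x)) else sum(is.na(x))
  }, integer(1))
}

# Toy stand-in for an abundance table (not real data).
abund <- data.frame(
  target_id = c("t1", "t2", "t3"),
  rep1 = c(4.2, NA, 7.5),
  rep2 = c(6.5, 1.0, Inf),
  V1 = c(NA, NA, NA),
  stringsAsFactors = FALSE
)

nonfinite <- count_nonfinite(abund)
print(nonfinite)
```

A column whose count equals the number of rows holds no usable values at all, which is worth checking before suspecting the expression levels themselves.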

Tang-pro commented 3 months ago

@fruce-ki

fiber15dpa[fiber15dpa$`Isoform id` == "Gbar_A01G004720.2", ]
          Isoform id 3-79_fiber_15dpa-1_L1 3-79_fiber_15dpa-2_L1 3-79_fiber_15dpa-3_L1
1: Gbar_A01G004720.2                  7.17                 4.841                 6.323

fiber10dpa[fiber10dpa$`Isoform id` == "Gbar_A01G004720.2", ]
          Isoform id 3-79_fiber_10dpa-1_L1 3-79_fiber_10dpa-2_L1 3-79_fiber_10dpa-3_L1
1: Gbar_A01G004720.2                 4.252                 6.519                  7.54

There is expression under these two conditions, but an error is still reported. The 15dpa on the right is still not displayed.

![Gbar_A01G004720_diu_00](https://github.com/bartongroup/RATS/assets/149154213/861fb8d4-c98a-4158-8046-1de52727a9a5)

Tang-pro commented 3 months ago

And plot_diagnostics(diu, type='cormat')

[image]

Is there a problem with how the data was entered? Looking at the correlation coefficients, something is obviously wrong. How can I solve it?

fruce-ki commented 3 months ago

Hi @Tang-pro,

Please provide a Minimal Reproducible Example with a subset of data that produces this error, so I can download it and investigate.

Tang-pro commented 3 months ago

Hi, @fruce-ki

fiber10dpa.csv fiber15dpa.csv Gbgtf.csv

The code is:

diu <- call_DTU(annot = GbGTF, count_data_A = fiber10dpa, count_data_B = fiber15dpa, name_A = "Fiber10DPA", name_B = "fiber15dpa", scaling = 1, verbose = FALSE, dprop_thresh = 0.2)

pdf("MSTRG.30080_diu.pdf", width = 10, height = 14)
plot_gene(diu, "MSTRG.30080", style = "byisoform")
dev.off()

Tang-pro commented 3 months ago

Hi, @fruce-ki There is another question here: whether it is possible to compare multiple conditions (more than 2 conditions). For example, I have fiber samples from 10 developmental periods, and I want to observe how the isoform proportions change across these 10 periods.

Tang-pro commented 3 months ago

Hi, @fruce-ki Sorry to bother you. I hope you can reply as soon as possible when you see this.

fruce-ki commented 3 months ago

> whether it is possible to compare multiple conditions (more than 2 conditions)

Unfortunately no, there is no such functionality.

Tang-pro commented 3 months ago

Hi, @fruce-ki OK, I understand.

Then what is the reason for the error in drawing the picture before? I have provided the data. Could you please take a look and help me understand what might be causing this error? Thank you!

fruce-ki commented 3 months ago

> Hi, @fruce-ki
>
> fiber10dpa.csv fiber15dpa.csv Gbgtf.csv
>
> The code is:
>
> diu <- call_DTU(annot = GbGTF, count_data_A = fiber10dpa, count_data_B = fiber15dpa, name_A = "Fiber10DPA", name_B = "fiber15dpa", scaling = 1, verbose = FALSE, dprop_thresh = 0.2)
>
> pdf("MSTRG.30080_diu.pdf", width = 10, height = 14)
> plot_gene(diu, "MSTRG.30080", style = "byisoform")
> dev.off()

Thank you for the files.

fruce-ki commented 3 months ago

I can already see an issue. Your data suggests 3 replicates per condition. However, in the plots, it seems all 6 points are shown as one condition.

I don't know yet why.

fruce-ki commented 3 months ago

The plots break, because in the diu$Abundances tables, both conditions have gained three ghost columns (V1, V2, V3) with just NA values. This messes up the restructuring of the data for plotting.

@Tang-pro This should fix the data for plotting purposes:

diu$Abundances$condA <- diu$Abundances$condA[, !grepl('^V', names(diu$Abundances$condA)), with = FALSE]
diu$Abundances$condB <- diu$Abundances$condB[, !grepl('^V', names(diu$Abundances$condB)), with = FALSE]

I still need to figure out how those ghost columns got added.
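
Note that a name-based filter like '^V' would also drop any genuine sample column whose name happens to start with V. As a hedged alternative, here is a minimal sketch that selects columns by content instead, dropping only those that are entirely NA. The helper `drop_all_na_cols()` and the toy table are hypothetical. The sketch returns a plain data frame, so the result may need converting back with `data.table::as.data.table()` before assigning it into the diu object (an assumption, based on the `with = FALSE` syntax used above).

```r
# Hypothetical helper: remove columns in which every value is NA.
drop_all_na_cols <- function(tbl) {
  df <- as.data.frame(tbl)  # also accepts a data.table
  all_na <- vapply(df, function(x) all(is.na(x)), logical(1))
  df[, !all_na, drop = FALSE]
}

# Toy stand-in for one of the abundance tables (not real data).
abund <- data.frame(
  target_id = c("t1", "t2"),
  parent_id = c("g1", "g1"),
  Vrep1 = c(4.2, 6.5),  # genuine column whose name starts with "V": kept
  rep2 = c(7.1, NA),    # only partially missing: kept
  V1 = c(NA, NA),       # ghost column: dropped
  V2 = c(NA, NA),       # ghost column: dropped
  stringsAsFactors = FALSE
)

cleaned <- drop_all_na_cols(abund)
print(names(cleaned))
```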

fruce-ki commented 3 months ago

I was suspecting an issue with the column names of the abundance tables, but renaming them with simple safe names does not fix the problem.

To verify whether this is new behaviour from an R or package upgrade, I tested it with the package's built-in data emulator:

S <- sim_count_data()
dtu <- call_DTU(annot = S$annot, 
                count_data_A = S$counts_A, 
                count_data_B = S$counts_B, 
                scaling = 1, verbose = FALSE, dprop_thresh = 0.2)

dtu$Abundances$condA[parent_id == 'MIX', ]
dtu$Abundances$condB[parent_id == 'MIX', ]

plot_gene(dtu, "MIX")
plot_gene(dtu, "MIX", style = 'byisoform') 

No ghost columns are added to the emulated data and the plot works fine. So it does not appear to be an inherent new behaviour of RATs caused by a dependency upgrade.

Whatever is causing these ghost columns, it seems to be specific to the annotation/data files used here. But the cause is not obvious to me just by looking at them. The formatting looks normal and as mentioned above the column names are not the culprit.
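
Since the cause is unknown, the following is only a sketch of generic sanity checks one might run on such inputs, not a diagnosis. None of these conditions is known to be responsible for the ghost columns. The helper `check_inputs()`, the toy tables, and the ID column name `target_id` are all assumptions for illustration. The count files in this thread use `Isoform id` instead, which could be passed via `counts_id`.

```r
# Hypothetical helper: report common inconsistencies between an annotation
# table and a count table. Both ID column names are parameters.
check_inputs <- function(annot, counts,
                         annot_id = "target_id", counts_id = "target_id") {
  annot <- as.data.frame(annot)
  counts <- as.data.frame(counts)
  ids <- counts[[counts_id]]
  sample_cols <- setdiff(names(counts), counts_id)
  is_num <- vapply(counts[sample_cols], is.numeric, logical(1))
  list(
    duplicated_ids = unique(ids[duplicated(ids)]),       # repeated transcripts
    ids_not_in_annot = setdiff(ids, annot[[annot_id]]),  # counted, not annotated
    ids_not_in_counts = setdiff(annot[[annot_id]], ids), # annotated, not counted
    non_numeric_cols = sample_cols[!is_num]              # e.g. values read as text
  )
}

# Toy inputs with one deliberate problem of each kind (not real data).
annot <- data.frame(
  target_id = c("t1", "t2", "t3"),
  parent_id = c("g1", "g1", "g2"),
  stringsAsFactors = FALSE
)
counts <- data.frame(
  target_id = c("t1", "t2", "t2", "t4"),
  rep1 = c(4.2, 6.5, 1.0, 3.3),
  rep2 = c("7.1", "x", "2.0", "1.1"),
  stringsAsFactors = FALSE
)

report <- check_inputs(annot, counts)
str(report)
```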

I would like to understand what is causing this, so that in the future I can prevent it. But as this is not currently my contracted project, I cannot promise when I can look into it further.

Tang-pro commented 3 months ago

Hi, @fruce-ki Thank you very much for your detailed reply. I may need to spend some time researching it. Best!