csglab / REMBRANDTS

REMoving Bias from Rna-seq ANalysis of Differential Transcript Stability
GNU General Public License v3.0

Interpretation of Δexon–Δintron vs Δintron scatterplots before and after removing the bias term #13

Open rosshandler opened 8 months ago

rosshandler commented 8 months ago (January 23, 2024)

Dear REMBRANDTS team,

I have applied your pipeline to a dataset of several cell lines of the same cell type. Differential expression analysis revealed a large number of up- and downregulated genes between them, so I would also expect some differences in transcript stability.

When running the pipeline, scatterplots of Δexon–Δintron vs Δintron are produced (see the example attached). I uncommented the LOESS regression-line plotting in your code, and the fitted line (red) is flat at Δexon–Δintron = 0, both before and after correction. I do not see any trends like those in the paper (Fig. 1c and 1d).

Could you please share your interpretation of these plots with me?

The data were generated with a total RNA-seq protocol and have good coverage, and the Δexon vs Δexon plot shows good correlation.

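Below is the relevant text printed by the pipeline:

[1] "Optimizing read count cutoff at stringency 0.99 ..."
[1] "Total correlation is 1"
[1] "Total number of genes is 15181"
[1] "Maximum correlation is 1"
[1] "Selected threshold is 5.87159523748979"
[1] "Number of remaining genes is 12773"

[Attached image: scatterplot.CellLine1_rep1.exon.jpg, https://github.com/csglab/REMBRANDTS/assets/17701395/b2ff48fc-6434-49f3-9f6f-56c69823365d]

[Attached image: scatterplot.jpg, https://github.com/csglab/REMBRANDTS/assets/17701395/fc80e03e-c332-4721-acca-a963b6fee560]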

Many thanks, Ivan

hsnajafabadi commented 8 months ago

Hi Ivan,

It seems to me that the correlation between intronic and exonic read counts is 1, which is very unusual (the intronic and exonic read counts appear to be identical). Can you verify that the read counts were obtained correctly? You can also try the approach described here for obtaining exonic/intronic read counts: https://github.com/csglab/CRIES.

Best,

Hamed


rosshandler commented 7 months ago

Hi Hamed,

Thanks for the quick reply. I checked the files as suggested. Indeed, they were essentially identical, so I used CRIES to obtain the counts and ran REMBRANDTS again. This time the results are much more informative. Please find the same plots and output below:

[Attached image: scatterplot sample1_rep1 exon]

[Attached image: scatterplot (1)]

[1] "Optimizing read count cutoff at stringency 0.99 ..."
[1] "Total correlation is 0.592024058847667"
[1] "Total number of genes is 13915"
[1] "Maximum correlation is 0.704987667859836"
[1] "Selected threshold is 8.2731303169406"
[1] "Number of remaining genes is 4779"

Just one quick question: could you please give me a brief interpretation of the corrected plot? Why does the slope become positive?

I appreciate your help and look forward to the downstream analysis.

Best, Ivan