comprna / SUPPA

SUPPA: Fast quantification of splicing and differential splicing
MIT License

Addressing Batch Effects in Datasets with SUPPA2 #182

Open XXuxi opened 9 months ago

XXuxi commented 9 months ago

Dear SUPPA2 Development Team,

I am in the process of using SUPPA2 for differential splicing analysis and have encountered an issue with batch effects in my dataset, which includes sequencing samples from different batches. Upon analyzing TPM values through PCA, it became evident that batch effects are present.

My query is: Should batch effect correction be performed on the TPM values before running SUPPA2?

Thanks for your assistance.

Best regards, Xi Xu

EduEyras commented 9 months ago

Dear Xi Xu,

We have recently handled batch effects using a linear model with cofactors (see https://pubmed.ncbi.nlm.nih.gov/36518527/).

In this case, rather than performing a test between conditions, we fit a linear model against the conditions. To that model you can add a list of cofactors, each given as a vector with one entry per sample (patient). The cofactors can be numerical values (e.g. age), nominal values (e.g. sex), other experimental variables (source, post-mortem, …), or even values obtained from other methods that estimate batch effects (e.g. SVA). The model then gives you the events that best correlate with the conditions while accounting for all those sources of batch effect.
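As a rough illustration of the idea (not the code from the cited paper), the sketch below fits an ordinary least-squares model for a single event, with the condition and one batch cofactor as predictors. The sample values, the variable names, and the helper functions `solve_linear_system` and `fit_ols` are all made up for this example. In R, the equivalent fit would be something like `lm(psi ~ condition + batch)`, which also reports significance via `summary()`.

```python
# Minimal sketch: one event, PSI modelled as intercept + condition + batch.
# All data and helper names here are hypothetical, for illustration only.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, so row operations act on both sides at once.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(design, response):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, response))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


# One entry per sample: condition (0 = control, 1 = case), batch (0 or 1),
# and the PSI value of a single splicing event in that sample.
condition = [0, 0, 0, 0, 1, 1, 1, 1]
batch = [0, 0, 1, 1, 0, 0, 1, 1]
psi = [0.20, 0.22, 0.35, 0.37, 0.40, 0.42, 0.55, 0.57]

# Design matrix columns: intercept, condition, batch cofactor.
design = [[1.0, c, b] for c, b in zip(condition, batch)]
intercept, condition_effect, batch_effect = fit_ols(design, psi)

print(f"condition effect: {condition_effect:.3f}")
print(f"batch effect:     {batch_effect:.3f}")
```

Repeating such a fit for every event and ranking events by the condition coefficient (or its significance) is one way to read "events that best correlate with the conditions"; further cofactors would simply be added as extra columns of the design matrix.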

An alternative might be correcting the read counts / TPM values for these batch effects, and then running SUPPA. We have not tried this, so I would not know if this is effective.

I hope this helps

Please do not hesitate to write back with more questions

Thanks a lot for using SUPPA

Best

Eduardo


jdee3 commented 8 months ago

Dear Eduardo,

Following your response, I looked into the paper you linked. It does mention a batch regression approach, but I can't find the code. Could you please point me to where the code for this batch regression technique was deposited? Alternatively, could you explain exactly how the PSI values were re-modeled using linear regression?

Also, does this regression approach still produce PSI values that range from 0 to 1? Thanks in advance!

-Jay

EduEyras commented 8 months ago

hi,

Sorry for the delayed reply. We used the standard lm() function in R, correcting with covariables; we included those that we observed had the strongest confounding effect. There should be enough tutorials or coding copilots available to help you work out the syntax. We'll try to make the code available on the SUPPA page. Thanks, E.
