dnanexus / UKB_RAP

Access share reviewed code & Jupyter Notebooks for use on the UK Biobank (UKBB) Research Application Platform. Includes resources from DNAnexus webinars, online trainings and workshops.
MIT License
121 stars 46 forks source link

Fitting Linear Model #38

Closed RipeNishtala closed 2 months ago

RipeNishtala commented 3 months ago

Hi Alex, is there a reason why the model was not fitted with the npx_normalised data? Many thanks, Prasad

ajlee21 commented 3 months ago

Hi Prasad, do you mind providing a bit more context here? Which notebook are you referring to?

RipeNishtala commented 3 months ago

Hi Alex, I am referring to the proteomics differential expression analyses. You fitted a linear model using the limma package on the npx data (protein markers) rather than the npx normalised data. I am driving home but can send you more info on Monday morning. BW, Prasad

RipeNishtala commented 3 months ago

Hi Alex, Here is the link: https://github.com/dnanexus/UKB_RAP/tree/main/proteomics/protein_DE_analysis

lmFit expects input array to have structure: protein x sample

lmFit fits a linear model using weighted least squares for each protein:

fit <- lmFit(t(npx_df), design)

As you can see in line 22, you have fitted the linear model using npx_df rather than the npx_normalized data. Apologies if I have misunderstood this. I thought we normalise to mitigate proteins with huge ranges, as they would otherwise be likely to dominate the results.

Your clarification would be much appreciated. BW, Prasad
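For readers following along, the orientation point quoted from the notebook (lmFit expects protein x sample) can be sketched in base R. This is only an illustration on simulated data: `npx_df`, `group`, and `design` here are stand-ins assumed for the example, and the notebook's actual objects may be built differently.

```r
# Simulated stand-in for the notebook's data: 6 samples (rows) x 4 proteins (columns).
set.seed(1)
npx_df <- as.data.frame(matrix(rnorm(24), nrow = 6,
                               dimnames = list(paste0("sample", 1:6),
                                               paste0("protein", 1:4))))

# Hypothetical two-group design with one row per sample.
group  <- factor(rep(c("control", "case"), each = 3),
                 levels = c("control", "case"))
design <- model.matrix(~ group)

# Transposing gives the protein x sample layout described above.
# t() applied to a data frame returns a matrix.
npx_mat <- t(npx_df)

dim(npx_mat)                   # proteins x samples
ncol(npx_mat) == nrow(design)  # samples must line up with the design rows
```

The same check (columns of the transposed matrix matching rows of the design) is worth running on either `npx_df` or a normalised version before fitting, since both would need the same orientation.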

ajlee21 commented 2 months ago

Great question!

As far as I can tell, there is no standard methodology for processing and normalizing this kind of data. For the most part, folks seem to be adapting techniques from RNA-seq/array data. In this case I didn't normalize the NPX values because they are already on a log2 scale, similar to array data, where I don't believe additional scaling is performed prior to DE. Additionally, when I compare the DE results with and without normalization, there isn't much difference in the number of proteins found.

As for proteins with large ranges of values, I don't think they should affect the likelihood of other proteins being found to be DE, since each protein is treated independently.

Please let me know if this makes sense.
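The independence argument above can be checked with a small base R sketch. It assumes simulated data and uses per-protein z-scoring as a stand-in for normalisation, which may not match how `npx_normalized` is computed in the notebook. It also uses plain per-row least squares rather than limma's moderated statistics, so it illustrates only the row-wise fitting step.

```r
set.seed(42)
n_samples <- 20
group  <- factor(rep(c("control", "case"), each = n_samples / 2),
                 levels = c("control", "case"))
design <- model.matrix(~ group)

# Simulated protein x sample matrix; protein3 has a much wider range.
npx <- rbind(
  protein1 = rnorm(n_samples, mean = 0, sd = 1),
  protein2 = rnorm(n_samples, mean = 1, sd = 1),
  protein3 = rnorm(n_samples, mean = 5, sd = 50)
)

# Ordinary least-squares t-statistic for the group coefficient,
# fitted separately for each row (protein).
row_t <- function(mat, design) {
  xtx_inv <- solve(crossprod(design))
  apply(mat, 1, function(y) {
    fit    <- lm.fit(design, y)
    sigma2 <- sum(fit$residuals^2) / fit$df.residual
    unname(fit$coefficients[2] / sqrt(sigma2 * xtx_inv[2, 2]))
  })
}

t_raw <- row_t(npx, design)

# Assumed normalisation: z-score each protein across samples.
npx_scaled <- t(scale(t(npx)))
t_scaled   <- row_t(npx_scaled, design)

# Per-protein t-statistics agree before and after scaling.
all.equal(t_raw, t_scaled)

# Dropping the wide-range protein leaves the other proteins' statistics unchanged.
all.equal(t_raw[1:2], row_t(npx[1:2, ], design))
```

Because the design includes an intercept, shifting and rescaling a protein changes its coefficient and its standard error by the same factor, so the t-statistic is unchanged. Normalisation that mixes information across proteins or samples would not necessarily behave this way.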

RipeNishtala commented 2 months ago

Many thanks for the clarification. Great work, and I found it very informative. Best wishes, Prasad
