How to get to the final PRS value

lcstoshio commented 11 months ago

Dear Tian Ge,

Sorry if it's a dumb question, but I am trying to run PRScsx and I am stuck on how to get the coefficients for each ancestry in the linear combination.

For context, I am running PRScsx with three ancestries (European, African and Native American) and computed the individual scores for each ancestry in PLINK, but I don't know how to proceed from there to get a single final score.

From what I read, I should fit a linear regression in the validation dataset and learn the coefficients from it: lm(y ~ PRS_EUR + PRS_AFR + PRS_AMR + covariates)

But I don't know which values in the regression output are the coefficients, or how to proceed from there (is it the "Estimate" column? I am running everything in R).

And after I get the coefficients, I should just do this, right?

PRS <- coef_EUR * PRS_EUR + coef_AFR * PRS_AFR + coef_AMR * PRS_AMR
lm(y ~ PRS + covariates)
Calculate R2?

Thank you so much. Lucas

getian107 commented 11 months ago

Hi Lucas -- I think your understanding is correct. The coefficients refer to the regression coefficients estimated by fitting the linear regression. I don't use R, but it looks like if you use the 'lm' function in R to fit the linear regression, 'coefficients' in the returned object of class 'lm' would be what you need.
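For readers following along in R, here is a minimal sketch of the steps discussed above: fit the regression in the validation set, read off the per-ancestry coefficients, form the combined score in the testing set, and compute R2. It is an illustration under assumptions, not the PRScsx authors' code. The data are simulated, and the covariates (age, sex), the helper names simulate_cohort and combine_prs, and the use of incremental R2 over a covariates-only model are all choices made for this example.

```r
# Simulated stand-in for real data (hypothetical); in practice the PRS_*
# columns would be the per-ancestry scores computed in PLINK.
simulate_cohort <- function(n) {
  d <- data.frame(
    PRS_EUR = rnorm(n),
    PRS_AFR = rnorm(n),
    PRS_AMR = rnorm(n),
    age     = rnorm(n, mean = 50, sd = 10),
    sex     = rbinom(n, 1, 0.5)
  )
  d$y <- 0.5 * d$PRS_EUR + 0.3 * d$PRS_AFR + 0.2 * d$PRS_AMR +
    0.01 * d$age + rnorm(n)
  d
}

set.seed(1)
val  <- simulate_cohort(500)  # validation set: used to learn the weights
test <- simulate_cohort(500)  # testing set: used to evaluate the final score

# Step 1: fit the regression in the validation set.
fit <- lm(y ~ PRS_EUR + PRS_AFR + PRS_AMR + age + sex, data = val)

# coef(fit) holds the same numbers as the "Estimate" column of summary(fit).
w <- coef(fit)[c("PRS_EUR", "PRS_AFR", "PRS_AMR")]

# Step 2: combined score = weighted sum of the per-ancestry scores.
combine_prs <- function(d, w) {
  as.vector(as.matrix(d[, names(w)]) %*% w)
}
test$PRS <- combine_prs(test, w)

# Step 3: evaluate in the testing set. Incremental R2 is the gain in R2
# from adding the combined PRS to a covariates-only model.
full <- lm(y ~ PRS + age + sex, data = test)
null <- lm(y ~ age + sex, data = test)
r2_incremental <- summary(full)$r.squared - summary(null)$r.squared
```

The weights are learned in one set and applied in another so that the reported R2 is not inflated by reusing the same individuals for both steps.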

lcstoshio commented 11 months ago

Okay, thank you for the quick response, it helped a lot.

I just came across another question: the sample size of my target data is about 2000 individuals (300 cases and 1700 controls). Is there a right proportion in which I should split my data between validation and testing?

I don't know whether the sample is too small to split, or whether I should use the automatic parameters instead (phi auto and --meta).

getian107 commented 11 months ago

The case number does appear to be on the smaller side. I think auto+meta might be a better choice.

lcstoshio commented 11 months ago

Got it, thank you so much for the help!!

getian107 / PRScsx

How to get to the final PRS value #40