choishingwan / PRS-Tutorial

A tutorial on how to run basic polygenic risk score analysis
MIT License

Error: Incompatibility between dimensions #15

Closed XcodeBio closed 3 years ago

XcodeBio commented 3 years ago

Hi Sam,

Thank you very much for the tutorial on running PRS on Chromosome separated bed files. I separated the EUR.QC files into individual chromosomes (for example Eur_chr{1..22}) and tried running PRS on these files. I get an error on running the step

4. Perform LD score regression.

The code below throws the error "Error: Incompatibility between dimensions." It works well for the genome-wide bed file but not for the chromosome-separated bed files.

ldsc <- snp_ldsc(ld, length(ld), chi2 = (df_beta$beta / df_beta$beta_se)^2, sample_size = df_beta$n_eff, blocks = NULL)

Thanks for your help.
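For readers who hit the same message: the thread does not record what the script fix was, but this error text usually points to a length mismatch among the per-SNP vectors passed to snp_ldsc. Below is a minimal base-R sketch of a pre-flight check. It assumes that ld is a numeric vector of per-SNP LD scores and that df_beta is the matched summary-statistics data frame; check_ldsc_inputs is a hypothetical helper, not part of bigsnpr.

```r
# Hypothetical helper (not part of bigsnpr): confirm that the per-SNP inputs
# line up before they are handed to snp_ldsc().
check_ldsc_inputs <- function(ld, df_beta) {
  required <- c("beta", "beta_se", "n_eff")
  missing_cols <- setdiff(required, names(df_beta))
  if (length(missing_cols) > 0) {
    stop("df_beta is missing columns: ", paste(missing_cols, collapse = ", "))
  }
  if (length(ld) != nrow(df_beta)) {
    stop(sprintf("ld has %d entries but df_beta has %d rows",
                 length(ld), nrow(df_beta)))
  }
  invisible(TRUE)
}

# Toy data: three SNPs in the summary statistics
df_beta <- data.frame(beta    = c(0.10, -0.20, 0.05),
                      beta_se = c(0.02, 0.03, 0.01),
                      n_eff   = c(1000, 1000, 1000))
ld_ok  <- c(1.5, 2.0, 1.2)  # one LD score per SNP
ld_bad <- c(1.5, 2.0)       # e.g. one SNP lost while looping over chromosomes

ok <- check_ldsc_inputs(ld_ok, df_beta)
mismatch <- tryCatch(check_ldsc_inputs(ld_bad, df_beta),
                     error = function(e) conditionMessage(e))
print(mismatch)
```

With chromosome-separated files, running a check like this after the per-chromosome loop makes it easier to see whether the LD scores and the matched summary statistics were accumulated in step with each other.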

choishingwan commented 3 years ago

My bad, have now fixed the scripts.

Sam

XcodeBio commented 3 years ago

Thank you very much, Sam.

XcodeBio commented 3 years ago

Hi Sam, I just tried it on Chromosome separated bed files and I think there are a couple of typos.

Calculate the LD matrix
Error 1. tmp_snp <- snp_match(sumstats[sumstats\$chr==chr,], map)
Because of the \ the code throws an error: Error: unexpected input in: " # perform SNP matching tmp_snp <- snp_match(sumstats[sumstats"

Obtain model PRS (Using chromosome separated bed files)
Error 2. obj.bigSNP <- snp_attach(paste0("EURchr",chr,".rds")) # unexpected in _.rds.

Apart from these the code works fine.

Thank you very much again for the wonderful tutorial.

choishingwan commented 3 years ago

Oh, thanks for finding that. I was just copying and pasting from my pipeline (which requires me to escape the $).

Should have updated the scripts now.
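To illustrate why the escaped dollar sign breaks the script, here is a small base-R sketch. A backslash outside a string literal is not a valid R token, so the parser stops at that point with "unexpected input". The helper is_valid_r is a hypothetical name used only for this illustration.

```r
# Returns TRUE if `code` is syntactically valid R, FALSE otherwise.
is_valid_r <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Inside an R string literal, "\\$" stands for the two characters \$,
# i.e. the line exactly as it appeared in the tutorial script.
escaped   <- "tmp_snp <- sumstats[sumstats\\$chr == chr, ]"
unescaped <- "tmp_snp <- sumstats[sumstats$chr == chr, ]"

escaped_ok   <- is_valid_r(escaped)    # the pipeline-style line
unescaped_ok <- is_valid_r(unescaped)  # the corrected line
print(c(escaped = escaped_ok, unescaped = unescaped_ok))
```

The escape is only needed when the R code is embedded in something that expands $ itself, such as a shell heredoc; in a standalone R script the plain $ is the correct form.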

XcodeBio commented 3 years ago

Hi Sam,

I have one question regarding your tutorial. Let me know if you want me to open a separate issue for this.

In Figure 1 of your paper entitled "A guide to performing polygenic risk score analyses" you talk about "Test (Generate PRS and perform association testing)" and "Validate (Out-of-sample PRS testing)", but you do not mention validation in the LDpred-2 tutorial. Is there a particular reason for that? Does the LDpred-2 tutorial only cover the "Test (Generate PRS and perform association testing)" part?

I have seen in other PRS tutorials (for example LDpred2) that the author divides the example dataset into "validation (to tune hyper-parameters)" and "testing (to evaluate the final models)".

Thank you.

choishingwan commented 3 years ago

Yes, we only focus on the test part for practical reasons. All data used in the tutorial are simulated using the 1000 Genomes European samples, which consist of only 500 samples. If we split them, the sample size will be too small. And if I use some other population (e.g. Asian) to serve as a validation dataset, there might be problems due to population differences. Unfortunately, I don't have the time to investigate that more deeply, so I am just going to do the testing part.

XcodeBio commented 3 years ago

Okay, no problem. I hope you don't mind me asking some questions regarding PRS validation (Out-of-sample PRS testing). Again, please let me know if you want me to open a separate issue for this. Apologies if the questions are very basic.

I understand that "Testing" includes

  1. Generation of PRS
  2. Examination of the association between the PRS and a trait. For example, once a PRS for height is constructed (as you show in your LDpred2 tutorial), its association can be tested using height ~ PRS + covariates.
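As a rough illustration of the association step, the trait is usually regressed on the PRS plus covariates, and the PRS contribution is reported as the gain in R2 over a covariates-only model. The sketch below uses simulated data; every variable name and effect size in it is an assumption made for the example, not a value from the tutorial.

```r
set.seed(1)
n <- 500

# Simulated data (hypothetical): one covariate, a standardised PRS, and a
# height phenotype that depends on both.
sex    <- rbinom(n, 1, 0.5)
prs    <- rnorm(n)
height <- 170 + 6 * sex + 2 * prs + rnorm(n, sd = 5)

# Null model with covariates only, and full model with the PRS added
null_model <- lm(height ~ sex)
full_model <- lm(height ~ prs + sex)

# Variance explained by the PRS on top of the covariates, and its p-value
prs_r2 <- summary(full_model)$r.squared - summary(null_model)$r.squared
prs_p  <- coef(summary(full_model))["prs", "Pr(>|t|)"]
print(c(prs_r2 = prs_r2, prs_p = prs_p))
```

In a real analysis the covariates would typically also include principal components of ancestry.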

So my question is regarding PRS validation (Out-of-sample PRS testing).

  1. What are the steps involved in Out-of-sample PRS validation?
  2. What information are you using from the "testing" phase for Out-of-sample PRS validation? PRS, beta, SE??
  3. I am assuming you have to construct a PRS in Out-of-sample set too and then examine any association between the PRS and trait. Is this right?

Thank you again for your help.

choishingwan commented 3 years ago

Hi,

When we are doing the two-step approach (testing + validation), what we try to do is as follows.

Testing:

  1. Obtain the PRS parameters that give us the most predictive PRS.
     For PRSice, that will be the p-value threshold.
     For LDpred, it will be the parameters given by their model (I think these are the h2, the percentage of causal variants, and whether it is a sparse or dense model).

Validation:

  1. Using the parameters estimated from testing, construct the PRS in the validation set. In PRSice, you do this with --fastscore --no-full --bar-levels <best threshold>.
  2. Get the R2 and p-value.
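The two-step idea can be sketched in base R with simulated data and simple p-value thresholding. This is an illustration under stated assumptions, not PRSice or LDpred2 itself: the genotypes, effect sizes, threshold grid and the helper names make_geno, prs_at and r2_at are all made up for the example.

```r
set.seed(42)
n_snps  <- 200
n_test  <- 300
n_valid <- 300

# Simulated inputs (hypothetical): genotype dosages, GWAS effects and p-values
make_geno <- function(n) matrix(rbinom(n * n_snps, 2, 0.3), nrow = n)
true_beta <- c(rnorm(20, sd = 0.3), rep(0, n_snps - 20))
gwas_se   <- 0.05
gwas_beta <- true_beta + rnorm(n_snps, sd = gwas_se)
gwas_p    <- 2 * pnorm(-abs(gwas_beta / gwas_se))

geno_test   <- make_geno(n_test)
geno_valid  <- make_geno(n_valid)
pheno_test  <- drop(geno_test %*% true_beta) + rnorm(n_test)
pheno_valid <- drop(geno_valid %*% true_beta) + rnorm(n_valid)

# PRS at one threshold: sum of dosage * GWAS effect over SNPs passing it
prs_at <- function(geno, threshold) {
  keep <- gwas_p <= threshold
  drop(geno[, keep, drop = FALSE] %*% gwas_beta[keep])
}
r2_at <- function(geno, pheno, threshold) {
  summary(lm(pheno ~ prs_at(geno, threshold)))$r.squared
}

# Testing: pick the threshold that gives the most predictive PRS
thresholds     <- c(5e-8, 1e-5, 0.001, 0.05, 0.5, 1)
test_r2        <- sapply(thresholds, function(t) r2_at(geno_test, pheno_test, t))
best_threshold <- thresholds[which.max(test_r2)]

# Validation: reuse that threshold unchanged in the independent sample
valid_fit <- lm(pheno_valid ~ prs_at(geno_valid, best_threshold))
valid_r2  <- summary(valid_fit)$r.squared
valid_p   <- coef(summary(valid_fit))[2, "Pr(>|t|)"]
print(c(best_threshold = best_threshold, valid_r2 = valid_r2, valid_p = valid_p))
```

The key design point is that nothing is re-tuned in the validation sample: the threshold is chosen once in the testing sample, and the validation R2 is what gets reported.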

XcodeBio commented 3 years ago

Hi,

Thank you! Sorry I should have been specific that I am trying to understand your LDpred-2 tutorial which covers the 'testing' approach (https://choishingwan.github.io/PRS-Tutorial/ldpred/) .

I learnt that you have used three LDpred-2 models, namely Infinitesimal, Grid and Auto, to obtain the model PRS (section 7), and then you get the final performance of the LDpred models in section 8.

  1. Infinitesimal = 0.0100
  2. Grid Model = 0.00180
  3. Auto Model = 0.171

Because the Auto model explains the highest phenotypic variance here, would you say that the Auto model is the best for prediction? If so, how would you use data from the Auto model to perform validation?

Thank you!

choishingwan commented 3 years ago

Yes, in theory you will want to use the parameters used to calculate the auto score for validation. Unfortunately, as I am not the author of LDpred2, I might not know the best way to approach it. So I think it is best for you to consult Florian about that.
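One common reading of "use the parameters for validation" is to freeze the per-SNP weights from the chosen model and apply them, unchanged, to the validation genotypes. The base-R sketch below shows only that alignment-and-scoring step. It is an assumption about the workflow, not the LDpred2 author's recommendation; the column name beta_auto, the SNP IDs and the dosages are all invented for the example.

```r
# Hypothetical frozen weights from the testing step (e.g. adjusted effect sizes)
weights <- data.frame(rsid      = c("rs1", "rs2", "rs3", "rs4"),
                      beta_auto = c(0.10, -0.05, 0.02, 0.07))

# Hypothetical validation genotypes (dosages); columns are SNPs, rows are samples.
# Note that the SNP set and order differ from the weights table.
geno_valid <- matrix(c(0, 1, 2,
                       1, 1, 0,
                       2, 0, 1),
                     nrow = 3,
                     dimnames = list(c("id1", "id2", "id3"),
                                     c("rs3", "rs1", "rs5")))

# Keep only SNPs present in both, and align the weights to the genotype columns.
# Allele matching and strand flips are ignored here for brevity.
shared <- intersect(colnames(geno_valid), weights$rsid)
beta   <- weights$beta_auto[match(shared, weights$rsid)]

# PRS = sum over shared SNPs of dosage * frozen weight
prs_valid <- drop(geno_valid[, shared, drop = FALSE] %*% beta)
print(prs_valid)
```

The resulting score can then be tested against the phenotype in the validation sample with an ordinary regression, without re-estimating any weights there.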

XcodeBio commented 3 years ago

Thank you Sam! Appreciate your time!