Jingning-Zhang / PROSPER


Help Needed: Using PROSPER for AFR PRS Calculation Without EUR Phenotype Data #2

Open dkssud24 opened 1 month ago

dkssud24 commented 1 month ago

Dear Jingning Zhang,

I am impressed by PROSPER's ability to build transferable PRS from GWAS of diverse ancestries, and I find it promising, especially since the paper states that it outperforms the PRS-CSx method. I have one question, though. In the example files, both AFR and EUR GWAS are used to fit AFR and EUR PRS, which requires EUR phenotypes. In my case, I only want to use the AFR and EUR GWAS to calculate an AFR PRS, and I am having difficulty adapting the example script from your GitHub repository. Below is the part of the example script that uses both the EUR and AFR datasets:

Code 1

package='/dcs04/nilanjan/data/jzhang2/MEPRS/pacakge/try_from_github/PROSPER'
path_example='/dcs04/nilanjan/data/jzhang2/MEPRS/pacakge/try_from_github/PROSPER/example/'
path_result='/dcs04/nilanjan/data/jzhang2/MEPRS/pacakge/try_from_github/PROSPER/PROSPER_example_results/'
path_plink='/dcs04/nilanjan/data/jzhang2/TOOLS/plink/plink2'
mkdir ${path_result}

Rscript ${package}/scripts/lassosum2.R \
  --PATH_package ${package} \
  --PATH_out ${path_result}/lassosum2 \
  --PATH_plink ${path_plink} \
  --FILE_sst ${path_example}/summdata/EUR.txt,${path_example}/summdata/AFR.txt \
  --pop EUR,AFR \
  --chrom 1-22 \
  --bfile_tuning ${path_example}/sample_data/EUR/tuning_geno,${path_example}/sample_data/AFR/tuning_geno \
  --pheno_tuning ${path_example}/sample_data/EUR/pheno.fam,${path_example}/sample_data/AFR/pheno.fam \
  --bfile_testing ${path_example}/sample_data/EUR/testing_geno,${path_example}/sample_data/AFR/testing_geno \
  --pheno_testing ${path_example}/sample_data/EUR/pheno.fam,${path_example}/sample_data/AFR/pheno.fam \
  --testing TRUE \
  --NCORES 5

Code 2

Rscript ${package}/scripts/PROSPER.R \
  --PATH_package ${package} \
  --PATH_out ${path_result}/PROSPER \
  --FILE_sst ${path_example}/summdata/EUR.txt,${path_example}/summdata/AFR.txt \
  --pop EUR,AFR \
  --lassosum_param ${path_result}/lassosum2/EUR/optimal_param.txt,${path_result}/lassosum2/AFR/optimal_param.txt \
  --chrom 1-22 \
  --NCORES 5

I understand that this code applies when both EUR and AFR have corresponding .bed, .bim, .fam, and phenotype files. In my case, however, I need to calculate PRS for a single ancestry (East Asian). I tried modifying the code as follows to supply only the AFR tuning and testing files, but I received the following error:

Code 3

package='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER'
path_example='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER/example/example'
path_result='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER/example/example/hae_example_result'
path_plink='plink2'

Rscript ${package}/scripts/lassosum2.R \
  --PATH_package ${package} \
  --PATH_out ${path_result}/lassosum2 \
  --PATH_plink ${path_plink} \
  --FILE_sst ${path_example}/summdata/EUR.txt,${path_example}/summdata/AFR.txt \
  --pop EUR,AFR \
  --chrom 1-22 \
  --bfile_tuning ${path_example}/sample_data/AFR/tuning_geno \
  --pheno_tuning ${path_example}/sample_data/AFR/pheno.fam \
  --bfile_testing ${path_example}/sample_data/AFR/testing_geno \
  --pheno_testing ${path_example}/sample_data/AFR/pheno.fam \
  --testing TRUE \
  --NCORES 5

"ERROR: NA.bed input file does not exist"

I suspect the error occurs because no tuning and testing files were provided for EUR. When I added the EUR tuning and testing genotype files (still without EUR phenotype files), the script worked correctly:

Code 4

package='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER'
path_example='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER/example/example'
path_result='/BiO/hae/phase2/phase3/phase4/phase5/66_Transfer/phase2_230426_2nd_validation/phase2_66_PROSPER/PROSPER/example/example/hae_example_result'
path_plink='plink2'

Rscript ${package}/scripts/lassosum2.R \
  --PATH_package ${package} \
  --PATH_out ${path_result}/lassosum2 \
  --PATH_plink ${path_plink} \
  --FILE_sst ${path_example}/summdata/EUR.txt,${path_example}/summdata/AFR.txt \
  --pop EUR,AFR \
  --chrom 1-22 \
  --bfile_tuning ${path_example}/sample_data/EUR/tuning_geno,${path_example}/sample_data/AFR/tuning_geno \
  --pheno_tuning ${path_example}/sample_data/AFR/pheno.fam \
  --bfile_testing ${path_example}/sample_data/EUR/testing_geno,${path_example}/sample_data/AFR/testing_geno \
  --pheno_testing ${path_example}/sample_data/AFR/pheno.fam \
  --testing TRUE \
  --NCORES 124
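For reference, a pre-flight check along these lines would flag the difference between Code 3 and Code 4 before the R script starts. It is only a sketch: the helper names `count_entries`, `check_per_pop`, and `check_plink_prefixes` are hypothetical and not part of PROSPER, and it assumes that lassosum2.R expects one comma-separated genotype prefix per population listed in `--pop`, which I have not confirmed from the source. The `NA.bed` in the error message would be consistent with a missing second entry being read as NA, but that is also a guess.

```shell
#!/bin/sh
# Hypothetical helpers, not part of PROSPER.

# Count the non-empty comma-separated entries in a string ("EUR,AFR" -> 2).
count_entries() {
  printf '%s' "$1" | tr ',' '\n' | grep -c .
}

# Fail if a per-population argument does not have one entry per population.
# $1: value of --pop, $2: value of the argument, $3: argument name for the message.
check_per_pop() (
  n_pop=$(count_entries "$1")
  n_val=$(count_entries "$2")
  if [ "$n_pop" -ne "$n_val" ]; then
    echo "MISMATCH: --pop has $n_pop entries but $3 has $n_val" >&2
    exit 1
  fi
)

# Fail if any .bed/.bim/.fam file is missing for a comma-separated prefix list.
check_plink_prefixes() (
  status=0
  set -f        # no filename globbing while splitting the list
  IFS=','       # split the unquoted $1 on commas only
  for prefix in $1; do
    for ext in bed bim fam; do
      if [ ! -f "${prefix}.${ext}" ]; then
        echo "MISSING: ${prefix}.${ext}" >&2
        status=1
      fi
    done
  done
  exit "$status"
)

# Example use with the variables from Code 3 (commented out on purpose):
# check_per_pop "EUR,AFR" "${path_example}/sample_data/AFR/tuning_geno" "--bfile_tuning"
# check_plink_prefixes "${path_example}/sample_data/AFR/tuning_geno"
```

With the Code 3 arguments, `check_per_pop` reports two populations against one `--bfile_tuning` entry. With the Code 4 arguments it passes for the genotype lists, while `--pheno_tuning` and `--pheno_testing` would still show one entry against two populations.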

Since Code 4 runs, I understand that the tuning and testing files for EUR still need to be provided even if no EUR PRS is calculated. My questions are as follows:

1. If I want to calculate PRS for AFR only, is the modified Code 4 above correct?
2. If yes, I plan to use the 1000G EUR reference data with 404 samples. Should I split these samples for tuning and testing (e.g., 202 for tuning and 202 for testing)?
3. If Code 4 is not correct, could you please provide the correct way to calculate PRS for a single ancestry? I could not find this information in your GitHub repository.

I would greatly appreciate any guidance you could provide when you have the time.
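To make the 202/202 idea concrete, this is roughly how I would split the 404 EUR reference samples if a split turns out to be appropriate. It is a sketch under assumptions: `split_fam` is a hypothetical helper, the file names `EUR_1000G` and `EUR_split` are placeholders, and whether PROSPER actually needs separate EUR tuning and testing sets here is exactly what I am asking.

```shell
#!/bin/sh
# Hypothetical helper, not part of PROSPER.
# Write two PLINK --keep lists (FID and IID columns) from a .fam file:
# a seeded random half for tuning and the remaining samples for testing.
# $1: input .fam, $2: output prefix, $3: random seed (default 1).
split_fam() (
  fam="$1"
  out_prefix="$2"
  seed="${3:-1}"
  n=$(wc -l < "$fam")
  half=$(( n / 2 ))
  # Attach a fixed-width random key to each sample, sort by it, drop the key.
  awk -v seed="$seed" 'BEGIN { srand(seed) }
       { printf "%.10f\t%s\t%s\n", rand(), $1, $2 }' "$fam" \
    | sort -k1,1 | cut -f2,3 > "${out_prefix}.shuffled"
  head -n "$half" "${out_prefix}.shuffled" > "${out_prefix}.tuning.keep"
  tail -n +"$(( half + 1 ))" "${out_prefix}.shuffled" > "${out_prefix}.testing.keep"
  rm -f "${out_prefix}.shuffled"
)

# Example use (placeholder file names, commented out on purpose):
# split_fam EUR_1000G.fam EUR_split 2024
# plink2 --bfile EUR_1000G --keep EUR_split.tuning.keep  --make-bed --out tuning_geno
# plink2 --bfile EUR_1000G --keep EUR_split.testing.keep --make-bed --out testing_geno
```

For a 404-line .fam file this writes 202 samples to each list with no overlap, and the fixed seed makes the split reproducible with the same awk implementation.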

Thank you very much for your attention.

Best regards,