getian107 / PRScsx

Cross-population polygenic prediction
MIT License

PRS-CSx has been running for days #54

Closed mpkol59 closed 3 weeks ago

mpkol59 commented 1 month ago

Hello @getian107, thank you very much for your great work on this software. I have been experiencing long running times with PRS-CSx. I am using African and European summary statistics, with the AFR panel as the LD reference and an AFR genotype dataset for the bim file. The EUR sample size is large, but I limited the run to chromosome 22 only, expecting the job to finish faster; however, it has been running for days and is still stuck at the MCMC iterations. Am I missing something in terms of optimizing the usage? My code is below; I would appreciate and look forward to your response.

```bash
#!/bin/bash
#SBATCH -A depot
#SBATCH -N 2
#SBATCH -n 50
#SBATCH --time=14:00:00:00
#SBATCH --job-name afrPRScsx

export MKL_NUM_THREADS=10
export NUMEXPR_NUM_THREADS=10
export OMP_NUM_THREADS=10

cd /path/path

module load anaconda
module load use.own
module use PRScsx

for i in {22..22}; do
    PRScsx/PRScsx.py \
        --ref_dir=/path/ldblk_ukbb_afr \
        --bim_prefix=/path/cafr_chr$i \
        --sst_file=eur_chr22.txt,afr_chr22.txt \
        --n_gwas=233204,31317 \
        --pop=EUR,AFR \
        --chrom=$i \
        --phi=1e-2 \
        --seed=999 \
        --out_dir=/path/path \
        --out_name=AFR_EURCSX22chr$i
done
```

Best wishes.

getian107 commented 1 month ago

Hi - I'd try restricting the job to a single node and a single thread. A long running time is often caused by your job interfering with other jobs running on the same node.

```bash
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
```
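Applied to the submission script above, the single-node, single-thread setup might look like the following header. This is only a sketch: it assumes the same SLURM scheduler and account as in the original script, and the `--time` value is an illustrative placeholder based on the ~30-minute estimate for chromosome 22, not a recommendation from the maintainer.

```shell
#!/bin/bash
#SBATCH -A depot
#SBATCH -N 1              # one node, per the advice above
#SBATCH -n 1              # one task, so the job doesn't contend with itself
#SBATCH --time=04:00:00   # assumed limit; chr22 should finish well within this
#SBATCH --job-name afrPRScsx

# Pin every numerical backend to a single thread.
export MKL_NUM_THREADS=1
export NUMEXPR_NUM_THREADS=1
export OMP_NUM_THREADS=1
```

The rest of the script (the `module` lines and the PRScsx.py call) would stay as in the original.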

The running time is independent of the GWAS sample size. For chromosome 22, the computation should finish in around 30 minutes.

mpkol59 commented 1 month ago

Thank you for your swift response.