cuelee / pleio


ldsc_preprocess #5

Closed mJaredg closed 3 years ago

mJaredg commented 3 years ago

Hello. In the example for ldsc_preprocessing, there is an 'input_list.txt' file with the columns SPREV and PPREV. Are these the sample and population case prevalences, respectively? Also, is it possible to input summary statistics differently? Thank you.

mJaredg commented 3 years ago

Another question, do I need updated ref-ld-chr and w-ld-chr input files for my own set of summary statistics? My analysis is on European data.

cuelee commented 3 years ago

Hello, mJaredg. Thanks for using PLEIO.

Q1. In the example for ldsc_preprocessing, there is an 'input_list.txt' file with the columns SPREV and PPREV. Are these the sample and population case prevalences, respectively?

Yes. The SPREV and PPREV columns in the 'input_list.txt' file represent sample prevalence and population prevalence, respectively.

Q2. Also, is it possible to input summary statistics differently?

Could you elaborate on this for me?

Q3. Another question, do I need updated ref-ld-chr and w-ld-chr input files for my own set of summary statistics? My analysis is on European data.

Yes. The ref-ld-chr and w-ld-chr files provided with the ldsc_preprocess example are reference data for European ancestry. You can also manually download the reference data from the LDSC GitHub: LINK

mJaredg commented 3 years ago

Thank you, Lee, for your responses. In Q2, I meant whether I could point directly to the path of my summary statistics without using an input list. But I guess that's not possible, given that I would have to find another way to name each trait.

Last question regarding the preprocessing: do I need to convert my SNP identifiers to rs# format? I have a few hundred SNPs whose IDs are in chr_bp format, and I got the error "ValueError: Index SNP invalid". Would that be caused by the non-rsID SNP IDs?
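
For anyone wanting to check how many identifiers are not in rs# form before deciding whether to convert them, here is a minimal sketch. It is not part of PLEIO or LDSC; the helper name `split_ids` and the example identifiers are hypothetical, and the pattern only assumes that an rs# identifier is `rs` followed by digits.

```python
import re

# Assumption: an rs# identifier is "rs" followed by one or more digits.
RSID = re.compile(r"^rs\d+$")

def split_ids(ids):
    """Return (rs_ids, other_ids), preserving the input order."""
    rs_ids = [i for i in ids if RSID.match(i)]
    other_ids = [i for i in ids if not RSID.match(i)]
    return rs_ids, other_ids

# Hypothetical identifiers, mixing rs# and chr_bp styles.
rs_ids, other_ids = split_ids(["rs123", "1_752566", "rs4970383", "chr2_1000"])
print(len(rs_ids), "rs# IDs;", len(other_ids), "other IDs:", other_ids)
```

This only reports the two groups; it does not convert anything.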

mJaredg commented 3 years ago

[Image: screenshot of the error mentioned above (Screenshot 2021-04-22 at 09 07 33)]

cuelee commented 3 years ago

Sorry for the inconvenience.

This problem doesn't seem to be caused by the non-rs# variant names in your input data. Could you try renaming the columns of the summary statistics files you pass to ldsc_preprocess as follows (the order does not matter)? SNP, A1, A2, N, Z
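
As a rough illustration of the renaming suggested above, the following sketch rewrites only the header line of a delimited table using the Python standard library. It assumes a tab-delimited file; the function `rename_columns`, the original header names (`rsid`, `n_total`, `zscore`), and the sample row are hypothetical and not taken from PLEIO.

```python
import csv
import io

# Column names listed in the comment above.
REQUIRED = ["SNP", "A1", "A2", "N", "Z"]

def rename_columns(src, dst, mapping, delimiter="\t"):
    """Copy a delimited table from src to dst, renaming header fields via mapping."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = [mapping.get(name, name) for name in next(reader)]
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    writer.writerow(header)
    writer.writerows(reader)  # data rows are copied unchanged
    return header

# Hypothetical input whose headers differ from the required names.
raw = "rsid\tA1\tA2\tn_total\tzscore\nrs1\tA\tG\t1000\t1.5\n"
out = io.StringIO()
new_header = rename_columns(
    io.StringIO(raw), out, {"rsid": "SNP", "n_total": "N", "zscore": "Z"}
)
print(out.getvalue())
```

For real files, `src` and `dst` could be handles from `open(...)` or `gzip.open(..., "rt")`; the in-memory strings here just keep the example self-contained.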

If my suggestion doesn't solve the problem, could you please send me reproducible sample data at the following email? E-mail

mJaredg commented 3 years ago

Hello. Thanks for your response. I managed to get past that issue by ordering the entries in the SNP column; I had already named the columns as you described. However, now I have a problem with PLEIO itself. What are its computational requirements? I use a PBS queuing system, and my job gets killed before creating 'output.txt.gz'. The isf and log files are created, but the log file is empty. I have tried using 1 CPU with 96 GB and 4 CPUs with 96 GB, and neither completed the task. I have about 9 million SNPs in the metain.txt.gz file.
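
The ordering step mentioned at the start of this comment could look roughly like the sketch below. The thread does not say which ordering was used, so plain string ordering on the SNP column is an assumption, as are the tab delimiter, the helper name `sort_by_snp`, and the sample rows. It loads every row into memory, which may be heavy for files with millions of SNPs.

```python
import csv
import io

def sort_by_snp(src, dst, delimiter="\t"):
    """Write the rows of src to dst, sorted by the SNP column as plain strings."""
    reader = csv.DictReader(src, delimiter=delimiter)
    rows = sorted(reader, key=lambda row: row["SNP"])  # all rows held in memory
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return [row["SNP"] for row in rows]

# Hypothetical unsorted input.
raw = "SNP\tA1\tA2\tN\tZ\nrs3\tA\tG\t1000\t0.2\nrs1\tC\tT\t1000\t1.5\nrs2\tG\tA\t1000\t-0.7\n"
out = io.StringIO()
order = sort_by_snp(io.StringIO(raw), out)
print(out.getvalue())
```

Note that string ordering places "rs10" before "rs2"; if a numeric or positional order is needed, the sort key would have to change.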

mJaredg commented 3 years ago

I managed to solve the issue :) Now attempting a pleiotropy plot.