mJaredg closed this issue 3 years ago
Hello, mJaredg. Thanks for using PLEIO.
Q1. In the example for ldsc_preprocessing, there is a 'input_list.txt' file with the columns SPREV & PPREV. Are these sample and population case prevalences respectively?
Yes. The SPREV and PPREV columns in the 'input_list.txt' file represent the sample prevalence and the population prevalence, respectively.
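For context, LDSC-style tools commonly use these two prevalences to convert a heritability estimate from the observed scale to the liability scale. The sketch below shows that standard conversion factor; it is an illustration only, not code taken from PLEIO, and the function name and example values are hypothetical.

```python
from statistics import NormalDist


def liability_conversion_factor(sample_prev, population_prev):
    """Factor that rescales observed-scale h2 to the liability scale.

    sample_prev     -- proportion of cases in the study sample (SPREV)
    population_prev -- prevalence of the trait in the population (PPREV)
    """
    std_normal = NormalDist()
    # Liability threshold above which an individual is a case.
    threshold = std_normal.inv_cdf(1.0 - population_prev)
    # Height of the standard normal density at that threshold.
    density = std_normal.pdf(threshold)
    k, p = population_prev, sample_prev
    return (k ** 2) * ((1.0 - k) ** 2) / (p * (1.0 - p) * density ** 2)


# Hypothetical values: a balanced case-control sample of a trait
# with 1% population prevalence.
factor = liability_conversion_factor(sample_prev=0.5, population_prev=0.01)
```

Multiplying an observed-scale heritability by `factor` gives the liability-scale value under the usual threshold model assumptions.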
Q2. Also, is it possible to input summary statistics differently?
Could you elaborate on this for me?
Q3. Another question, do I need updated ref-ld-chr and w-ld-chr input files for my own set of summary statistics? My analysis is on European data.
Yes. The ref-ld-chr and w-ld-chr files provided with the ldsc_preprocess example are reference data for European ancestry. You can also manually download the reference data from the LDSC GitHub: LINK
Thank you, Lee, for your responses. In Q2, I meant whether I could directly give the path to my summary statistics without using an input list. But I guess that's not possible, since I would have to find another way to name each trait.
One last question regarding the preprocessing: do I need to convert my SNP identifiers to rs# format? I have a few hundred SNPs whose IDs are in chr_bp format, and I got the error "ValueError: Index SNP invalid". Could that be caused by the non-rsID SNP IDs?
Screenshot of the error mentioned above
Sorry for the inconvenience.
This problem doesn't seem to be caused by the non-rs# variant names in your input data. Could you try renaming the columns of the summary statistics that you pass to ldsc_preprocess as follows (the column order does not matter)? SNP, A1, A2, N, Z
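The renaming suggested above could be sketched as follows. This is a minimal example using only the Python standard library; the original column names (`variant`, `allele1`, and so on), the tab delimiter, and the example row are hypothetical, and only the target names SNP, A1, A2, N, Z come from this thread.

```python
import csv
import io

# Column names requested in this thread for the ldsc_preprocess input.
REQUIRED = ["SNP", "A1", "A2", "N", "Z"]


def rename_columns(text, mapping, delimiter="\t"):
    """Rename header fields according to `mapping` and check required names."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    header = [mapping.get(name, name) for name in rows[0]]
    missing = [name for name in REQUIRED if name not in header]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    out = io.StringIO()
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows[1:])  # data rows are passed through unchanged
    return out.getvalue()


# Hypothetical summary statistics with made-up column names and values.
example = "variant\tallele1\tallele2\tn\tzscore\nrs1\tA\tG\t1000\t1.5\n"
mapping = {"variant": "SNP", "allele1": "A1", "allele2": "A2",
           "n": "N", "zscore": "Z"}
renamed = rename_columns(example, mapping)
```

Which allele belongs in A1 and which in A2 depends on the original data, so the mapping has to be chosen by hand for each file.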
If my suggestion doesn't solve the problem, could you please send reproducible sample data to the following email? E-mail
Hello. Thanks for your response. I managed to get past that issue by ordering the entries in the SNP column; I had already named the columns as you described. However, I now have a problem with PLEIO itself. How much computational resource does it need? I use a PBS queuing system, and my job gets killed before 'output.txt.gz' is created. The isf and log files are created, but the log file is empty. I have tried 1cpu:96gb and 4cpus:96gb, and neither completed the task. I have about 9 million SNPs in the metain.txt.gz file.
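The workaround mentioned above, ordering the entries in the SNP column, could look roughly like this. The thread does not say which ordering was used, so a plain lexicographic sort is an assumption here, and the helper names and example rows are hypothetical. The second helper simply lists IDs that are not in rs# format, which can help when checking chr_bp identifiers.

```python
import csv
import io
import re


def sort_by_snp(text, delimiter="\t"):
    """Return the table with data rows sorted lexicographically by SNP."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = sorted(reader, key=lambda row: row["SNP"])
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames,
                            delimiter=delimiter, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def non_rs_ids(text, delimiter="\t"):
    """List SNP identifiers that do not look like rs<digits>."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [row["SNP"] for row in reader
            if not re.fullmatch(r"rs\d+", row["SNP"])]


# Hypothetical rows; the values are made up for illustration.
example = (
    "SNP\tA1\tA2\tN\tZ\n"
    "rs3\tA\tG\t1000\t0.2\n"
    "1_12345\tC\tT\t1000\t-1.1\n"
    "rs1\tG\tA\t1000\t2.0\n"
)
sorted_text = sort_by_snp(example)
unusual = non_rs_ids(example)
```

For a file with millions of SNPs, sorting everything in memory like this may be slow; the sketch only illustrates the idea.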
I managed to solve the issue :) Now attempting a pleiotropy plot.