bvilhjal / ldpred

MIT License

Unable to open object #74

mjalbrzikowski closed this issue 4 years ago

mjalbrzikowski commented 5 years ago

Hello, thank you for creating this wonderful tool! Forgive the naive question; I am new to Python. I was able to run ldpred.py coord on the test data, but now when I input my own summary statistics, I get this error for every chromosome:

Coordinating datasets (Summary statistics and LD reference genotypes). 4.35%"Unable to open object (object 'chrom_1' doesn't exist)"

I am guessing there is something wrong with my summary statistics file, but I cannot figure out what. The first 100 lines are attached as SNPs.txt for your reference (note: I just put in the allele frequency as 0.5 for the time being). Thank you for your help. Best, Maria

bvilhjal commented 5 years ago

Hi Maria, thanks for your comment and I apologize for the slow reply, but I was on holiday in April.

You've probably already solved this, or given up, but if not, I suggest you delete the LD reference file (or give another filename in the parameter) when running LDpred. Alternatively, you can try deleting any intermediate files and running again.

Best, Bjarni
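
For readers unsure how to act on the suggestion above, here is a minimal sketch of clearing leftover files before a rerun. It assumes that the files in question are outputs written by an earlier, failed run (for example a coordinated .hdf5 file); the thread does not confirm which files are meant, and the function name `remove_stale_outputs` is hypothetical, not part of LDpred.

```python
from pathlib import Path


def remove_stale_outputs(paths):
    """Delete each path that exists as a regular file; return the removed paths.

    Assumption: `paths` lists files left over from an earlier run.
    Paths that do not exist are skipped silently.
    """
    removed = []
    for path in map(Path, paths):
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Calling it with the output filename passed to the previous run, then rerunning coord, is one way to rule out a stale file as the cause.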

JoelLuu commented 5 years ago

Hi Bjarni,

I have the same problem as Maria. I'm able to run the test files, but when I start using my own files (ssf and/or LD reference file), I get this error message:

Coordinating datasets (Summary statistics and LD reference genotypes). 4.55%"Unable to open object (object 'chrom_1' doesn't exist)"
Did not find chromosome 1 in SS dataset.

and the log looks like this:

Summary statistics filename: .../ssf_formated_for_ldpred.txt
LD reference genotypes filename: .../LDpred_cc_data_p0.001_train_0
Coordinated data output filename: .../ldpred_coord.hdf5
------------------------------ Summary statistics ------------------------------
Num SNPs parsed from sum stats file 0
--------------------------------- Coordination ---------------------------------
Num individuals in LD Reference data: 8000
SNPs in LD Reference data: 10000
Num chromosomes used: 0
SNPs common across datasets: 0
SNPs retained after filtering: 0
SNPs w MAF<0.010 filtered: 0
SNPs w allele freq discrepancy > 0.100 filtered: 0
-------------------------------- Running times ---------------------------------
Run time for parsing summary stats: 0 min and 31.04 sec
Run time for coordinating datasets: 0 min and 0.19 sec

I'm sorry, but I don't understand your answer. What do you mean by deleting the LD reference file? I'm at a loss here: isn't it required to run coord? When I ran coord without --gf, it unsurprisingly did not work, and providing another filename did not work either. Also, what are you referring to when you talk about intermediate files?

And by the way, in --help for coord, we have

--reffreq REFFREQ Column header containing the reference MAF

Is it the reference allele frequency or the MAF?

Thank you for your help! Best regards, Joel

yangithub33 commented 5 years ago

I have the same problem. Is there a solution yet?

JoelLuu commented 5 years ago

Hi,

I managed to solve my problem. The rsIDs must match, at least partially, between the ssf file and the gf file. Because of that, combining a test file with my own gf file could not work.

However, I still don't understand what Bjarni meant, and my question on reffreq still remains.

Thanks, Joel
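
A quick way to check for this before running coord is to count how many SNP IDs the two inputs share. The sketch below is not LDpred code. It assumes the summary statistics file is whitespace-delimited with a header row, that its ID column is named `SNP` (a hypothetical default; pass your own header name), and that the LD reference is a PLINK fileset whose .bim file lists the variant ID in its second column.

```python
def sumstats_ids(path, id_column="SNP"):
    """Return the set of SNP IDs in a whitespace-delimited summary statistics file."""
    with open(path) as fh:
        header = fh.readline().split()
        idx = header.index(id_column)  # raises ValueError if the column is absent
        rows = (line.split() for line in fh)
        return {fields[idx] for fields in rows if len(fields) > idx}


def bim_ids(path):
    """Return the set of variant IDs (second column) in a PLINK .bim file."""
    with open(path) as fh:
        rows = (line.split() for line in fh)
        return {fields[1] for fields in rows if len(fields) >= 2}


def id_overlap(sumstats_path, bim_path, id_column="SNP"):
    """Return (IDs in summary stats, IDs in .bim, IDs present in both)."""
    ss = sumstats_ids(sumstats_path, id_column)
    ref = bim_ids(bim_path)
    return len(ss), len(ref), len(ss & ref)
```

If the third number is 0, the two files share no IDs, which would be consistent with the "SNPs common across datasets: 0" line in the log above.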

Qinqiword commented 4 years ago

Hi Bjarni,

I have the same question as Joel. In the parser_coord.add_argument call:

--reffreq REFFREQ Column header containing the reference MAF

Is it the reference allele frequency or the MAF?

Best regards, Jing

bvilhjal commented 4 years ago

Hi,

I apologize for my slow responses. First, yes, LDpred matches SNPs across datasets by rsIDs.

Regarding --reffreq, it is the reference allele frequency.
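
The two quantities differ whenever the reference allele is the more common one. As a small illustration, assuming a biallelic SNP (the function name is hypothetical and not part of LDpred), the MAF can be derived from the reference allele frequency but not the other way round:

```python
def minor_allele_frequency(ref_allele_freq):
    """Return the MAF of a biallelic SNP given its reference allele frequency."""
    if not 0.0 <= ref_allele_freq <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    # The minor allele is whichever of the two alleles is rarer.
    return min(ref_allele_freq, 1.0 - ref_allele_freq)
```

A reference allele frequency of 0.8, for instance, corresponds to a MAF of about 0.2, so a column of MAF values would not be a valid substitute for the column --reffreq expects.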