munge_sumstats.py - Githubissues

bulik / ldsc

LD Score Regression (LDSC)

GNU General Public License v3.0

munge_sumstats.py #26

Closed WilliamDHill closed 9 years ago

WilliamDHill commented 9 years ago

Hello,

Thank you for making this interesting piece of software; I'm keen to use it on my own data sets. I've run into a spot of bother using the munge_sumstats.py provided (downloaded on the 20/01/2015). I'm trying to use it on the data from the Psychiatric Genomics Consortium, as per the tutorial. When I do, I receive the following error:

  File "munge_sumstats.py", line 11, in <module>
    from ldscore import sumstats
ImportError: cannot import name sumstats

I've ensured that the ldscore.py files are in my working folder. Any help here would be great.

Best, William Hill

bulik commented 9 years ago

I noticed you closed this -- did you get it to work?

WilliamDHill commented 9 years ago

Hello,

Thanks for getting back so quickly. No, I didn't; I was about to email. After posting the message I got the feeling that the issue tracker was for the authors of the program to detail the issues they found rather than for new users.

The issue was with the munge_sumstats.py script, which generates the following error when used with the tutorial data:

  File "munge_sumstats.py", line 11, in <module>
    from ldscore import sumstats
ImportError: cannot import name sumstats

Any help here would be greatly appreciated!

Best, Will

The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.

bulik commented 9 years ago

The Google group is the best place for questions, but it doesn't really matter. Could you check the following:

1. Try cloning from GitHub again and see if that fixes things.
2. Go to the ldsc/ directory and type python ldsc.py -h, which should print a help menu.
3. Go to the ldsc/ldscore/ directory and run ls. There should be several files, but in particular there should be files called __init__.py and sumstats.py. The absence of either of these files would explain the problem.
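The file check in the last step can also be scripted. The following is a minimal sketch, assuming it is run from the top-level ldsc/ directory; the function name `missing_ldscore_files` is made up for illustration and is not part of ldsc.

```python
import os

# Files the thread says must exist inside ldsc/ldscore/.
REQUIRED = ("__init__.py", "sumstats.py")


def missing_ldscore_files(repo_dir):
    """List required files that are absent from <repo_dir>/ldscore/."""
    pkg_dir = os.path.join(repo_dir, "ldscore")
    return [name for name in REQUIRED
            if not os.path.isfile(os.path.join(pkg_dir, name))]


# Hypothetical usage: "." is assumed to be the cloned ldsc/ directory.
print("missing: %s" % (", ".join(missing_ldscore_files(".")) or "none"))
```

An empty result only confirms that the two files are present; it does not rule out other causes of the import error.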

WilliamDHill commented 9 years ago

Again, thank you for your quick response. After re-cloning, it seems to work. If I have any more questions, I'll post to the Google group.

Cheers, David

WilliamDHill commented 9 years ago

Hi Brendan,

I have another question. I'm trying to recreate the analyses performed in your paper on genetic correlation before moving on to use my own data. I've run the Alzheimer's and college years data sets and I'm getting a slightly different estimate of the genetic correlation: I get rg = -0.375 (SE = 0.102), in contrast to your estimate of rg = -0.30 (SE = 0.08). I'm not sure why this is. I've not constrained the intercept, as there is no sample overlap, but I see in the paper that constraining the intercept can reduce the standard error. Could you say how the analysis carried out in the paper differs from the tutorial example looking at bipolar disorder and schizophrenia?

Thank you again, David

bulik commented 9 years ago

Could you post the log file?

I've made a couple of tweaks to the statistical core of ldsc since posting to bioRxiv, so the results that you get from the current version will be slightly different from the results on bioRxiv (but only very slightly different). I'll update the numbers on bioRxiv when I revise the paper. We also used an unconstrained intercept for all of the results in fig 2.

For Alzheimer's specifically, we excluded the APOE locus -- it has such a huge effect on Alzheimer's that you would probably see genetic correlation between Alzheimer's and any other trait with a GWAS hit at APOE, but it seems silly to report 'genome-wide genetic correlation' when what's actually going on is 'GWAS hits at the APOE locus'. And if you want to know which traits are influenced by variants at the APOE locus, you don't need ldsc; it's a simple GWAS catalog lookup.
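As a rough illustration of excluding a locus before estimating genetic correlation, the sketch below drops SNP records that fall inside a window on chromosome 19. Everything here is an assumption for illustration: the field names `CHR` and `BP`, the window bounds, the SNP labels, and the function name are not taken from ldsc or from the paper, and the thread does not say which window was actually excluded.

```python
# Hypothetical window around APOE on chr19. The bounds are placeholders and
# must be checked against the genome build of your summary statistics.
APOE_CHR = 19
APOE_START = 45000000
APOE_END = 46000000


def drop_region(snps, chrom=APOE_CHR, start=APOE_START, end=APOE_END):
    """Return the SNP records that fall outside [start, end] on `chrom`."""
    return [s for s in snps
            if not (s["CHR"] == chrom and start <= s["BP"] <= end)]


# Made-up records: one inside the window, two outside it.
snps = [
    {"SNP": "rs_a", "CHR": 19, "BP": 45411941},  # inside the window
    {"SNP": "rs_b", "CHR": 19, "BP": 10000000},  # same chromosome, outside
    {"SNP": "rs_c", "CHR": 1, "BP": 45411941},   # different chromosome
]
kept = drop_region(snps)
print([s["SNP"] for s in kept])
```

The filter would be applied to the summary statistics before running the regression, so that SNPs in the excluded window do not contribute to the estimate.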

bulik commented 9 years ago

PS if you don't mind, when you respond with the log file, post to the google group so it will be easier for other people with the same question to see the answer

WilliamDHill commented 9 years ago

OK, no worries. I'll copy the question over with the log file. Thanks again for this. I'll expand the question a bit too.

Best, David

WilliamDHill commented 9 years ago

Hi Brendan,

I sent a question to you, but I'm afraid I'm going to have to press you for an answer, as our group is aiming to submit a paper using your method. Could you tell me which version of the 1000 Genomes data was used with LD Score regression for the genetic correlations? Thanks for this, and sorry for the urgency.

Best wishes, David

bulik commented 9 years ago

Hi David,

We used integrated_phase1_v3.20101123 for all of the LD Score regression papers.

Best, Brendan

WilliamDHill commented 9 years ago

Hi Brendan,

Many thanks for this. I wanted to ask where the LD Score genetic correlation paper is now. I ask because I've had reviews back stating that, as the method is unpublished, papers using it cannot be considered for publication.

Cheers, David

bulik commented 9 years ago

I believe the paper is scheduled to appear online in Nature Genetics today at noon, but I'm not 100% sure.

FYI, many (most?) journals have explicit policies allowing citations to preprints.

Cheers, Brendan

bulik commented 9 years ago

P.S. http://www.nature.com/ng/journal/vaop/ncurrent/full/ng.3406.html

WilliamDHill commented 9 years ago

Many thanks Brendan.

WilliamDHill commented 9 years ago

And of course congratulations!

David

WilliamDHill commented 9 years ago

Hi Brendan,

I'm encountering an error using LD Score regression to derive genetic correlations. The error reads:

Computing rg for phenotype 10/10
Reading summary statistics from ../data.sumstats.gz ...
Read summary statistics for 1217311 SNPs.
After merging with summary statistics, 1167648 SNPs remain.
1059971 SNPs with valid alleles.
ERROR computing rg for phenotype 10/10, from file ../data.sumstats.gz.
Traceback (most recent call last):
  File "/Workspace/ldsc/ldscore/sumstats.py", line 340, in estimate_rg
    loop = _read_other_sumstats(args, log, p2, sumstats, ref_ld_cnames)
  File "/Workspace/ldsc/ldscore/sumstats.py", line 372, in _read_other_sumstats
    loop['Z2'] = _align_alleles(loop.Z2, alleles)
  File "/Workspace/ldsc/ldscore/sumstats.py", line 443, in _align_alleles
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2053, in apply
    mapped = lib.map_infer(values, f, convert=convert_dtype)
  File "pandas/src/inference.pyx", line 1064, in pandas.lib.map_infer (pandas/lib.c:58519)
  File "/Workspace/ldsc/ldscore/sumstats.py", line 443, in <lambda>
    z *= (-1) ** alleles.apply(lambda y: FLIP_ALLELES[y])
KeyError: 'CTTG'

The data set generating this error is one I've used before with LD regression and haven't altered. I'm trying to use this to create genetic correlations with a new data set. All the phenotypes from this new data set seem to run, but when I incorporate the old data set the error message is delivered.

Any help here would be greatly appreciated.

Best, David

bulik commented 9 years ago

Could you check that (1) both files were generated using munge_sumstats.py with the --merge-alleles flag and (2) you're using the most recent version of ldsc? If both (1) and (2) are true, could you email me a set of files that triggers this error so that I can reproduce the error and debug?
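For reference, a munge_sumstats.py call that includes the flag might be assembled as in the sketch below. The flags `--sumstats`, `--out` and `--merge-alleles` are munge_sumstats.py options; the file names and the helper `munge_command` are hypothetical, and a real data set may need further options (for example a sample size) that are not shown here.

```python
def munge_command(sumstats_path, out_prefix, snplist_path):
    """Build an argument list for munge_sumstats.py with --merge-alleles."""
    return [
        "python", "munge_sumstats.py",
        "--sumstats", sumstats_path,      # raw summary statistics file
        "--out", out_prefix,              # prefix for the output files
        "--merge-alleles", snplist_path,  # SNP list with reference alleles
    ]


# Hypothetical file names, for illustration only.
cmd = munge_command("trait.txt", "trait", "snplist.txt")
print(" ".join(cmd))
```

Running the same command shape for every trait makes it easier to confirm that no file was munged without `--merge-alleles`.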

WilliamDHill commented 9 years ago

Hi Brendan,

Many thanks; the --merge-alleles flag was indeed absent in our new data. Can I ask you about your choice of data from the plasma lipids consortium? There was a new meta-analysis in 2013, but in your genetic atlas paper you used the 2010 version. I see that there are a number of non-Europeans in the new 2013 sample. Was this the reason for using the 2010 data?

2010 paper http://www.nature.com/nature/journal/v466/n7307/pdf/nature09270.pdf

2013 paper http://www.nature.com/ng/journal/v45/n11/full/ng.2797.html#ref8

Best, David

bulik commented 9 years ago

That's right -- we used the 2010 data because the 2013 data were only available in a pooled-across-continents meta-analysis format, which is incompatible with LD Score regression.

WilliamDHill commented 9 years ago

Again thank you for this Brendan. You're a real boon.

David

bulik commented 9 years ago

No problem. FYI, the error that you ran into a few posts ago now prints a more informative message, with an explicit suggestion that the --merge-alleles flag may have been missing.
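To see why a key such as 'CTTG' can fail, it helps to read it as two allele pairs: C/T in one file and T/G in the other. The traceback earlier in the thread shows the failure coming from a dictionary lookup on such a four-letter key. The sketch below is a simplified stand-in for that idea, not ldsc's actual code: the function `flip_sign` and its error wording are assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def flip_sign(combo):
    """Return +1 if both files use the same reference allele, -1 if swapped.

    `combo` is four letters: A1 and A2 from the first file, then A1 and A2
    from the second. Raises ValueError for pairs that cannot be reconciled.
    """
    if len(combo) != 4 or any(c not in COMPLEMENT for c in combo):
        raise ValueError("%s: expected four letters from A, C, G, T" % combo)
    a1, a2, b1, b2 = combo
    # A/A-style and A/T-style pairs cannot be oriented reliably.
    if a1 == a2 or COMPLEMENT[a1] == a2:
        raise ValueError("%s: first pair is not strand-unambiguous" % combo)
    # Same orientation, either on the same strand or the opposite strand.
    if (b1, b2) in [(a1, a2), (COMPLEMENT[a1], COMPLEMENT[a2])]:
        return 1
    # Reference allele swapped, again allowing for a strand flip.
    if (b1, b2) in [(a2, a1), (COMPLEMENT[a2], COMPLEMENT[a1])]:
        return -1
    raise ValueError(
        "%s: incompatible alleles; were both files munged with "
        "--merge-alleles?" % combo)


print(flip_sign("ACAC"), flip_sign("ACCA"))
```

Under this sketch, C/T against T/G matches neither orientation on either strand, so it is rejected with a message that points at the likely cause instead of surfacing as a bare KeyError.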

WilliamDHill commented 9 years ago

Hi Brendan,

Another question, I'm afraid. I've been using LD Score regression to estimate genetic correlations and I'm about to start with partitioned heritability. I'm running the test data and script from the tutorial

https://github.com/bulik/ldsc/wiki/Partitioned-Heritability

however I run into an error when I try to run the baseline model

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 623, in <module>
    sumstats.estimate_h2(args, log)
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/sumstats.py", line 279, in estimate_h2
    sumstats = sumstats[ii]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1785, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1827, in _getitem_array
    return self.take(indexer, axis=0, convert=False)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1357, in take
    convert=True, verify=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3275, in take
    axis=axis, allow_dups=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3162, in reindex_indexer
    for blk in self.blocks]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 857, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 844, in take_nd
    func(arr, indexer, out, fill_value)
  File "pandas/src/generated.pyx", line 5715, in pandas.algos.take_2d_axis1_float64_float64 (pandas/algos.c:106840)
  File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (pandas/algos.c:187428)
  File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (pandas/algos.c:184017)
ValueError: buffer source array is read-only

I'm not sure what's going on here. Additionally, LD Score regression should take around 10 minutes, but it seems to be taking around an hour to run the genetic correlations. This time is spent reading in the LD scores (Read reference panel LD Scores for 1189907 SNPs.).

Any help on either of these two issues would be appreciated greatly.

Thank you again for this help.

Best, David

hilaryfinucane commented 9 years ago

Hi David,

Could you please let me know which commands you used and which files, and which versions of each package you have installed?

Best,

Hilary
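
A minimal sketch of one way to gather the version information requested above. The package list is an assumption (numpy, scipy, pandas and bitarray are commonly installed alongside ldsc, and matplotlib is pulled in by pandas); the helper name `collect_versions` is hypothetical and not part of ldsc. It is written so that it should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import platform


def collect_versions(package_names):
    """Return a dict mapping each name to its version string or an error note."""
    versions = {"python": platform.python_version()}
    for name in package_names:
        try:
            module = importlib.import_module(name)
        except Exception as exc:
            # A broken install can raise more than ImportError, so catch broadly.
            versions[name] = "import failed: {0}: {1}".format(
                type(exc).__name__, exc)
            continue
        versions[name] = str(getattr(module, "__version__", "unknown"))
    return versions


if __name__ == "__main__":
    # Assumed package list; adjust to whatever your environment actually uses.
    packages = ["numpy", "scipy", "pandas", "bitarray", "matplotlib"]
    for name, version in sorted(collect_versions(packages).items()):
        print("{0}: {1}".format(name, version))
```

Pasting the printed lines into a reply, together with the exact commands that were run, gives the maintainers what they need to try to reproduce the problem.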

WilliamDHill commented 9 years ago

Hi Hilary,

Thanks for getting back so quickly with this.

I used the BMI data from

http://www.broadinstitute.org/collaboration/giant/images/b/b7/GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt.gz

and I downloaded the files from

https://data.broadinstitute.org/alkesgroup/LDSCORE/

I initially ran

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

which seems to have run correctly. This was followed by

python ldsc/ldsc.py --h2 BMI/BMI.sumstats.gz --ref-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/baseline. --w-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/weights. --overlap-annot --frqfile-chr data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur. --out BMI/BMI_baseline

which generates the error.

The version of LD score regression I have is

LD Score Regression (LDSC)
Version 1.0.0
(C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane
Broad Institute of MIT and Harvard / MIT Department of Mathematics
GNU General Public License v3

Again thank you for your help.

Best, David

hilaryfinucane commented 9 years ago

Hi David,

Is your version of pandas up to date? Also, can you try the same commands without the --overlap-annot and --frqfile-chr flags? The results won't be interpretable, but this will show whether the program still crashes.

Hilary
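
A small sketch of the debugging step suggested above: take the ldsc.py command quoted earlier in the thread and strip the two flags before re-running it. The helper `drop_flags` is hypothetical and not part of ldsc. The arities are read off the quoted command, where --overlap-annot is followed directly by another flag (no value) and --frqfile-chr is followed by one path.

```python
def drop_flags(argv, flags_with_arity):
    """Return a copy of argv without the given flags and their values.

    flags_with_arity maps each flag to the number of values that follow it.
    """
    kept = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token in flags_with_arity:
            # Skip the flag itself plus however many values it takes.
            i += 1 + flags_with_arity[token]
            continue
        kept.append(token)
        i += 1
    return kept


# The command as quoted in the thread, split into arguments.
full_command = [
    "python", "ldsc/ldsc.py",
    "--h2", "BMI/BMI.sumstats.gz",
    "--ref-ld-chr", "data.broadinstitute.org/alkesgroup/LDSCORE/baseline.",
    "--w-ld-chr", "data.broadinstitute.org/alkesgroup/LDSCORE/weights.",
    "--overlap-annot",
    "--frqfile-chr", "data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur.",
    "--out", "BMI/BMI_baseline",
]

# --overlap-annot takes no value; --frqfile-chr takes one.
debug_command = drop_flags(
    full_command, {"--overlap-annot": 0, "--frqfile-chr": 1})
print(" ".join(debug_command))
```

The printed command is only for checking whether the crash persists; as noted above, its results are not meant to be interpreted.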

WilliamDHill commented 9 years ago

Hi Hilary,

We've updated pandas to the version here

http://pandas.pydata.org/

Unfortunately, this now produces the error below. If I omit the --overlap-annot and --frqfile-chr flags, the same error is generated.

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 12, in <module>
    import ldscore.parse as ps
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/parse.py", line 10, in <module>
    import pandas as pd
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 16, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2864, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 135, in <module>
    if _mpl_ge_1_5_0():
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 130, in _mpl_ge_1_5_0
    return (matplotlib.__version__ >= LooseVersion('1.5')
  File "/usr/local/anaconda/lib/python2.7/distutils/version.py", line 296, in __cmp__
    return cmp(self.version, other.version)
AttributeError: 'unicode' object has no attribute 'version'

Best, David

hilaryfinucane commented 9 years ago

Hi David,

I'm not sure what's going on here, but it looks like it might be a problem with pandas and not with ldsc, since it's crashing at the line "import pandas as pd". If you just open Python and import pandas, does that go okay? How about if you just run python ldsc/ldsc.py -h?

Hilary
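
A minimal sketch of the check suggested above: try the import in a fresh interpreter, with no ldsc code involved, and look at what comes back. The helper `check_import` is hypothetical. matplotlib is included as an assumption, because the traceback quoted in the thread ends inside pandas' plotting module while it compares the matplotlib version. The code should run under both Python 2.7 and Python 3.

```python
from __future__ import print_function

import subprocess
import sys


def check_import(module_name):
    """Import module_name in a fresh interpreter.

    Returns (ok, stderr_text), where ok is True if the child exited with 0.
    """
    proc = subprocess.Popen(
        [sys.executable, "-c", "import " + module_name],
        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    _, err = proc.communicate()
    return proc.returncode == 0, err.decode("utf-8", "replace")


if __name__ == "__main__":
    # Check matplotlib first, since pandas imports it on the failing path.
    for name in ["matplotlib", "pandas"]:
        ok, err = check_import(name)
        print("{0}: {1}".format(name, "OK" if ok else "FAILED"))
        if not ok:
            print(err)
```

If pandas fails here as well, the problem lies in the Python environment rather than in ldsc, which matches the diagnosis above.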

On Thu, Oct 22, 2015 at 10:09 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

We've updated pandas to the version here

http://pandas.pydata.org/

unfortunately this now produces the error. If I omit the --overlap-annot and --frqfile-chr flags the same error is generated.

Traceback (most recent call last): File "ldsc/ldsc.py", line 12, in import ldscore.parse as ps File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/parse.py", line 10, in import pandas as pd File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/init.py", line 44, in from pandas.core.api import * File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/api.py", line 9, in from pandas.core.groupby import Grouper File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 16, in from pandas.core.frame import DataFrame File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in from pandas.core.series import Series File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2864, in import pandas.tools.plotting as _gfx File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 135, in if _mpl_ge_1_5_0(): File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 130, in _mpl_ge_1_5_0 return (matplotlib.version >= LooseVersion('1.5') File "/usr/local/anaconda/lib/python2.7/distutils/version.py", line 296, in cmp return cmp(self.version, other.version) AttributeError: 'unicode' object has no attribute 'version'

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:36

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is your version of pandas up to date? Also, can you try the same commands without the --overlap-annot and --frqfile-chr flags? The results won't be interpretable but just to check if the program crashes.

Hilary

On Thu, Oct 22, 2015 at 9:28 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Thanks for getting back so quickly with this.

I used the data on height from

http://www.broadinstitute.org/collaboration/giant/images/b/b7/GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt.gz

and I downloaded the files from

https://data.broadinstitute.org/alkesgroup/LDSCORE/

I initially ran

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

which I seem to have ran correctly. Followed by

python ldsc/ldsc.py --h2 BMI/BMI.sumstats.gz --ref-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/baseline. --w-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/weights. --overlap-annot --frqfile-chr data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur. --out BMI/BMI_baseline

which generates the error.

The version of LD score regression I have is

LD Score Regression (LDSC)

Version 1.0.0

(C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane

Broad Institute of MIT and Harvard / MIT Department of Mathematics

GNU General Public License v3

Again thank you for your help.

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:13

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please let me know which commands you used and which files, and which versions of each package you have installed?

Best,

Hilary

On Thu, Oct 22, 2015 at 6:03 AM, WilliamDHill notifications@github.com wrote:

Hi Brendan,

Another question I'm afraid. So I've been using LDS regression to perform genetic correlations and I'm about to start with the partitioned heritability. I'm running the test data and script from the tutorial

https://github.com/bulik/ldsc/wiki/Partitioned-Heritability

however I run into an error when I try to run the baseline model

Traceback (most recent call last): File "ldsc/ldsc.py", line 623, in sumstats.estimate_h2(args, log) File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/sumstats.py", line 279, in estimate_h2 sumstats = sumstats[ii] File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1785, in getitem return self._getitem_array(key) File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1827, in _getitem_array return self.take(indexer, axis=0, convert=False) File

"/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1357, in take convert=True, verify=True) File

"/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py",

line 3275, in take axis=axis, allow_dups=True) File

"/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py",

line 3162, in reindex_indexer for blk in self.blocks] File

"/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py",

line 857, in take_nd allow_fill=True, fill_value=fill_value) File

"/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 844, in take_nd func(arr, indexer, out, fill_value) File "pandas/src/generated.pyx", line 5715, in pandas.algos.take_2d_axis1_float64_float64 (pandas/algos.c:106840) File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (pandas/algos.c:187428) File "stringsource", line 321, in View.MemoryView.memoryview.cinit (pandas/algos.c:184017) ValueError: buffer source array is read-only

I'm not sure what's going on here. Additionally LDS regression should take around 10 mins. However It seems to be taking around an hour to run the genetic correlations. This time is spent reading in the LD scores (Read reference panel LD Scores for 1189907 SNPs.).

Any help on either of these two issues would be appreciated greatly.

Thank you again for this help.

Best, David

From: Brendan Bulik-Sullivan notifications@github.com Sent: 07 October 2015 01:34 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

No problem. FYI I also made the error that you ran into a few posts ago print out a message that is a little more informative with an explicit suggestion that the --merge-alleles flag may have been missing.

WilliamDHill commented 9 years ago

Hi Hilary,

Thanks for your help with this. We're now using version 1.43, which seems to run much more quickly: 5 minutes as opposed to an hour. However, there is an error message in the output stating

Reading reference panel LD Score from baseline.[1-22] ...
ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

This message also appears when running the genetic correlations. Any help here would be appreciated.

Thanks again for all your help, David
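For readers hitting the same output: the message quoted above is labelled `FutureWarning`, which Python treats as a warning rather than an exception, so by default it is printed and execution continues. The warning text itself names `sort_values(by=...)` as the replacement for `sort(columns=...)`. The following is a minimal sketch of that difference using only the standard library; `sort_rows` is a hypothetical stand-in for the pandas call, not code from ldsc.

```python
import warnings


def sort_rows(rows):
    """Hypothetical stand-in for a library call that emits a deprecation warning."""
    warnings.warn("sort(columns=....) is deprecated, use sort_values(by=.....)",
                  FutureWarning, stacklevel=2)
    # The work still happens: rows are sorted by chromosome, then position.
    return sorted(rows, key=lambda r: (r["CHR"], r["BP"]))


rows = [{"CHR": 2, "BP": 500}, {"CHR": 1, "BP": 900}, {"CHR": 1, "BP": 100}]

# Record warnings instead of printing them, to show that the call still returns.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = sort_rows(rows)

print([w.category.__name__ for w in caught])
print(result)

# Escalate the warning to an exception only if you want it to stop the run.
with warnings.catch_warnings():
    warnings.simplefilter("error", FutureWarning)
    try:
        sort_rows(rows)
        raised = False
    except FutureWarning:
        raised = True
print(raised)
```

The second half shows the opposite setting: a warning only halts the program when a filter explicitly turns it into an error.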

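From: hilaryfinucane notifications@github.com Sent: 22 October 2015 15:19 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

I'm not sure what's going on here, but it looks like it might be a problem with pandas and not with ldsc, since it's crashing at the line "import pandas as pd". If you just open python and import pandas does that go okay? How about if you just run python ldsc/ldsc.py -h?

Hilary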

On Thu, Oct 22, 2015 at 10:09 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

We've updated pandas to the version available here

http://pandas.pydata.org/

Unfortunately, this now produces the error below. If I omit the --overlap-annot and --frqfile-chr flags, the same error is generated.

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 12, in <module>
    import ldscore.parse as ps
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/parse.py", line 10, in <module>
    import pandas as pd
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 16, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2864, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 135, in <module>
    if _mpl_ge_1_5_0():
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 130, in _mpl_ge_1_5_0
    return (matplotlib.__version__ >= LooseVersion('1.5')
  File "/usr/local/anaconda/lib/python2.7/distutils/version.py", line 296, in __cmp__
    return cmp(self.version, other.version)
AttributeError: 'unicode' object has no attribute 'version'

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:36 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is your version of pandas up to date? Also, can you try the same commands without the --overlap-annot and --frqfile-chr flags? The results won't be interpretable but just to check if the program crashes.

Hilary

On Thu, Oct 22, 2015 at 9:28 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Thanks for getting back so quickly with this.

I used the BMI data from

http://www.broadinstitute.org/collaboration/giant/images/b/b7/GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt.gz

and I downloaded the files from

https://data.broadinstitute.org/alkesgroup/LDSCORE/

I initially ran

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

which seems to have run correctly, followed by

python ldsc/ldsc.py --h2 BMI/BMI.sumstats.gz --ref-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/baseline. --w-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/weights. --overlap-annot --frqfile-chr data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur. --out BMI/BMI_baseline

which generates the error.

The version of LD score regression I have is

LD Score Regression (LDSC)

Version 1.0.0

(C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane

Broad Institute of MIT and Harvard / MIT Department of Mathematics

GNU General Public License v3

Again thank you for your help.

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:13 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please let me know which commands you used and which files, and which versions of each package you have installed?

Best,

Hilary

hilaryfinucane commented 9 years ago

Hi David,

Could you please say again which version of pandas you are using, and for which package you are using version 1.43? (I don't think there is a Pandas 1.43.)

Best,

Hilary

WilliamDHill commented 9 years ago

Hi Hilary,

Sorry for the confusion. We are using the latest version of pandas. The 1.43 refers to matplotlib: the error reading "AttributeError: 'unicode' object has no attribute 'version'" was traced to the version of matplotlib that comes as standard with the latest version of pandas. I'm still having the error stating

And I'm not sure how to tackle it.

Best, David
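For context, the traceback quoted in this thread fails inside `_mpl_ge_1_5_0`, where pandas compares matplotlib's version string with a `LooseVersion` object. As a rough illustration of the kind of check involved (this is not the code pandas or matplotlib use), the sketch below compares dotted version strings as integer tuples. The helper names `version_tuple` and `at_least` are hypothetical.

```python
import re


def version_tuple(version):
    """Turn a dotted version string such as '1.4.3' into (1, 4, 3).

    Hypothetical helper: keeps only the leading numeric part of each
    component, so '1.5.0rc1' becomes (1, 5, 0).
    """
    parts = []
    for piece in str(version).split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(installed, required):
    """Return True if `installed` is the same as or newer than `required`."""
    a, b = version_tuple(installed), version_tuple(required)
    # Pad the shorter tuple with zeros so that '1.5' equals '1.5.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(at_least("1.4.3", "1.5"))
print(at_least("1.10.0", "1.5"))
```

Comparing integers rather than raw strings avoids the trap where "1.10" sorts before "1.5" alphabetically.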

hilaryfinucane commented 9 years ago

Hi David,

Is this just when you run the analysis, or also when you run python ldsc.py -h? I'm not sure if we're compatible with the most recent version of Pandas. Which version are you running?

Hilary

On Fri, Oct 23, 2015 at 9:25 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Sorry for the confusion. We are using the latest version of pandas. The 1.43 pertains to Matplotlib, as the error reading "AttributeError: 'unicode' object has no attribute 'version'" has been traced to the version of matplotlib that comes as standard with the latest version of pandas. I'm still having the error stating

Reading reference panel LD Score from baseline.[1-22] ...
ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

And I'm not sure how to tackle it.

Best, David

From: hilaryfinucane notifications@github.com Sent: 23 October 2015 13:39 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please say again which version of pandas you are using, and for which package you are using version 1.43? (I don't think there is a Pandas 1.43.)

Best,

Hilary

On Fri, Oct 23, 2015 at 7:06 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Thanks for your help with this. We're now using version 1.43, which seems to run much more quickly (5 mins as opposed to an hour). However, there is an error message in the output stating

Reading reference panel LD Score from baseline.[1-22] ...
ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

This error is present when running the genetic correlations as well. Any help here would be appreciated.

Thanks again for all your help, David
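A note on the message quoted above: it is labelled FutureWarning, which in Python is a warning category rather than an exception, so the run normally continues past it. The sketch below illustrates that behaviour with the standard `warnings` module only. `deprecated_sort` is a hypothetical stand-in for a deprecated library call, not pandas or ldsc code.

```python
import warnings


def deprecated_sort(rows):
    """Hypothetical stand-in for a deprecated call such as DataFrame.sort."""
    warnings.warn(
        "sort(columns=...) is deprecated, use sort_values(by=...)",
        FutureWarning,
        stacklevel=2,
    )
    return sorted(rows)


# Record warnings instead of printing them, so they can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = deprecated_sort([(2, 500), (1, 900), (1, 100)])

# The call still returned sorted rows: the warning did not abort it.
print(result)
print([w.category.__name__ for w in caught])
```

If that reading is right, the output is still produced with the old call, and the warning only signals that a later pandas release may remove it.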

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 15:19

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

I'm not sure what's going on here, but it looks like it might be a problem with pandas and not with ldsc, since it's crashing at the line "import pandas as pd". If you just open python and import pandas does that go okay? How about if you just run python ldsc/ldsc.py -h?

Hilary

On Thu, Oct 22, 2015 at 10:09 AM, WilliamDHill <notifications@github.com> wrote:

Hi Hilary,

We've updated pandas to the version here

http://pandas.pydata.org/

Unfortunately, this now produces the error below. If I omit the --overlap-annot and --frqfile-chr flags, the same error is generated.

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 12, in <module>
    import ldscore.parse as ps
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/parse.py", line 10, in <module>
    import pandas as pd
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 16, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2864, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 135, in <module>
    if _mpl_ge_1_5_0():
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 130, in _mpl_ge_1_5_0
    return (matplotlib.__version__ >= LooseVersion('1.5')
  File "/usr/local/anaconda/lib/python2.7/distutils/version.py", line 296, in __cmp__
    return cmp(self.version, other.version)
AttributeError: 'unicode' object has no attribute 'version'

Best, David
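The last frame of the traceback above shows a version object being compared with a plain string, which has no `.version` attribute. As a general illustration, and not the fix that pandas or matplotlib shipped, the sketch below parses both sides into tuples before comparing. `parse_version` is a hypothetical helper that handles digit-only components, and `1.4.3` is an assumed reading of the "1.43" mentioned in this thread.

```python
def parse_version(text):
    """Turn '1.4.3' into (1, 4, 3). Hypothetical helper, digits only."""
    return tuple(int(part) for part in text.split("."))


installed = "1.4.3"  # assumed matplotlib version, see note above
threshold = "1.5"

# Plain string comparison is lexicographic, so '1.10.0' sorts before '1.5'.
print("1.10.0" >= "1.5")

# Comparing parsed tuples gives the numeric ordering instead.
print(parse_version("1.10.0") >= parse_version("1.5"))
print(parse_version(installed) >= parse_version(threshold))
```

Real version strings can carry suffixes such as `rc1`, which this helper would reject, so treat it as a demonstration of the comparison problem only.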

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:36

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is your version of pandas up to date? Also, can you try the same commands without the --overlap-annot and --frqfile-chr flags? The results won't be interpretable but just to check if the program crashes.

Hilary

On Thu, Oct 22, 2015 at 9:28 AM, WilliamDHill < notifications@github.com> wrote:

Hi Hilary,

Thanks for getting back so quickly with this.

I used the GIANT BMI data from

http://www.broadinstitute.org/collaboration/giant/images/b/b7/GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt.gz

and I downloaded the files from

https://data.broadinstitute.org/alkesgroup/LDSCORE/

I initially ran

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

which seems to have run correctly. Followed by

python ldsc/ldsc.py --h2 BMI/BMI.sumstats.gz --ref-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/baseline. --w-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/weights. --overlap-annot --frqfile-chr data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur. --out BMI/BMI_baseline

which generates the error.
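For anyone checking a similar command: the values given to --ref-ld-chr and --w-ld-chr end in a dot because they are prefixes, with one file per chromosome (the log elsewhere in this thread prints them as baseline.[1-22]). Below is a minimal sketch for confirming that the per-chromosome files are where the prefix points. The `.l2.ldscore.gz` suffix and both helper names are assumptions for illustration, so adjust them to match the downloaded files.

```python
import os


def chr_split_paths(prefix, suffix=".l2.ldscore.gz", chromosomes=range(1, 23)):
    """Expand a per-chromosome prefix into file names (suffix is an assumption)."""
    return [prefix + str(c) + suffix for c in chromosomes]


def missing_files(paths):
    """Return the paths that do not exist as regular files."""
    return [p for p in paths if not os.path.isfile(p)]


paths = chr_split_paths("baseline.")
print(paths[0], paths[-1], len(paths))

absent = missing_files(paths)
print(len(absent), "of", len(paths), "files not found from the current directory")
```

Running this from the directory used for the ldsc command shows quickly whether a prefix is mistyped or relative to the wrong folder.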

The version of LD score regression I have is

LD Score Regression (LDSC)

Version 1.0.0

(C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane

Broad Institute of MIT and Harvard / MIT Department of Mathematics

GNU General Public License v3

Again thank you for your help.

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:13

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please let me know which commands you used and which files, and which versions of each package you have installed?

Best,

Hilary

On Thu, Oct 22, 2015 at 6:03 AM, WilliamDHill < notifications@github.com> wrote:

Hi Brendan,

Another question I'm afraid. So I've been using LDS regression to perform genetic correlations and I'm about to start with the partitioned heritability. I'm running the test data and script from the tutorial

https://github.com/bulik/ldsc/wiki/Partitioned-Heritability

however I run into an error when I try to run the baseline model

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 623, in <module>
    sumstats.estimate_h2(args, log)
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/sumstats.py", line 279, in estimate_h2
    sumstats = sumstats[ii]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1785, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1827, in _getitem_array
    return self.take(indexer, axis=0, convert=False)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1357, in take
    convert=True, verify=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3275, in take
    axis=axis, allow_dups=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3162, in reindex_indexer
    for blk in self.blocks]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 857, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 844, in take_nd
    func(arr, indexer, out, fill_value)
  File "pandas/src/generated.pyx", line 5715, in pandas.algos.take_2d_axis1_float64_float64 (pandas/algos.c:106840)
  File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (pandas/algos.c:187428)
  File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (pandas/algos.c:184017)
ValueError: buffer source array is read-only

I'm not sure what's going on here. Additionally, LD Score regression should take around 10 mins. However, it seems to be taking around an hour to run the genetic correlations. This time is spent reading in the LD scores (Read reference panel LD Scores for 1189907 SNPs.).

Any help on either of these two issues would be appreciated greatly.

Thank you again for this help.

Best, David


WilliamDHill commented 9 years ago

Hi Hilary,

We've reinstalled LD Score regression using git clone https://github.com/bulik/ldsc.git, along with Anaconda from

https://www.continuum.io/downloads

where we took the version for Python 2.7 for Linux. This seems to have come with pandas-0.16.2-np19py27_0.

If I try to run python ldsc.py -h

I get the error

Traceback (most recent call last):
  File "ldsc.py", line 12, in <module>
    import ldscore.ldscore as ld
  File "ldscore/ldscore.py", line 3, in <module>
    import bitarray as ba
ImportError: No module named bitarray

If I try to run

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

I get the error

Traceback (most recent call last):
  File "ldsc/munge_sumstats.py", line 3, in <module>
    import pandas as pd
ImportError: No module named pandas

We're not sure what's going on with this.

Thanks again for your help here.

Best, David


hilaryfinucane commented 9 years ago

Hi William,

You will have to install the packages listed in the Requirements section of the README.

Let me know how this goes,

Hilary
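A quick way to see which interpreter is being used and which packages it can import is sketched below. The module list contains only the two names from the ImportErrors in this thread. The README's Requirements section is the authoritative list, and `check_modules` is a hypothetical helper, not part of ldsc. One possibility worth ruling out, although nothing in the thread confirms it, is that `python` on the command line is not the Anaconda interpreter that has pandas installed.

```python
from __future__ import print_function

import importlib
import sys

# Only the two modules named in the ImportErrors above; see the README for the rest.
REQUIRED = ["bitarray", "pandas"]


def check_modules(names):
    """Map each module name to its version, 'missing', or an import failure note."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
        except Exception as exc:  # imports can fail for other reasons too
            report[name] = "import failed: %s" % type(exc).__name__
        else:
            report[name] = str(getattr(module, "__version__", "unknown"))
    return report


# Shows which Python binary is actually running this script.
print("interpreter:", sys.executable)
for name, status in sorted(check_modules(REQUIRED).items()):
    print(name, status)
```

Run it with the same `python` command used for ldsc.py so that the report reflects the environment ldsc actually sees.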

On Tue, Oct 27, 2015 at 11:41 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

We've reinstalled lds regression using git clone https://github.com/bulik/ldsc.git along with the Anaconda from

https://www.continuum.io/downloads

Where we took the version for python 2.7 for linux. This seems to have come with version pandas-0.16.2-np19py27_0.

If I try to run python ldsc.py -h

I get the error

Traceback (most recent call last): File "ldsc.py", line 12, in import ldscore.ldscore as ld File "ldscore/ldscore.py", line 3, in import bitarray as ba ImportError: No module named bitarray

If I try to run

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

I get the error

Traceback (most recent call last): File "ldsc/munge_sumstats.py", line 3, in import pandas as pd ImportError: No module named pandas

We're not sure what's going on with this.

Thanks again for your help here.

Best, David

From: hilaryfinucane notifications@github.com Sent: 23 October 2015 17:19 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is this just when you run the analysis, or also when you run python ldsc.py -h? I'm not sure if we're compatible with the most recent version of Pandas. Which version are you running?

Hilary

On Fri, Oct 23, 2015 at 9:25 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Sorry for the confusion. We are using the latest version of pandas. The 1.43 pertains to Matplotlib, as the error reading "AttributeError: 'unicode' object has no attribute 'version'" has been traced to the version of matplotlib that comes as standard with the latest version of pandas. I'm still having the error stating

Reading reference panel LD Score from baseline.[1-22] ... ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....) x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

And I'm not sure how to tackle it.

Best, David

From: hilaryfinucane notifications@github.com Sent: 23 October 2015 13:39 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please say again which version of pandas you are using, and for which package you are using version 1.43? (I don't think there is a Pandas 1.43.)

Best,

Hilary

On Fri, Oct 23, 2015 at 7:06 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Thanks for your help with this. We're now using version 1.43, which seems to run much more quickly 5 mins as opposed to an hour. However, there is an error message in the output stating

Reading reference panel LD Score from baseline.[1-22] ... ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....) x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

This error is present when running the genetic correlations as well. Any

help here would be appreciated.

Thanks again for all your help, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 15:19

To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

I'm not sure what's going on here, but it looks like it might be a problem with pandas and not with ldsc, since it's crashing at the line "import pandas as pd". If you just open python and import pandas does that go okay? How about if you just run python ldsc/ldsc.py -h?

Hilary

On Thu, Oct 22, 2015 at 10:09 AM, WilliamDHill < notifications@github.com

wrote:

Hi Hilary,

We've updated pandas to the version here

http://pandas.pydata.org/

unfortunately this now produces the error. If I omit the --overlap-annot and --frqfile-chr flags the same error is generated.

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 12, in <module>
    import ldscore.parse as ps
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/parse.py", line 10, in <module>
    import pandas as pd
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/__init__.py", line 44, in <module>
    from pandas.core.api import *
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/api.py", line 9, in <module>
    from pandas.core.groupby import Grouper
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/groupby.py", line 16, in <module>
    from pandas.core.frame import DataFrame
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 41, in <module>
    from pandas.core.series import Series
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/series.py", line 2864, in <module>
    import pandas.tools.plotting as _gfx
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 135, in <module>
    if _mpl_ge_1_5_0():
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/tools/plotting.py", line 130, in _mpl_ge_1_5_0
    return (matplotlib.__version__ >= LooseVersion('1.5')
  File "/usr/local/anaconda/lib/python2.7/distutils/version.py", line 296, in __cmp__
    return cmp(self.version, other.version)
AttributeError: 'unicode' object has no attribute 'version'

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:36 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is your version of pandas up to date? Also, can you try the same commands without the --overlap-annot and --frqfile-chr flags? The results won't be interpretable but just to check if the program crashes.

Hilary

On Thu, Oct 22, 2015 at 9:28 AM, WilliamDHill < notifications@github.com> wrote:

Hi Hilary,

Thanks for getting back so quickly with this.

I used the data on BMI from

http://www.broadinstitute.org/collaboration/giant/images/b/b7/GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt.gz

and I downloaded the files from

https://data.broadinstitute.org/alkesgroup/LDSCORE/

I initially ran

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

which seems to have run correctly. This was followed by

python ldsc/ldsc.py --h2 BMI/BMI.sumstats.gz --ref-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/baseline. --w-ld-chr data.broadinstitute.org/alkesgroup/LDSCORE/weights. --overlap-annot --frqfile-chr data.broadinstitute.org/alkesgroup/LDSCORE/1000G.mac5eur. --out BMI/BMI_baseline

which generates the error.

The version of LD score regression I have is

LD Score Regression (LDSC)

Version 1.0.0

(C) 2014-2015 Brendan Bulik-Sullivan and Hilary Finucane

Broad Institute of MIT and Harvard / MIT Department of Mathematics

GNU General Public License v3

Again thank you for your help.

Best, David

From: hilaryfinucane notifications@github.com Sent: 22 October 2015 14:13 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please let me know which commands you used and which files, and which versions of each package you have installed?

Best,

Hilary

On Thu, Oct 22, 2015 at 6:03 AM, WilliamDHill < notifications@github.com> wrote:

Hi Brendan,

Another question, I'm afraid. I've been using LD Score regression to estimate genetic correlations and I'm about to start on partitioned heritability. I'm running the test data and script from the tutorial

https://github.com/bulik/ldsc/wiki/Partitioned-Heritability

However, I run into an error when I try to run the baseline model:

Traceback (most recent call last):
  File "ldsc/ldsc.py", line 623, in <module>
    sumstats.estimate_h2(args, log)
  File "/ldsc/WDH_LD_regression/Partitioning/Test_2/ldsc/ldscore/sumstats.py", line 279, in estimate_h2
    sumstats = sumstats[ii]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1785, in __getitem__
    return self._getitem_array(key)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/frame.py", line 1827, in _getitem_array
    return self.take(indexer, axis=0, convert=False)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/generic.py", line 1357, in take
    convert=True, verify=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3275, in take
    axis=axis, allow_dups=True)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 3162, in reindex_indexer
    for blk in self.blocks]
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/internals.py", line 857, in take_nd
    allow_fill=True, fill_value=fill_value)
  File "/usr/local/anaconda/lib/python2.7/site-packages/pandas/core/common.py", line 844, in take_nd
    func(arr, indexer, out, fill_value)
  File "pandas/src/generated.pyx", line 5715, in pandas.algos.take_2d_axis1_float64_float64 (pandas/algos.c:106840)
  File "stringsource", line 614, in View.MemoryView.memoryview_cwrapper (pandas/algos.c:187428)
  File "stringsource", line 321, in View.MemoryView.memoryview.__cinit__ (pandas/algos.c:184017)
ValueError: buffer source array is read-only

I'm not sure what's going on here. Additionally, LD Score regression should take around 10 minutes, but it seems to be taking around an hour to run the genetic correlations. This time is spent reading in the LD scores (Read reference panel LD Scores for 1189907 SNPs.).

Any help on either of these two issues would be appreciated greatly.

Thank you again for this help.

Best, David

From: Brendan Bulik-Sullivan notifications@github.com Sent: 07 October 2015 01:34 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

No problem. FYI I also made the error that you ran into a few posts ago print out a message that is a little more informative with an explicit suggestion that the --merge-alleles flag may have been missing.


WilliamDHill commented 9 years ago

Hi Hilary,

Thank you for your patience with this. We've had a few problems with our servers here, so my progress on this has been delayed somewhat. I have everything from the README section installed and LD Score regression does run. However, it produces the following warning:

FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP'])

Looking at the Anaconda package I used to install the dependencies, I see that the current installer is Anaconda2-2.4.0-Linux-x86_64.sh.

I think this is an updated version, but I'm not sure which of the older versions I should be using.

Thank you again for your assistance with this.

Best regards, David

From: hilaryfinucane notifications@github.com Sent: 27 October 2015 15:50 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi William,

You will have to install the packages listed in Requirements section of the README.

Let me know how this goes,

Hilary

On Tue, Oct 27, 2015 at 11:41 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

We've reinstalled LD Score regression using git clone https://github.com/bulik/ldsc.git, along with Anaconda from

https://www.continuum.io/downloads

where we took the version for Python 2.7 for Linux. This seems to have come with pandas-0.16.2-np19py27_0.

If I try to run python ldsc.py -h

I get the error

Traceback (most recent call last):
  File "ldsc.py", line 12, in <module>
    import ldscore.ldscore as ld
  File "ldscore/ldscore.py", line 3, in <module>
    import bitarray as ba
ImportError: No module named bitarray

If I try to run

python ldsc/munge_sumstats.py --sumstats GIANT_BMI_Speliotes2010_publicrelease_HapMapCeuFreq.txt --merge-alleles w_hm3.snplist --out BMI/BMI --a1-inc

I get the error

Traceback (most recent call last):
  File "ldsc/munge_sumstats.py", line 3, in <module>
    import pandas as pd
ImportError: No module named pandas

We're not sure what's going on with this.

Thanks again for your help here.

Best, David

From: hilaryfinucane notifications@github.com Sent: 23 October 2015 17:19 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Is this just when you run the analysis, or also when you run python ldsc.py -h? I'm not sure if we're compatible with the most recent version of Pandas. Which version are you running?

Hilary

On Fri, Oct 23, 2015 at 9:25 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Sorry for the confusion. We are using the latest version of pandas. The 1.43 pertains to Matplotlib, as the error reading "AttributeError: 'unicode' object has no attribute 'version'" has been traced to the version of matplotlib that comes as standard with the latest version of pandas. I'm still having the error stating

Reading reference panel LD Score from baseline.[1-22] ...
ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

And I'm not sure how to tackle it.

Best, David

From: hilaryfinucane notifications@github.com Sent: 23 October 2015 13:39 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Could you please say again which version of pandas you are using, and for which package you are using version 1.43? (I don't think there is a Pandas 1.43.)

Best,

Hilary

On Fri, Oct 23, 2015 at 7:06 AM, WilliamDHill notifications@github.com wrote:

Hi Hilary,

Thanks for your help with this. We're now using version 1.43, which seems to run much more quickly: 5 minutes as opposed to an hour. However, there is an error message in the output stating

Reading reference panel LD Score from baseline.[1-22] ...
ldscore/parse.py:145: FutureWarning: sort(columns=....) is deprecated, use sort_values(by=.....)
  x = x.sort(['CHR', 'BP']) # SEs will be wrong unless sorted

This error is present when running the genetic correlations as well. Any help here would be appreciated.

Thanks again for all your help, David


tpoterba commented 9 years ago

Hi David, Raymond Walters and I have agreed to help with LDSC maintenance as Brendan has left the Broad. The error here is caused by a newer version of the pandas package. You can see which version you have with the command python -c "import pandas; print pandas.__version__".

LDSC works with pandas 0.16, which can be found in Anaconda 2.1: https://repo.continuum.io/archive/index.html
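The version check suggested above can also be scripted. The following is only a sketch: `parse_version` and `is_supported` are hypothetical helper names, not part of LDSC, and the default `(0, 16)` simply encodes the release named in this comment. It works on the version string alone, so it does not need pandas to be importable.

```python
def parse_version(version):
    """Return the leading (major, minor) integers of a dotted version string."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def is_supported(version, supported=(0, 16)):
    """True when the (major, minor) release equals the one assumed to work.

    The default (0, 16) is taken from the comment above; it is an
    assumption of this sketch, not a value checked by LDSC itself.
    """
    return parse_version(version) == supported


if __name__ == "__main__":
    # Version strings mentioned in this thread, plus one later release.
    for candidate in ("0.14.1", "0.16.2", "0.17.0"):
        status = "matches" if is_supported(candidate) else "does not match"
        print(candidate, status, "the release named above")
```

In practice the string would come from `pandas.__version__` in the environment that runs ldsc.py.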

WilliamDHill commented 9 years ago

Many thanks for getting back to me so quickly. I've installed Anaconda-2.1.0-Linux-x86_64.sh but this comes with pandas version 0.14.1, which appears to generate the same error message as before. I can try moving through different versions of Anaconda but I wanted to ask first as to which would work. Is it perhaps a later version that's required?

Best, David


WilliamDHill commented 8 years ago

Hi,

I'm looking at the results of an enrichment analysis. One of my categories has a significant p-value, but the enrichment metric is less than 1. I interpret this to mean that the category contributes less to heritability than its size alone would suggest. However, I'm not sure what meaning can be taken from this. Intuitively, I would have thought that the p-values derived from partitioned heritability would be one-sided, as its purpose is to identify the regions of the genome that make the greatest contributions to the total heritability. Could you tell me whether there is a way to derive one-sided p-values using LD Score regression, and whether this is a sensible way to run the analysis? Enrichment tests using p-values typically employ such one-sided tests.

Best, David


hilaryfinucane commented 8 years ago

Hi David,

We did two-sided tests because we are interested in repressed regions as well as enriched regions. Our p-values are computed using z-scores, so if you would like a one-sided p-value, you can convert the reported p-value to a z-score and use that in a one-sided test.

Best,

Hilary
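The conversion described above can be sketched as follows. This is a minimal illustration rather than anything LDSC provides: `one_sided_p` is a hypothetical helper, and it assumes the sign of the z-score can be read off from whether the reported enrichment is above or below 1, which may not match how the test statistic is defined internally.

```python
from statistics import NormalDist  # standard library, Python 3.8+


def one_sided_p(two_sided_p, enrichment, null_value=1.0):
    """One-sided p-value for the alternative 'enrichment > null_value'.

    Assumes the two-sided p-value came from a standard normal z-score and
    that the direction of the effect is given by the enrichment estimate.
    two_sided_p must lie strictly between 0 and 1 or equal 1.
    """
    std_normal = NormalDist()
    # Magnitude of the z-score implied by the two-sided p-value.
    z_abs = std_normal.inv_cdf(1.0 - two_sided_p / 2.0)
    # Attach the sign: positive for enrichment, negative for depletion.
    z = z_abs if enrichment > null_value else -z_abs
    # Upper-tail probability.
    return 1.0 - std_normal.cdf(z)


if __name__ == "__main__":
    print(one_sided_p(0.05, enrichment=2.0))  # enriched category
    print(one_sided_p(0.05, enrichment=0.5))  # depleted category
```

Under these assumptions a category with enrichment above 1 has its p-value halved, while a category with enrichment below 1 ends up with a one-sided p-value above 0.5, so it would no longer count as significant for enrichment.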


WilliamDHill commented 8 years ago

Many thanks Hilary.

Best, David

From: hilaryfinucane notifications@github.com Sent: 02 December 2015 14:58 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

Best,

Hilary

On Tue, Dec 1, 2015 at 9:49 AM, WilliamDHill notifications@github.com wrote:

Hi,

I'm looking at the results of an enrichment analysis. One of my categories has a significant p value but the enrichment metric is less than 1. I'm interpreting this as this category contributes less to heritability than its size alone would suggest. However, I'm not sure what meaning can be taken from this, intuitively I would have thought that the p values derived from partitioned heritability would be one sided as its purpose is to elucidate regions of the genome that make the greatest contributions to the total heritability. Could you tell me if there is a way to derive one sided p-values using lds regression and, is this a sensible way to run the analysis? Enrichment tests using p-values typically employ such one sided tests.

Best, David

From: HILL David Sent: 06 November 2015 16:42 To: bulik/ldsc; bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Many thanks for getting back to me so quickly. I've installed Anaconda-2.1.0-Linux-x86_64.sh but this comes with pandas version 0.14.1, which appears to generate the same error message as before. I can try moving through different versions of Anaconda but I wanted to ask first as to which would work. Is it perhaps a later version that's required?

Best, David

From: tpoterba notifications@github.com Sent: 06 November 2015 15:42 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David, Raymond Walters and I have agreed to help with LDSC maintenance as Brendan has left the Broad. The error here is caused by a newer version of the pandas package. You can see which version you have with the command python -c "import pandas; print pandas.__version__".

LDSC works with pandas 0.16, which can be found in Anaconda 2.1: https://repo.continuum.io/archive/index.html
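The version check above can also be scripted. The following is a minimal sketch only: the helper names major_minor and matches_series are hypothetical, and the target series 0.16 is simply the version mentioned in this message, not a tested compatibility range.

```python
import re


def major_minor(version):
    """Return the (major, minor) pair from a version string such as '0.16.2'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: %r" % (version,))
    return int(match.group(1)), int(match.group(2))


def matches_series(installed, wanted="0.16"):
    """True if `installed` belongs to the `wanted` major.minor series."""
    return major_minor(installed) == major_minor(wanted)


if __name__ == "__main__":
    try:
        import pandas
    except ImportError:
        print("pandas is not installed")
    else:
        # Assumption: 0.16 is the series to compare against, as stated above.
        if matches_series(pandas.__version__):
            status = "matches"
        else:
            status = "does not match"
        print("pandas %s %s the 0.16 series" % (pandas.__version__, status))
```

Comparing only the major and minor numbers keeps the check tolerant of patch releases such as 0.16.2.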

WilliamDHill commented 8 years ago

Hi Hilary,

I have another question regarding the enrichment metric used in the partitioned heritability analysis. The metric is the proportion of heritability captured divided by the proportion of SNPs in the category. My question is: if LD differs between the functional groupings, does this bias the enrichment metric, given that, for example, regions of high LD will contain fewer effective SNPs?

Many thanks for your help so far.

Best, David

hilaryfinucane commented 8 years ago

Hi William,

It depends on what your baseline for comparison is. We choose equal heritability per SNP (on average) as the baseline. If areas with high LD have lower heritability per SNP, then yes, those regions will be depleted for heritability.

Hilary
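To make the baseline concrete: enrichment, as described in this thread, is the proportion of heritability in a category divided by the proportion of SNPs in that category, so a value of 1 corresponds to equal heritability per SNP on average. The sketch below uses made-up numbers, and the function name is hypothetical rather than part of LDSC.

```python
def enrichment(h2_category, h2_total, snps_category, snps_total):
    """Proportion of heritability divided by proportion of SNPs."""
    prop_h2 = h2_category / float(h2_total)
    prop_snps = snps_category / float(snps_total)
    return prop_h2 / prop_snps


# Hypothetical category containing 10% of all SNPs.
# It explains 5% of h2 in the first case and 30% in the second.
depleted = enrichment(0.02, 0.40, 100000, 1000000)  # below 1: less h2 per SNP than average
enriched = enrichment(0.12, 0.40, 100000, 1000000)  # above 1: more h2 per SNP than average

print("depleted: %.2f, enriched: %.2f" % (depleted, enriched))
```

Under this baseline, a category in a high-LD region with lower heritability per SNP would land in the first case.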

From: hilaryfinucane Sent: 02 December 2015 14:58 To: bulik/ldsc Cc: WilliamDHill Subject: Re: [ldsc] munge_sumstats.py (#26)

Hi David,

We did two-sided tests because we are interested in repressed regions as well as enriched regions. Our p-values are computed using z-scores, so if you would like a one-sided p-value, you can convert the reported p-value to a z-score and use that in a one-sided test.

Best,

Hilary
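The conversion described above might be sketched as follows. This is an illustration under stated assumptions, not LDSC code: it assumes the reported p-value is two-sided under a standard normal null, that the sign of the z-score can be recovered from whether the enrichment estimate lies above or below 1, and that the alternative of interest is enrichment greater than 1. The function name is hypothetical, and statistics.NormalDist requires Python 3.8 or later.

```python
from statistics import NormalDist


def one_sided_p_from_two_sided(p_two_sided, enrichment, null_value=1.0):
    """One-sided p-value for the alternative 'enrichment > null_value'.

    Assumes p_two_sided came from a standard normal z-score; a two-sided
    p-value carries no sign, so the sign is taken from the estimate.
    """
    if not 0.0 < p_two_sided <= 1.0:
        raise ValueError("p_two_sided must be in (0, 1]")
    std_normal = NormalDist()
    # Magnitude of the z-score implied by the two-sided p-value.
    abs_z = std_normal.inv_cdf(1.0 - p_two_sided / 2.0)
    z = abs_z if enrichment > null_value else -abs_z
    # Upper-tail probability for the signed z-score.
    return 1.0 - std_normal.cdf(z)


p_enriched = one_sided_p_from_two_sided(0.05, enrichment=2.0)
p_depleted = one_sided_p_from_two_sided(0.05, enrichment=0.5)
print(p_enriched, p_depleted)
```

With this convention, a category whose enrichment is below 1 gets a large one-sided p-value however small its two-sided p-value is, which is the situation described in the question.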

WilliamDHill commented 8 years ago

Hi Hilary,

I’m trying to get a list of the SNPs that are found in each of the categories in the baseline model, specifically the ones found in conserved regions along with their minor allele frequencies.

If I go into the baseline.2.annot file I can find a set of SNPs for chromosome 2. My question is: are the SNPs in each category coded as 1, and is a SNP coded as 0 if it is not in the category? If not, I can’t see how to get a list of the SNPs in each set along with their MAF. Could you tell me how this could be achieved?

Best regards, David

hilaryfinucane commented 8 years ago

Hi David,

Yes, the SNPs are coded 1 if in the category and 0 otherwise.

Best,

Hilary
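Given the 1/0 coding, a list of SNPs in a category together with their MAF could be assembled along these lines. This is a sketch under assumptions: the annotation file is taken to be whitespace-delimited with a header containing a SNP column, the category column name "Conserved" is a placeholder (check the header of your own file for the exact name), and the MAF is assumed to come from a separate PLINK-style .frq file with SNP and MAF columns. The toy data and function names are hypothetical.

```python
import io


def snps_in_category(annot_file, category):
    """Return SNP IDs whose indicator in column `category` equals 1."""
    header = annot_file.readline().split()
    snp_idx = header.index("SNP")
    cat_idx = header.index(category)
    selected = []
    for line in annot_file:
        fields = line.split()
        if fields and float(fields[cat_idx]) == 1.0:
            selected.append(fields[snp_idx])
    return selected


def maf_by_snp(frq_file):
    """Map SNP ID to MAF from a file with SNP and MAF header columns."""
    header = frq_file.readline().split()
    snp_idx = header.index("SNP")
    maf_idx = header.index("MAF")
    rows = (line.split() for line in frq_file)
    return {row[snp_idx]: float(row[maf_idx]) for row in rows if row}


# Toy stand-ins for an annotation file and a frequency file (hypothetical values).
TOY_ANNOT = (
    "CHR\tBP\tSNP\tCM\tbase\tConserved\n"
    "2\t1000\trs1\t0.0\t1\t1\n"
    "2\t2000\trs2\t0.1\t1\t0\n"
    "2\t3000\trs3\t0.2\t1\t1\n"
)
TOY_FRQ = (
    "CHR\tSNP\tA1\tA2\tMAF\tNCHROBS\n"
    "2\trs1\tA\tG\t0.12\t758\n"
    "2\trs2\tC\tT\t0.30\t758\n"
    "2\trs3\tG\tA\t0.45\t758\n"
)

conserved = snps_in_category(io.StringIO(TOY_ANNOT), "Conserved")
maf = maf_by_snp(io.StringIO(TOY_FRQ))
conserved_with_maf = [(snp, maf.get(snp)) for snp in conserved]
print(conserved_with_maf)
```

For real files, the same functions accept any open text file, for example gzip.open(path, "rt") if the annotation file is gzip-compressed.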

WilliamDHill commented 8 years ago

Hi Hilary,

I was hoping to ask you another question regarding stratified LD score regression. I have a situation where an annotation (the DHS, for example) does not show significant enrichment, but the annotation + 500 bp does. My question is: can I interpret this as enrichment for the annotation or not? My thinking is that if these additional regions were not enriched, the enrichment metric would go down, even though the amount of h2 captured by the annotation + 500 bp increases. I do see the counter-argument, though, that the additional regions captured by the 500 bp are not part of the annotation. Could you shed some light on this?

Additionally, I sent an E-mail asking about how to get the set of SNPs from within each annotation. My E-mail has changed as I’ve since become a member of staff and so I apologise if you sent your response to what is now an incorrect E-mail. I’ve copied in the E-mail I sent you regarding the SNP sets below. Many thanks for your help with this and other questions.

Best regards, David

Hi Hilary,

I’m trying to get a list of the SNPs that are found in each of the categories in the baseline model, specifically the ones found in conserved regions along with their minor allele frequencies.

If I go into the baseline.2.annot file I can find a set of SNPs for chromosome 2. My question is are the SNPs in each of the categories coded as 1 and if it’s not in the category is it coded as a 0? If not I can’t see how to get a list of the SNPs in each set along with their MAF. Could you tell me how this could be achieved?

Best regards, David

hilaryfinucane commented 8 years ago

Hi David,

I think that your question is not really about LD score regression, but about the scientific question you'd like to ask. LD score regression can test whether the annotation + 500bp is enriched, and it can test whether the annotation is enriched, and I think you have to decide which of these is an interesting result or not.

RE your older question, yes, the SNPs are coded 1 if in the category and 0 otherwise.

Best,

Hilary
