bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Error converting summary statistics #66

Open DCousminer opened 7 years ago

DCousminer commented 7 years ago

Getting the following error. Any help would be appreciated! Thanks.

```
Call:
./munge_sumstats.py \
--out outfile \
--merge-alleles w_hm3.snplist \
--N 4810.0 \
--sumstats infile.txt \
--ignore OR,OR_95L,OR-95U,MAF,MARKER

Interpreting column names as follows:
INFO: INFO score (imputation quality; higher --> better imputation)
EAF: Allele frequency
EA: Allele 1, interpreted as ref allele for signed sumstat.
N: Sample size
P: p-Value
BETA: [linear/logistic] regression coefficient (0 --> no effect; above 0 --> A1 is trait/risk increasing)
SNP: Variant ID (e.g., rs number)
NEA: Allele 2, interpreted as non-ref allele for signed sumstat.

Reading list of SNPs for allele merge from w_hm3.snplist
Read 1217311 SNPs for allele merge.
Reading sumstats from infile.txt into memory 5000000 SNPs at a time.
.. done

ERROR converting summary statistics:

Traceback (most recent call last):
  File "./munge_sumstats.py", line 640, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "./munge_sumstats.py", line 295, in parse_dat
    dat = pd.concat(dat_list, axis=0).reset_index(drop=True)
  File "/mnt/isilon/grant_lab/programs/local/lib64/python2.7/site-packages/pandas-0.17.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py", line 812, in concat
    copy=copy)
  File "/mnt/isilon/grant_lab/programs/local/lib64/python2.7/site-packages/pandas-0.17.1-py2.7-linux-x86_64.egg/pandas/tools/merge.py", line 845, in __init__
    raise ValueError('No objects to concatenate')
ValueError: No objects to concatenate
```

rkwalters commented 7 years ago

Hello,

In the cases where I've seen this error message before, the cause was that the header and data were mismatched, so that one of the columns was effectively empty. Can you check that the number of fields in the header row matches the number of fields per row in the rest of the file?

Cheers, Raymond
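Raymond's check can be automated. The sketch below is a minimal illustration, not part of LDSC: `field_count_mismatches` is a hypothetical helper, and it assumes the file is parsed as whitespace-delimited, so an empty field between two tabs collapses and that row shows up as too short. For a gzipped file, open it with `gzip.open(path, "rt")` instead of `open`.

```python
import sys


def field_count_mismatches(lines):
    """Compare each data row's field count with the header's field count.

    Assumption: fields are separated by runs of whitespace, so an empty field
    (e.g. two adjacent tabs) disappears and that row shows up as too short.
    Returns (n_header_fields, [(line_number, n_fields), ...]); line numbers are
    1-based with the header as line 1, and blank lines are skipped.
    """
    it = iter(lines)
    n_header = len(next(it).split())
    mismatches = []
    for lineno, line in enumerate(it, start=2):
        if not line.strip():
            continue  # ignore blank lines, e.g. a trailing newline at EOF
        n_fields = len(line.split())
        if n_fields != n_header:
            mismatches.append((lineno, n_fields))
    return n_header, mismatches


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_fields.py infile.txt
    with open(sys.argv[1]) as fh:
        n_header, bad = field_count_mismatches(fh)
    print(f"header: {n_header} fields; {len(bad)} rows with a different count")
    for lineno, n_fields in bad[:10]:
        print(f"  line {lineno}: {n_fields} fields")
```

An empty list of mismatches rules out this particular cause, as ZhaotongL reports further down the thread.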

(Sent by email in reply to DCousminer's post of Dec 15, 2016, 1:08 PM.)


DCousminer commented 7 years ago

Thanks! Now I have another problem when running the genetic correlation analysis:

```
Call:
./ldsc.py \
--ref-ld-chr ../ldsc_old/eur_w_ld_chr/ \
--out Tr1_Tr2 \
--rg trait1.sumstats.gz,trait2.sumstats.gz \
--w-ld-chr ../ldsc_old/eur_w_ld_chr/

Beginning analysis at Thu Dec 15 16:51:50 2016
Reading summary statistics from trait1.sumstats.gz ...
Read summary statistics for 1002410 SNPs.
Reading reference panel LD Score from ../ldsc_old/eur_w_ld_chr/[1-22] ...
Read reference panel LD Scores for 1293150 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from ../ldsc_old/eur_w_ld_chr/[1-22] ...
Read regression weight LD Scores for 1293150 SNPs.
After merging with reference panel LD, 1002405 SNPs remain.
After merging with regression SNP LD, 1002405 SNPs remain.
Computing rg for phenotype 2/2
Reading summary statistics from trait2.sumstats.gz ...
Read summary statistics for 1217311 SNPs.
After merging with summary statistics, 1002405 SNPs remain.
1002377 SNPs with valid alleles.
ERROR computing rg for phenotype 2/2, from file trait2.sumstats.gz.
Traceback (most recent call last):
  File "/mnt/isilon/grant_lab/ldsc/ldscore/sumstats.py", line 349, in estimate_rg
    rghat = _rg(loop, args, log, M_annot, ref_ld_cnames, w_ld_cname, i)
  File "/mnt/isilon/grant_lab/ldsc/ldscore/sumstats.py", line 476, in _rg
    intercept_gencov=intercepts[2], n_blocks=n_blocks, twostep=args.two_step)
  File "/mnt/isilon/grant_lab/ldsc/ldscore/regressions.py", line 705, in __init__
    np.multiply(hsq1.tot_delete_values, hsq2.tot_delete_values))
FloatingPointError: invalid value encountered in sqrt
```

rkwalters commented 7 years ago

(To anyone else watching this thread: it looks like the last email notification got truncated, but the rg log posted properly on GitHub, https://github.com/bulik/ldsc/issues/66)

The error here indicates that you have negative estimates of h2 in one of your traits (specifically, in the block-jackknife estimates used to compute the SEs). This usually means that h2 is near zero for one or both traits, so genetic correlation can't be estimated reliably. This is why we generally recommend limiting rg analyses to pairs of traits that both have relatively strong (in terms of significance) univariate h2 estimates.
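The failure Raymond describes can be reproduced with plain NumPy. This sketch mirrors the `np.multiply(hsq1.tot_delete_values, hsq2.tot_delete_values)` product that the traceback shows being passed to a square root; the function name and the delete values are made up for illustration, and it assumes LDSC runs with NumPy's `invalid` floating-point error mode set to `raise`, which is what the `FloatingPointError` implies. A negative product in any block, i.e. exactly one trait with a negative delete value there, is enough to abort.

```python
import numpy as np


def jackknife_rg_denominator(h2_delete_1, h2_delete_2):
    """Square root of the product of two traits' per-block h2 delete values.

    Illustrative only: raises FloatingPointError when any product is negative,
    because the 'invalid' floating-point error mode is set to 'raise' here.
    """
    prod = np.multiply(h2_delete_1, h2_delete_2)
    with np.errstate(invalid="raise"):
        return np.sqrt(prod)


# Hypothetical delete values: trait 2 has h2 near zero, so one block goes negative.
h2_trait1 = np.array([0.31, 0.29, 0.33, 0.30])
h2_trait2 = np.array([0.02, 0.01, -0.01, 0.03])

try:
    jackknife_rg_denominator(h2_trait1, h2_trait2)
except FloatingPointError as err:
    print("FloatingPointError:", err)
```

If both traits were negative in the same block the product would be positive and this particular error would not fire, which is another reason the root cause is better described as "h2 near zero" than as a crash to work around.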

Cheers, Raymond

DCousminer commented 7 years ago

Thank you again for your prompt response!

I'm not sure that's the problem: I also uploaded the same results file to LD Hub, and it passed the h2 stage with a reasonable estimate:

```
Total Observed scale h2: 0.3429 (0.1484)
Lambda GC: 1.0864
Mean Chi^2: 1.1248
Intercept: 1.0917 (0.0074)
Ratio: 0.7349 (0.0592)
```
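The Ratio line can be checked by hand: LDSC defines it as (intercept - 1) / (mean chi^2 - 1), the share of the inflation in mean chi^2 that the intercept attributes to something other than polygenic signal. A minimal sketch with a hypothetical helper name, using the rounded values printed above:

```python
def ldsc_ratio(intercept, mean_chi2):
    """(intercept - 1) / (mean chi^2 - 1).

    The share of the inflation in mean chi^2 that the LDSC intercept
    attributes to something other than polygenic signal.
    """
    if mean_chi2 <= 1:
        raise ValueError("ratio is undefined when mean chi^2 <= 1")
    return (intercept - 1.0) / (mean_chi2 - 1.0)


# Rounded values from the LD Hub output.
print(f"{ldsc_ratio(1.0917, 1.1248):.4f}")
```

This prints 0.7348; the small difference from the reported 0.7349 presumably comes from LD Hub working with unrounded intercept and mean chi^2 values.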

However, then the rg results all came back with NAs. Could something else be going on?

Many thanks, Diana

rkwalters commented 7 years ago

Hi Diana, Those h2 results do look better than I anticipated, though the SE is still quite large (at one point the recommendation was a z-score > 4 for h2, but LD Hub has become more lenient).
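For reference, the z-score in that rule of thumb is the h2 estimate divided by its standard error. Applied to the LD Hub numbers above (a sketch; the helper is hypothetical):

```python
def h2_z(h2, se):
    """z-score of a heritability estimate: estimate divided by its standard error."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    return h2 / se


z = h2_z(0.3429, 0.1484)  # LD Hub estimate and SE
print(f"z = {z:.2f}, meets the z > 4 rule of thumb: {z > 4}")
```

The estimate is about 2.3 standard errors from zero, well short of the older z > 4 guideline.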

A couple of things I would check in your dataset:
1) How many variants are used in the h2 and rg analyses? If substantially fewer enter the rg analysis for whatever reason, that could reduce stability.
2) Does your input GWAS have a few loci with very strong effects? These can increase variability in the jackknife. This can be addressed by omitting those loci from the LD Score regression.
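Raymond's second suggestion, omitting loci with very strong effects, could look like the pandas sketch below, applied to the raw GWAS file before munging. It is an illustration under assumptions: the column names (`CHR`, `BP`, `P`), the p-value threshold, and the +/-1 Mb window are placeholders rather than LDSC defaults, and every SNP below the threshold is treated as defining its own locus.

```python
import pandas as pd


def drop_strong_loci(df, p_threshold=5e-8, window_bp=1_000_000,
                     chr_col="CHR", pos_col="BP", p_col="P"):
    """Drop SNPs within +/- window_bp of any SNP with P < p_threshold (same chromosome).

    Thresholds and column names are placeholders; adjust them to the file at hand.
    """
    hits = df.loc[df[p_col] < p_threshold, [chr_col, pos_col]]
    keep = pd.Series(True, index=df.index)
    for chrom, pos in hits.itertuples(index=False):
        in_window = (df[chr_col] == chrom) & ((df[pos_col] - pos).abs() <= window_bp)
        keep &= ~in_window
    return df.loc[keep].copy()
```

The loop makes one pass over the table per hit, which is fine for a handful of strong loci; with thousands of hits it would be better to merge overlapping windows first.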

Cheers, Raymond

(Sent by email in reply to DCousminer's comment of Dec 15, 2016, 8:40 PM.)


ZhaotongL commented 5 years ago

@DCousminer @rkwalters Hi, I ran into the same problem as you: "ValueError: No objects to concatenate". How did you fix it? P.S. I checked the header and the rest of the file, and they have the same number of columns. Is there anything else that could cause this error?

choishingwan commented 4 years ago

I stumbled upon the same problem. On further inspection, it was because my summary statistics file contained an N_Case column filled with NaN, while we only used the N column. This caused LDSC to remove all rows from our summary statistics, even though we didn't use --N-cas-col, because the N_Case column was detected automatically. Once we removed that column, LDSC ran without the error.
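Columns like that can be found before munging with a short pandas scan. This is a sketch, not LDSC code: `all_missing_columns` is a hypothetical helper that reads a whitespace-delimited file in chunks (to bound memory) and reports columns without a single non-missing value; pandas' default missing-value markers, which include `NaN` and `NA`, count as missing.

```python
import io

import pandas as pd


def all_missing_columns(path_or_buffer, chunksize=1_000_000):
    """Return the names of columns that are missing (NaN/NA) in every row."""
    has_value = None
    for chunk in pd.read_csv(path_or_buffer, sep=r"\s+", chunksize=chunksize):
        seen = chunk.notna().any()  # per column: any non-missing value in this chunk?
        has_value = seen if has_value is None else (has_value | seen)
    if has_value is None:
        return []
    return [col for col, ok in has_value.items() if not ok]


if __name__ == "__main__":
    demo = io.StringIO(
        "SNP A1 A2 N N_CAS P\n"
        "rs1 A G 4810 NaN 0.10\n"
        "rs2 C T 4810 NaN 0.20\n"
    )
    print(all_missing_columns(demo))  # N_CAS is missing in every row
```

Any column reported here can then be dropped before running munge_sumstats.py, as choishingwan did.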

fry3682665 commented 3 years ago


Hi, I would like to know how you deleted one column from such a large GWAS summary statistics file. What method or software did you use?

choishingwan commented 3 years ago

You can use awk or R.
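Besides awk or R, a column can be removed with a few lines of Python that stream the file line by line, so a large GWAS file never has to fit in memory. A minimal sketch; the script and function names are hypothetical. It splits on whitespace by default; pass `sep="\t"` if the file is tab-delimited and may contain empty fields. Note that the output goes to standard output, so it has to be redirected into a file.

```python
import sys


def drop_column(lines, column_name, sep=None):
    r"""Yield each line with the named column removed; output is tab-delimited.

    sep=None splits on runs of whitespace (like str.split()); use sep="\t"
    for tab-delimited files that may contain empty fields.
    """
    it = iter(lines)
    header = next(it).rstrip("\r\n").split(sep)
    idx = header.index(column_name)  # ValueError if the column is not present
    yield "\t".join(header[:idx] + header[idx + 1:]) + "\n"
    for line in it:
        fields = line.rstrip("\r\n").split(sep)
        if not fields or fields == [""]:
            continue  # skip blank lines
        yield "\t".join(fields[:idx] + fields[idx + 1:]) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python drop_column.py infile.txt N_CAS > outfile.txt
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(drop_column(fh, sys.argv[2]))
```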

fry3682665 commented 3 years ago


The output has been scrolling across the screen ever since I ran awk, and I'm not sure whether the column was deleted successfully, because it still hasn't finished.

choishingwan commented 3 years ago

You need to show the script so we can tell what's going on. From your description, it seems you forgot to redirect the output of awk into a file, e.g.

```
awk '{print $1}' file > new_file
```

(Sent by email in reply to fry3682665's message of Tuesday, December 15, 2020, 4:31 PM.)


fry3682665 commented 3 years ago


[screenshot] There is a problem with the edited data. What is wrong?

choishingwan commented 3 years ago

LDSC does not accept CSV as an input.
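munge_sumstats.py expects whitespace-delimited input, so a comma-separated file is read as a single column. A minimal conversion sketch with Python's csv module (the function name is hypothetical); it writes tab-delimited output and, as an assumption, fills empty fields with `NA` so that whitespace splitting cannot shift columns.

```python
import csv
import io
import sys


def csv_to_tsv(src, dst):
    """Copy comma-separated rows from src to dst as tab-separated rows.

    Empty fields become "NA" (an assumption: a missing-value marker that pandas
    recognizes), so whitespace-based parsing keeps every row the same width.
    """
    reader = csv.reader(src)
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow([field.strip() or "NA" for field in row])


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Hypothetical usage: python csv_to_tsv.py input.csv output.txt
        with open(sys.argv[1], newline="") as src, open(sys.argv[2], "w", newline="") as dst:
            csv_to_tsv(src, dst)
    else:
        # Small in-memory demo when no file arguments are given.
        out = io.StringIO()
        csv_to_tsv(io.StringIO("SNP,A1,P\nrs1,A,0.1\nrs2,,0.2\n"), out)
        print(out.getvalue(), end="")
```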

fry3682665 commented 3 years ago


Why do you say it's a CSV file rather than a TXT file? Is this command correct?

```
awk -F'/t' '{print $1,$2,$3,$4,$5,$8,$9,$10,$11,$12}' gzss.txt > gzssxx.txt
```
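On the command itself: in awk, `-F'/t'` sets the field separator to the literal two-character string `/t`, not a tab (a tab is written `-F'\t'`), so on a tab-delimited file any line that contains no `/t` is not split at all; and `print $1,$2,...` joins fields with awk's output separator, a single space by default. The Python sketch below does the intended column selection explicitly and keeps tabs in the output; the function name is hypothetical and the column numbers are copied from the command above.

```python
import sys

# 1-based column numbers copied from the awk command ($1..$5, $8..$12).
KEEP = (1, 2, 3, 4, 5, 8, 9, 10, 11, 12)


def select_columns(lines, keep=KEEP, sep="\t"):
    """Yield each line reduced to the 1-based columns in `keep`, joined by `sep`.

    Blank lines are skipped; raises IndexError if a line has fewer fields
    than the largest index in `keep`.
    """
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\r\n").split(sep)
        yield sep.join(fields[i - 1] for i in keep) + "\n"


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python select_columns.py gzss.txt gzssxx.txt
    with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
        dst.writelines(select_columns(src))
```

The awk equivalent would set both separators to a tab, e.g. `awk -F'\t' -v OFS='\t' ...`.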

fry3682665 commented 3 years ago


[two screenshots] After modifying the data, I still get errors when converting the format.

fry3682665 commented 3 years ago


Hello, I ran into the same problem. Have you found a solution? I'd appreciate any advice.

jaamarks commented 3 years ago

I was experiencing this error and found a solution. The issue stemmed from the SNP names in my GWAS results. I reformatted the SNP names and the problem was resolved.

Error-causing GWAS results:
```
MarkerName chr position Allele1 Allele2 Effect P-value
rs116587930:727841:G:A 1 727841 A G 0.0880 0.4824
rs4951859:729679:C:G 1 729679 C G 0.0504 0.3896
rs148120343:730087:T:C 1 730087 T C -0.0882 0.4808
rs142557973:731718:T:C 1 731718 T C -0.0791 0.2761
rs141242758:734349:T:C 1 734349 T C -0.0722 0.326
rs79010578:736289:T:A 1 736289 A T 0.0170 0.8169
⋮
```
Reformatted GWAS results:
```
MarkerName chr position Allele1 Allele2 Effect P-value
rs116587930 1 727841 A G 0.0880 0.4824
rs4951859 1 729679 C G 0.0504 0.3896
rs148120343 1 730087 T C -0.0882 0.4808
rs142557973 1 731718 T C -0.0791 0.2761
rs141242758 1 734349 T C -0.0722 0.326
rs79010578 1 736289 A T 0.0170 0.8169
⋮
```
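The reformatting above amounts to keeping only the rsID before the first colon. A plausible reason it helped, not confirmed in the thread, is that IDs such as `rs116587930:727841:G:A` never match the plain rsIDs in a list like `w_hm3.snplist` when `--merge-alleles` is used. fry3682665 suggests R for this step; the Python sketch below is an equivalent, with a hypothetical helper that leaves IDs without a leading rsID untouched instead of truncating them to the chromosome number.

```python
import re

# Matches a leading rsID such as "rs116587930" in "rs116587930:727841:G:A".
_LEADING_RSID = re.compile(r"^(rs\d+)(?::|$)")


def clean_marker(name):
    """Return the leading rsID of `name`, or `name` unchanged if it has none."""
    m = _LEADING_RSID.match(name)
    return m.group(1) if m else name
```

With pandas, the column can be rewritten with `df["MarkerName"] = df["MarkerName"].map(clean_marker)` before saving the file for munge_sumstats.py.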
CaoLuYao98 commented 2 years ago

Hi, I ran into the same error. How did you reformat the SNP names? Using R or something else?

fry3682665 commented 2 years ago

Using R.

(Sent by email in reply to the message above, dated Tue, Nov 23, 2021, 20:07.)


lizisiyuanzhang commented 10 months ago

When I run this step, it hangs for more than an hour, then prints "Terminated", and no .gz file is generated. Please advise.

```
Reading sumstats from pgc.bip.full.2012-04.txt into memory 5000000 SNPs at a time.
Terminated
```