bulik / ldsc

LD Score Regression (LDSC)
GNU General Public License v3.0

Error converting summary statistics #131

Open cmfa26 opened 6 years ago

cmfa26 commented 6 years ago

Hi,

rkwalters commented 6 years ago

Hi,

As the error states, this indicates munge_sumstats.py isn't able to find a SNP column in your input summary statistics. There are a few common issues that could yield this error:

If none of these solve it, it would be helpful to see the full command line you're using to run munge_sumstats.py and an example of the first few rows of your input summary statistics. (Can email those separately if you're more comfortable sharing those individually rather than posting to this issues board.)

Cheers, Raymond

On Aug 31, 2018, at 10:49 AM, cmfa26 notifications@github.com wrote:

Hi, while running munge_sumstats.py, I got an error saying the SNP column could not be found. I followed the tutorial and downloaded all the data (my_data_input and w_hm3.snplist) and the software/packages, but I still got this error:

Traceback (most recent call last):
  File "./munge_sumstats.py", line 746, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "./munge_sumstats.py", line 627, in munge_sumstats
    raise ValueError('Could not find {C} column.'.format(C=c))
ValueError: Could not find SNP column.

Could you help me figure out how to handle this kind of error?


rkwalters commented 6 years ago

Hi Carolina,

It shouldn't be trying to convert the SNP column to float. Based on the error, munge appears to be encountering "rs187212831" in the column of the input file that it thinks should contain the signed summary statistic (e.g. Z-score or beta or odds ratio). Is there possibly a mismatch between the header of your file and the data?

Cheers, Raymond
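A quick way to look for the kind of header/data mismatch described above is to compare each data row's field count with the header and to test whether columns that should be numeric actually parse as numbers. The following is a minimal Python 3 sketch under stated assumptions, not part of ldsc: `find_mismatches` and the sample rows are hypothetical, it assumes a plain-text, whitespace-delimited file, and it treats `NA` and `.` as missing values (an assumption about how missing data is coded).

```python
import io

# Assumed missing-value tokens; adjust to match how your file codes missing data.
MISSING = {"NA", "."}


def find_mismatches(lines, numeric_cols, max_rows=1000):
    """Flag rows whose field count differs from the header, and values in
    columns expected to be numeric that cannot be parsed as floats."""
    it = iter(lines)
    header = next(it, "").split()  # whitespace-delimited header row
    problems = []
    for row_num, line in enumerate(it, start=2):  # row 1 is the header
        if row_num > max_rows + 1:
            break
        fields = line.split()
        if len(fields) != len(header):
            problems.append((row_num, "expected %d fields, found %d" % (len(header), len(fields))))
            continue
        for name, value in zip(header, fields):
            if name in numeric_cols and value not in MISSING:
                try:
                    float(value)
                except ValueError:
                    problems.append((row_num, "%s=%r is not numeric" % (name, value)))
    return problems


# Hypothetical input whose header order does not match the data rows.
sample = io.StringIO(
    "BETA SNP A1 A2 P\n"
    "rs187212831 A G 0.01 0.5\n"
    "rs3131962 A G 0.2\n"
)
for row, message in find_mismatches(sample, numeric_cols={"BETA", "P"}):
    print(row, message)
# 2 BETA='rs187212831' is not numeric
# 3 expected 5 fields, found 4
```

An rsID reported in a column such as BETA or Z usually points to a shifted or reordered header rather than to a column that needs converting.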

On Sep 6, 2018, at 9:15 PM, cmfa26 notifications@github.com wrote:

Hi,

Thank you so much for your attention.

I was able to solve the SNP column problem, but now a new problem has arisen. Could you help me again?

Now the following error appears:

Conversion finished at Thu Sep 6 21:04:57 2018
Total time elapsed: 1.49s
Traceback (most recent call last):
  File "./munge_sumstats.py", line 746, in <module>
    munge_sumstats(parser.parse_args(), p=True)
  File "./munge_sumstats.py", line 686, in munge_sumstats
    dat = parse_dat(dat_gen, cname_translation, merge_alleles, log, args)
  File "./munge_sumstats.py", line 238, in parse_dat
    for block_num, dat in enumerate(dat_gen):
  File "/home/cmc329/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 716, in __iter__
    yield self.read(self.chunksize)
  File "/home/cmc329/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 740, in read
    ret = self._engine.read(nrows)
  File "/home/cmc329/anaconda2/lib/python2.7/site-packages/pandas/io/parsers.py", line 1187, in read
    data = self._reader.read(nrows)
  File "pandas/parser.pyx", line 766, in pandas.parser.TextReader.read (pandas/parser.c:8082)
  File "pandas/parser.pyx", line 800, in pandas.parser.TextReader._read_low_memory (pandas/parser.c:8538)
  File "pandas/parser.pyx", line 868, in pandas.parser.TextReader._read_rows (pandas/parser.c:9465)
  File "pandas/parser.pyx", line 975, in pandas.parser.TextReader._convert_column_data (pandas/parser.c:10858)
  File "pandas/parser.pyx", line 1054, in pandas.parser.TextReader._convert_tokens (pandas/parser.c:12095)
ValueError: could not convert string to float: rs187212831

I tried converting the rsid column to numeric data, but without success.

Thank you again.

Best regards.

Carolina


From: rkwalters <notifications@github.com>
Sent: Thursday, September 6, 2018 16:01
To: bulik/ldsc
Cc: cmfa26; Author
Subject: Re: [bulik/ldsc] Error converting summary statistics (#131)

Hi,

As the error states, this indicates munge_sumstats.py isn't able to find a SNP column in your input summary statistics. There are a few common issues that could yield this error:

  • The header for a required column isn't one of the recognized values. E.g. for the SNP column, which is expected to include rsids, munge_sumstats.py will recognize "snp", "markername", "snpid", "rs", "rsid", "rs_number", or "rs_numbers" (all case insensitive). If it's not one of those, you can specify what your file is using as a header for that field using the "--snp" argument.

  • The file format of the summary stats isn't supported. munge_sumstats.py assumes the input summary stats are whitespace-delimited (after decompression) with a header row. Files that use commas or other separators, and tab-delimited files that allow spaces within a field and/or indicate missing values with consecutive tabs, will not parse correctly.

  • The summary stats are compressed using a format that isn't supported by munge_sumstats. The currently supported options are plain text, gzip (.gz), or bzip2 (.bz2), identified by the filename. If your input summary stats are compressed with a different format (e.g. .zip, as is the case for the example scz/bip files used in the tutorial on the github wiki) you'll need to decompress the file before running munge.
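The three checks above can be approximated before running munge. This is a minimal Python 3 sketch under stated assumptions, not code from ldsc: the function names are hypothetical, it matches only the SNP header names listed above (case-insensitively; the real script may normalize headers further), `snp_flag` stands in for a custom column name you would pass with --snp, and `csv_to_whitespace` rewrites comma-delimited text as tab-delimited text, filling empty fields with NA (which pandas treats as missing by default) and replacing spaces inside fields with underscores.

```python
import bz2
import csv
import gzip
import io

# SNP-column headers listed above, matched case-insensitively.
SNP_SYNONYMS = {"snp", "markername", "snpid", "rs", "rsid", "rs_number", "rs_numbers"}


def open_sumstats(path):
    """Open a file as text, choosing decompression from the filename extension."""
    if path.endswith(".zip"):
        raise ValueError(path + " is a .zip archive; decompress it before running munge")
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt")
    return open(path)  # assumed to be plain text


def find_snp_column(header_line, snp_flag=None):
    """Return the header field that would be read as the SNP column.

    snp_flag is a hypothetical stand-in for a name passed with --snp."""
    fields = header_line.split()  # any run of whitespace separates fields
    if len(fields) == 1 and "," in header_line:
        raise ValueError("header looks comma-delimited, not whitespace-delimited")
    wanted = SNP_SYNONYMS | ({snp_flag.lower()} if snp_flag else set())
    for field in fields:
        if field.lower() in wanted:
            return field
    raise ValueError("Could not find SNP column.")


def csv_to_whitespace(text, missing="NA"):
    """Rewrite comma-delimited text as tab-delimited text with no empty fields."""
    out = io.StringIO()
    for row in csv.reader(io.StringIO(text)):
        cells = (value.strip().replace(" ", "_") or missing for value in row)
        out.write("\t".join(cells) + "\n")
    return out.getvalue()


print(find_snp_column("MarkerName A1 A2 Zscore P"))  # MarkerName
print(find_snp_column("ID A1 A2 Z", snp_flag="ID"))  # ID
```

For example, `find_snp_column("SNP,A1,A2,Z")` raises because the whole comma-delimited header is read as a single field, which mirrors why such files fail to parse.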

If none of these solve it, it would be helpful to see the full command line you're using to run munge_sumstats.py and an example of the first few rows of your input summary statistics. (Can email those separately if you're more comfortable sharing those individually rather than posting to this issues board.)

Cheers, Raymond


cmfa26 commented 6 years ago

Hi Raymond,

Thank you for your answer. I was able to solve this by converting the column to float.