JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

ParserError: Error tokenizing data. C error: Expected 10 fields in line 21, saw 11 #69

Closed YaoXueming closed 5 years ago

YaoXueming commented 5 years ago

Hi Hui,

When I ran mtag, it produced an error. Here is the log file: all-mt-2.log

I inspected the file for the second trait, but there seems to be nothing wrong with this line:

[jli01@ln3%tianhe cell]$ sed -n '21,1p' scz-mt-2
10 100017453 T G rs1983864 0.0125 0.3847 0.0109004 65967 0.341

I then searched for the error on Google, and the results suggested using the option --error_bad_lines=False. I tried this with MTAG, but it does not work. Can you give me some advice?

Best wishes,

Xueming Yao

huilisabrina commented 5 years ago

Hi @YaoXueming ,

You might want to check and remove any leading or trailing spaces in your data. The error in your log file indicates there is a data format problem. Our software assumes that the input data has been reasonably QC-ed.
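One quick way to locate rows with a format problem is to count the fields on every line and compare against the header. The following is a minimal sketch, assuming whitespace-delimited input; the function name `find_bad_field_counts` is illustrative and not part of mtag, and the last sample row is a row from this thread with a made-up extra token appended.

```python
def find_bad_field_counts(lines, expected=None):
    """Return (line_number, n_fields) for each line whose whitespace-separated
    field count differs from `expected` (default: the first line's count)."""
    bad = []
    for line_no, line in enumerate(lines, start=1):
        n_fields = len(line.split())
        if expected is None:
            expected = n_fields  # the header defines the expected count
        if n_fields != expected:
            bad.append((line_no, n_fields))
    return bad


sample = [
    "CHR BP A1 A2 SNP SE P beta n freq",
    "10 100017453 T G rs1983864 0.0155 0.005484 -0.0430954 41653 0.341",
    # Hypothetical malformed row: one extra token at the end.
    "10 100020572 T G rs11189526 0.0155 0.1896 0.0203025 41653 0.3221 extra",
]
print(find_bad_field_counts(sample))
```

Because the function accepts any iterable of lines, an open file handle can be passed in place of `sample`.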

I'm not sure about the flag --error_bad_lines you mentioned, but it's definitely not one of the options in the mtag package. To see a list of the mtag flags, you can run python mtag.py --help.

Best, Hui

YaoXueming commented 5 years ago

Hi Hui,

It does not seem to be a problem of leading or trailing spaces. I used the command $ cat input.txt | sed 's/^[ \t]//;s/[ \t]$//' > output.txt to remove the spaces from my input data. I also used the -e option of cat, which makes trailing spaces easy to spot (the $ symbol marks the end of each line) (http://www.theunixschool.com/2012/12/howto-remove-leading-trailing-spaces.html)

$ cat -e bip-mt-2rev > bip-e
$ less bip-e
CHR BP A1 A2 SNP SE P beta n freq$
10 10000018 A G rs6602381 0.0155 0.03669 -0.0322959 41653 0.4662$
10 100000625 A G rs7899632 0.0148 0.008551 -0.0388968 41653 0.4483$
10 100000645 A C rs61875309 0.0182 0.1254 0.0278972 41653 0.2247$
10 100003242 T G rs12258651 0.0229 0.7507 0.00730327 41653 0.1421$
10 100003304 A G rs72828461 0.0449 0.2556 0.0509973 41653 0.0358$
10 100003785 T C rs1359508 0.0154 0.1328 0.0231988 41653 0.3231$
10 10000514 T C rs6602382 0.0152 0.01512 -0.036996 41653 0.4662$
10 10000797 A G rs139197039 0.0381 0.9648 0.00169856 41653 0.0358$
10 100010186 A G rs4919190 0.0148 0.006511 0.040201 41653 0.5507$
10 100010578 A T rs3750596 0.0148 0.008799 -0.0388032 41653 0.4483$
10 100011120 A T rs117258652 0.0779 0.9507 0.00479847 41653 0.0149$
10 10001114 A C rs191472867 0.076 0.929 0.00679685 41653 0.006$
10 100012739 A G rs737656 0.0154 0.1356 -0.0230025 41653 0.672$
10 100012890 A G rs737657 0.0154 0.1375 -0.0229002 41653 0.671$
10 100013244 A C rs3750599 0.0154 0.1305 0.0232965 41653 0.3231$
10 100013977 A T rs878178 0.0155 0.1567 0.0218985 41653 0.3211$
10 100016313 A T rs1983866 0.0157 0.004109 -0.0451961 41653 0.3082$
10 10001650 C G rs149227564 0.0186 0.1779 -0.0251024 41653 0.3668$
10 100017453 T G rs1983864 0.0155 0.005484 -0.0430954 41653 0.341$
10 100020572 T G rs11189526 0.0155 0.1896 0.0203025 41653 0.3221$

I then reran MTAG, but it still did not work, and the error is the same as the one I mentioned previously. This really confuses me. Any advice?

Best, Xueming Yao

huilisabrina commented 5 years ago

Hi @YaoXueming ,

Could you send me or share your input data on Google Drive/Dropbox (if there are no data restriction issues)? I just need to take a closer look to understand what the problem is. I still can't replicate your error based on the descriptions you have provided so far.

Thanks, Hui

YaoXueming commented 5 years ago

Hi Hui,

Wow, you are so nice. Here are the links to my four input files. I hope you can figure out the error. Thank you so much!

  1. https://drive.google.com/open?id=1ZztPR_IS7Aa411jXTbb81CXAo1fkZcip
  2. https://drive.google.com/open?id=1xb4QEt8MqixX2urSUfgoMavv_gnXljA6
  3. https://drive.google.com/open?id=1gsp7Z-g6yyRJemAoUsdDzE-PlnNKAAae
  4. https://drive.google.com/open?id=1P8pIZEE0GaUan1kn4Ow7bkLvYe3TfNtC

Best wishes,

Xueming Yao

YaoXueming commented 5 years ago

Hey @huilisabrina ,

I found the problem. When I merged the data, I used sed 's/:/ /g', which broke SNP IDs such as chr10:10003392 into two fields, so some lines ended up with 11 fields. Thank you for your kindness. However, when I used the new input files to run mtag, it showed the following error. I also tried --std_betas and deleted the rows with freq = 0 or 1, but the error persisted.

File "mtag.py", line 590, in extract_gwas_sumstats
    Ns = 1 / np.square(SEs)
FloatingPointError: divide by zero encountered in true_divide

Any advice?

Best wishes, Xueming Yao
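The field-count problem described in this comment can be reproduced in a few lines. This is a rough sketch, assuming the substitution was meant to split a combined chromosome:position column in the first field; that input layout, the sample row, and both function names are hypothetical and only serve to show why a global replacement also breaks SNP IDs that contain a colon.

```python
def split_all_colons(line):
    """Mimic the effect of sed 's/:/ /g': every colon becomes a field break."""
    return line.replace(":", " ").split()


def split_first_column_only(line):
    """Split only the first field on ':' and leave every other field intact."""
    first, *rest = line.split()
    return first.split(":", 1) + rest


# Hypothetical row: combined chr:pos in the first field, and a SNP ID
# that itself contains a colon.
row = "10:10003392 A G chr10:10003392 0.0155 0.1 0.02 41653 0.3"

print(len(split_all_colons(row)))         # 11 fields: the SNP ID was split too
print(len(split_first_column_only(row)))  # 10 fields: the SNP ID is preserved
```

Restricting the substitution to the column that actually needs splitting keeps the number of fields constant across rows.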

paturley commented 5 years ago

Do you have any SNPs where the standard errors are coded as exactly zero?
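Since the traceback shows the line Ns = 1 / np.square(SEs), any row with a standard error of exactly zero would trigger the division by zero. Below is a minimal sketch for listing such rows, assuming a whitespace-delimited file with a header that names the column "SE" as in the files shown in this thread; the function name and the second sample row are made up for illustration.

```python
def rows_with_zero_se(lines, se_col="SE"):
    """Return the 1-based line numbers whose SE value is exactly zero."""
    it = iter(lines)
    header = next(it).split()
    idx = header.index(se_col)  # position of the SE column
    return [
        line_no
        for line_no, line in enumerate(it, start=2)  # line 1 is the header
        if float(line.split()[idx]) == 0.0
    ]


sample = [
    "CHR BP A1 A2 SNP SE P beta n freq",
    "10 100017453 T G rs1983864 0.0155 0.005484 -0.0430954 41653 0.341",
    # Hypothetical row with a standard error of zero.
    "10 100020572 T G rs11189526 0.0000 0.1896 0.0203025 41653 0.3221",
]
print(rows_with_zero_se(sample))
```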

YaoXueming commented 5 years ago

Hi,

Well, I found that the frequency of some SNPs equals 0. I deleted them and ran MTAG again. That solved the error, but a new one came up: "LinAlgError: Array must not contain infs or NaNs". I found an issue that mentions the same problem, but I used SE and beta for the analysis rather than Z-scores. Is this the same problem as in the previous issue? Is there something wrong with the beta or SE column in my input files? Here is the log file: all-mt-2.1.log https://github.com/omeed-maghzian/mtag/files/3192179/all-mt-2.1.log

thanks, Xueming Yao

paturley commented 5 years ago

It sounds like your data may include values that aren't real numbers or are outside the range of feasible values. Your data should be carefully QC'ed before passing it into MTAG.
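A basic per-row check along these lines might look like the sketch below. It is not part of mtag; the function name is illustrative, and the ranges used (SE > 0, n > 0, P between 0 and 1, freq strictly between 0 and 1) are assumptions about what counts as feasible that should be adjusted to the data at hand.

```python
import math


def qc_row(row):
    """Return a list of problems found in one row (a dict of column -> string)."""
    problems = []
    for col in ("SE", "P", "beta", "n", "freq"):
        try:
            value = float(row.get(col, ""))
        except ValueError:
            problems.append(f"{col}: not a number")
            continue
        if not math.isfinite(value):  # catches nan, inf and -inf
            problems.append(f"{col}: not finite")
        elif col in ("SE", "n") and value <= 0:
            problems.append(f"{col}: must be > 0")
        elif col == "P" and not 0 <= value <= 1:
            problems.append(f"{col}: must be between 0 and 1")
        elif col == "freq" and not 0 < value < 1:
            problems.append(f"{col}: must be strictly between 0 and 1")
    return problems


good = {"SE": "0.0155", "P": "0.005484", "beta": "-0.0430954",
        "n": "41653", "freq": "0.341"}
# Hypothetical bad row: zero SE, non-finite beta, frequency of zero.
bad = {"SE": "0", "P": "0.5", "beta": "nan", "n": "41653", "freq": "0"}

print(qc_row(good))
print(qc_row(bad))
```

Rows for which the returned list is non-empty can then be reviewed or dropped before running MTAG.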

Best, Patrick

YaoXueming commented 5 years ago

Hi Patrick,

It turns out you were right. I finally got the results. Thank you so much!

Best wishes, Xueming Yao