JonJala / mtag

Python command line tool for Multi-Trait Analysis of GWAS (MTAG)
GNU General Public License v3.0

float and string error #220

Status: Open (opened by zhong156 1 month ago)

zhong156 commented 1 month ago

Hello! I hope you are doing well! I am trying to run MTAG on FinnGen GWAS summary statistics, but I get this error:

Trait 2:
Dropped 64331 SNPs for duplicate values in the "snp_name" column
Dropped 975022 SNPs due to strand ambiguity, 6388280 SNPs remain in intersection after merging trait1
Dropped 8179 SNPs due to inconsistent allele pairs from phenotype 2. 6287048 SNPs remain.
unsupported operand type(s) for -: 'float' and 'str'
Traceback (most recent call last):
  File "mtag.py", line 1577, in <module>
    mtag(args)
  File "mtag.py", line 1343, in mtag
    DATA_U, DATA, args = load_and_merge_data(args)
  File "mtag.py", line 357, in load_and_merge_data
    GWAS_int.loc[snps_to_flip, freq_name + str(p)] = 1. - GWAS_int.loc[snps_to_flip, freq_name + str(p)]
  File "/home/zhong156/.conda/envs/2024.02-py311/py27_env/lib/python2.7/site-packages/pandas/core/ops.py", line 1583, in wrapper
    result = safe_na_op(lvalues, rvalues)
  File "/home/zhong156/.conda/envs/2024.02-py311/py27_env/lib/python2.7/site-packages/pandas/core/ops.py", line 1533, in safe_na_op
    lambda x: op(x, rvalues))
  File "pandas/_libs/algos.pyx", line 690, in pandas._libs.algos.arrmap
  File "/home/zhong156/.conda/envs/2024.02-py311/py27_env/lib/python2.7/site-packages/pandas/core/ops.py", line 1533, in <lambda>
    lambda x: op(x, rvalues))
  File "/home/zhong156/.conda/envs/2024.02-py311/py27_env/lib/python2.7/site-packages/pandas/core/ops.py", line 148, in rsub
    return right - left
TypeError: unsupported operand type(s) for -: 'float' and 'str'
Analysis terminated from error at Thu Oct 3 10:31:05 2024
Total time elapsed: 6.0m:30.12s

I have tried converting the chr, bpos, freq, z, and pval columns to numeric, but I still get this error. Do you know which part of the data might be causing it? Thank you so much!

JonJala commented 1 month ago

Hmm, based on the line that's throwing the error (line 357 of mtag.py), it looks like the problem is possibly in one of the FRQ columns.
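For readers following along, here is a minimal reproduction of why line 357 fails. It is a sketch with made-up toy data, written for Python 3 and a recent pandas rather than the Python 2.7 environment in the traceback: if even one frequency value is a string, pandas stores the whole column with the generic `object` dtype, and computing `1. - FRQ` for the flipped SNPs then raises the same `TypeError`.

```python
import pandas as pd

# Hypothetical toy data (not from the issue): one frequency was stored as the
# string "0.40", so pandas gives the whole FRQ1 column the generic "object" dtype.
gwas = pd.DataFrame({
    "SNP": ["rs1", "rs2", "rs3"],
    "FRQ1": [0.12, "0.40", 0.75],
    "flip": [False, True, True],
})
print(gwas["FRQ1"].dtype)  # object

# Same pattern as mtag.py line 357: replace FRQ with 1 - FRQ for flipped SNPs.
raised = False
try:
    gwas.loc[gwas["flip"], "FRQ1"] = 1. - gwas.loc[gwas["flip"], "FRQ1"]
except TypeError as err:
    raised = True
    print("TypeError:", err)

# Once the column really is numeric, the same flip succeeds.
gwas["FRQ1"] = pd.to_numeric(gwas["FRQ1"])
gwas.loc[gwas["flip"], "FRQ1"] = 1. - gwas.loc[gwas["flip"], "FRQ1"]
print(gwas["FRQ1"].tolist())
```

The subtraction only touches the flipped rows, so the error appears only when a string value sits in a SNP that needs its alleles flipped.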

If you are willing to make a tiny tweak to the code, you could add a print of "GWAS_int.dtypes" on the line right before the error; that would tell us which data type has been assigned to each column. Alternatively, you could check the freq columns in your data files to confirm that they are all truly numeric.
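The second check can also be done outside MTAG with a short pandas script that prints each column's dtype and lists the frequency entries that fail to parse as numbers. This is a sketch in Python 3, not part of MTAG: `non_numeric_values` is a hypothetical helper, and the toy table (including the "." entry) is made up for illustration.

```python
import pandas as pd


def non_numeric_values(df, column):
    """Return the entries of `column` that are present but cannot be parsed as numbers.

    Genuinely missing values (NaN/None) are skipped, so only malformed
    entries are reported.
    """
    values = df[column]
    parsed = pd.to_numeric(values, errors="coerce")  # unparseable entries -> NaN
    malformed = parsed.isna() & values.notna()
    return values[malformed]


# Hypothetical summary statistics; "." stands in for a malformed entry.
sumstats = pd.DataFrame({
    "snpid": ["rs1", "rs2", "rs3", "rs4"],
    "freq": [0.12, ".", 0.75, None],
})

print(sumstats.dtypes)  # freq is object rather than float64 because of the "." entry
print(non_numeric_values(sumstats, "freq"))
```

On real data, the toy table would be replaced by loading each input file with `pd.read_csv` using that file's delimiter.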


zhong156 commented 1 month ago

Thank you so much for your reply! I added a print of the dtypes, and the frequency column for trait 2 (FRQ1) was interpreted as object:

SNP            object
CHR0            int64
BP0             int64
FRQ0          float64
A10            object
A20            object
Z0            float64
P0            float64
N0              int64
strand_ambig     bool
CHR1            int64
BP1           float64
FRQ1           object
A11            object
A21            object
Z1            float64
P1            float64
N1            float64
flip_snps1       bool
dtype: object

But when I read my data into R, the freq column is numeric:

> sapply(data1, class)
      snpid         chr        bpos        freq          a1          a2           z        pval
"character"   "integer"   "integer"   "numeric" "character" "character"   "numeric"   "numeric"
          n
  "integer"

Do you know why this happens? Thank you so much!

JonJala commented 1 month ago

I'm not sure exactly how Python and R differ in the way they read files and convert values to numeric. Apparently R is deciding that some particular input maps to a numeric value, where pandas is not confident enough to do the same. Regardless, it looks like something is off with one or more values in the frequency column of your second data file. I'd go through that column and look for NaNs or string/character values.
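One way a column that looks numeric can end up as text in pandas is a missing-data placeholder that `read_csv` does not treat as NA by default. This is an illustration only: the "." placeholder is an assumption, and the thread does not say what the offending values actually were. The sketch uses Python 3 and the `na_values` parameter of `pandas.read_csv`.

```python
import io

import pandas as pd

# Hypothetical tab-separated file; "." is used as a missing-value placeholder.
raw = "snpid\tfreq\nrs1\t0.12\nrs2\t.\nrs3\t0.75\n"

# "." is not in read_csv's default list of NA tokens, so the column stays text.
default = pd.read_csv(io.StringIO(raw), sep="\t")
print(default["freq"].dtype)  # object in pandas 2.x, not float64

# Declaring the placeholder as missing lets pandas parse the column as float64.
cleaned = pd.read_csv(io.StringIO(raw), sep="\t", na_values=["."])
print(cleaned["freq"].dtype)  # float64

# Rows whose frequency is now missing can be inspected (or dropped) explicitly.
print(cleaned[cleaned["freq"].isna()])
```

Making missing values explicit this way keeps the column numeric while still letting you decide what to do with the affected SNPs.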


zhong156 commented 1 month ago

Thank you so much! There were some missing values in the SNP column, and it worked after I removed them.
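The fix reported here can be applied as a pre-processing step before running mtag.py. This is a sketch in Python 3 with pandas, assuming the SNP ID column is named `snpid` as in the R output earlier in the thread; `drop_missing_ids` is a hypothetical helper, not part of MTAG, and the toy table is made up.

```python
import pandas as pd


def drop_missing_ids(df, id_column="snpid"):
    """Return a copy of df without rows whose SNP identifier is missing or blank."""
    ids = df[id_column]
    # isna() catches NaN/None; the strip() check catches empty or whitespace-only IDs.
    missing = ids.isna() | ids.astype(str).str.strip().eq("")
    return df.loc[~missing].reset_index(drop=True)


# Hypothetical summary statistics with one missing and one blank SNP ID.
sumstats = pd.DataFrame({
    "snpid": ["rs1", None, "rs3", "  "],
    "freq": [0.12, 0.40, 0.75, 0.30],
})
cleaned = drop_missing_ids(sumstats)
print(len(sumstats) - len(cleaned), "rows dropped")
print(cleaned)
```

The cleaned table can then be written back out (for example with `DataFrame.to_csv(..., index=False)`) in the same delimited format as the original file and passed to MTAG.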