ccb-hms / GWASCatalogSearchDB


issues with numeric values being characters #14

Open rgentlem opened 9 months ago

rgentlem commented 9 months ago

Hi, in the gwascatalog_associations table, the columns CHR_POS, UPSTREAM_GENE_DISTANCE, DOWNSTREAM_GENE_DISTANCE, P-VALUE, PVALUE_MLOG, and RISK_ALLELE_FREQUENCY should all be numeric....

rsgoncalves commented 7 months ago

Agreed. There are non-numeric values in the CHR_POS column (e.g., "99931297;99938483;99945866;...") and in the RISK_ALLELE_FREQUENCY column (e.g., "NR"). The other columns show up as REAL in the DB schema.

The datatypes are detected automatically from the values in each column, which is why (at least) CHR_POS and RISK_ALLELE_FREQUENCY show up as TEXT. We don't currently specify the schema upfront, but it seems we need a way to do that, at least for some columns.
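
One possible direction, as a rough sketch only and not the project's actual build code: declare the numeric columns explicitly and coerce values before inserting. The helper `to_real`, the sample rows, and the choice to store NULL for anything that is not a single number (such as "NR" or a semicolon-separated CHR_POS) are all assumptions made for illustration. Whether multi-position CHR_POS values should instead be split or kept as text is an open question. The coercion step matters because SQLite would otherwise store a string like "NR" as text even in a column declared REAL.

```python
import math
import sqlite3


def to_real(value):
    """Hypothetical helper: return a finite float, or None if value is not a single number."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Covers "NR" and multi-valued strings such as "99931297;99938483".
        return None
    return number if math.isfinite(number) else None


# Illustrative rows only: (CHR_POS, RISK_ALLELE_FREQUENCY, P-VALUE) as raw strings.
rows = [
    ("99931297", "0.12", "3E-8"),
    ("99931297;99938483;99945866", "NR", "5E-9"),
]

conn = sqlite3.connect(":memory:")
# Explicit schema instead of inferred types; "P-VALUE" needs quoting because of the hyphen.
conn.execute(
    "CREATE TABLE gwascatalog_associations ("
    '"CHR_POS" REAL, "RISK_ALLELE_FREQUENCY" REAL, "P-VALUE" REAL)'
)
conn.executemany(
    "INSERT INTO gwascatalog_associations VALUES (?, ?, ?)",
    [tuple(to_real(v) for v in row) for row in rows],
)

# typeof() reports the storage class actually used for each stored value.
stored_types = conn.execute(
    'SELECT typeof("CHR_POS"), typeof("RISK_ALLELE_FREQUENCY"), typeof("P-VALUE") '
    "FROM gwascatalog_associations ORDER BY rowid"
).fetchall()
print(stored_types)
```

With this policy the non-numeric entries end up as NULL rather than text, so numeric comparisons and aggregates on those columns behave as expected, at the cost of dropping the original string.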