DataBiosphere / data-explorer-indexers

BSD 3-Clause "New" or "Revised" License
Indexing fails on Baseline data due to Timestamp field #122

Closed wnojopra closed 5 years ago

wnojopra commented 5 years ago

On BigQuery, the TIMESTAMP type uses the SQL definition of timestamp: values are formatted strings that look like 2017-05-23 15:02:02 UTC. See docs for details. Our indexer maps this to a 'date' type on Elasticsearch. With no format specified, that type accepts only ISO 8601 strings or epoch milliseconds, so indexing these values fails.

Fortunately, Elasticsearch accepts a date format string in the field mapping, so we should be able to add one and make this work. In my tests with Baseline data, the format string 'yyyy-MM-dd HH:mm:ss z' seems to work.
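For reference, here is a minimal sketch of what that mapping might look like as a Python dict. The field name `enrollment_time` is hypothetical, and the `strptime` pattern is only a rough Python analogue of the Elasticsearch pattern, used to sanity-check the sample value; it is not how Elasticsearch parses dates.

```python
from datetime import datetime

# Format string proposed above (Java-style pattern, as used by Elasticsearch).
ES_TIMESTAMP_FORMAT = "yyyy-MM-dd HH:mm:ss z"

# Mapping body for one date field. "enrollment_time" is a made-up field name.
mapping = {
    "properties": {
        "enrollment_time": {"type": "date", "format": ES_TIMESTAMP_FORMAT}
    }
}

# Rough Python analogue of the pattern, to check that the sample
# BigQuery TIMESTAMP string has the shape the pattern describes.
sample = "2017-05-23 15:02:02 UTC"
parsed = datetime.strptime(sample, "%Y-%m-%d %H:%M:%S %Z")
```

Without the `format` key, the same mapping would fall back to the default date formats, which is the case that fails today.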

One concern is that UKBB data contains several DATETIME columns, and those index fine. Their datetime fields look like 2010-06-01T15:15:39.

I would say we don't touch DATETIME, but make it work for TIMESTAMP, and perhaps also for DATE and TIME types.
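A sketch of how that split could look in the indexer, assuming a helper that turns a BigQuery column type into an Elasticsearch field mapping. The function name `date_mapping_for` is hypothetical and not taken from the indexer code. DATE and TIME are left out because their formats have not been worked out here.

```python
def date_mapping_for(bq_type):
    """Return an Elasticsearch field mapping for a BigQuery date-like type.

    Hypothetical helper: only TIMESTAMP gets an explicit format.
    """
    if bq_type == "TIMESTAMP":
        # Values look like "2017-05-23 15:02:02 UTC".
        return {"type": "date", "format": "yyyy-MM-dd HH:mm:ss z"}
    if bq_type == "DATETIME":
        # Left untouched: these already index with the default date formats.
        return {"type": "date"}
    raise ValueError("No date mapping defined for type: %s" % bq_type)
```

Keeping DATETIME on the default mapping means existing UKBB indexing behavior does not change.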

We may also want to consider switching to the histogram facet for these types - I'll open another issue for that.