VEuPathDB / lib-eda-subsetting

Provides Java interface to query and provide EDA data and metadata from a database
Apache License 2.0

Produce full set of binary files for megastudy #24

Closed dmgaldi closed 1 year ago

dmgaldi commented 1 year ago

Three issues found:

bobular commented 1 year ago

Curious how floating point values need utf-8 encoding?

dmgaldi commented 1 year ago

This is an optimization for outputting the variable. For filtering, we use a binary floating point representation that can be easily deserialized into a Java float.

If a client requests a floating point variable as output, the tabular output is encoded as utf-8 strings. It's somewhat expensive to convert a Java float into a string, so we have the utf-8 string representations pre-computed in another file alongside the binary floating point file.
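To make the two representations concrete, here is a minimal sketch of the idea, not the actual lib-eda-subsetting code. The class name, the method names, and the fixed 16-byte zero-padded width for the utf-8 form are all assumptions for illustration; the real file layout is not described in this thread.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only: names and layout are assumptions,
// not the format used by lib-eda-subsetting.
class FloatEncodingSketch {

    // Assumed fixed record width for the pre-computed utf-8 form.
    // Float.toString never produces more than 15 characters, so 16 is enough.
    static final int UTF8_WIDTH = 16;

    // Binary form used for filtering: 4 bytes, big-endian IEEE 754.
    static byte[] encodeBinary(float value) {
        return ByteBuffer.allocate(Float.BYTES).putFloat(value).array();
    }

    // Cheap to deserialize back into a Java float when evaluating a filter.
    static float decodeBinary(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getFloat();
    }

    // Pre-computed text form used for output: the float-to-string conversion
    // happens once, when the file is written, and the result is zero-padded.
    static byte[] encodeUtf8Padded(float value) {
        byte[] text = Float.toString(value).getBytes(StandardCharsets.UTF_8);
        if (text.length > UTF8_WIDTH) {
            throw new IllegalArgumentException("value does not fit in record");
        }
        return Arrays.copyOf(text, UTF8_WIDTH);
    }

    // At read time the bytes are only trimmed, with no float formatting.
    static String readUtf8Padded(byte[] padded) {
        int length = 0;
        while (length < padded.length && padded[length] != 0) {
            length++;
        }
        return new String(padded, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        float value = 37.5f;
        byte[] binaryRecord = encodeBinary(value);   // would go in the binary file
        byte[] textRecord = encodeUtf8Padded(value); // would go in the utf-8 file

        // Filtering path: compare numerically using the binary record.
        boolean passesFilter = decodeBinary(binaryRecord) > 30.0f;

        // Output path: emit the pre-computed bytes as text.
        System.out.println(passesFilter + " " + readUtf8Padded(textRecord));
    }
}
```

The trade-off in this sketch is extra disk space for the second file in exchange for skipping the float-to-string conversion on every row written to the tabular output.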

dmgaldi commented 1 year ago

There's a world where we could have a binary application/octet-stream version of the tabular endpoint so we don't have to worry about utf-8, but that would require all consumers to understand our binary format, whereas right now it's all encapsulated in the subsetting service.

dmgaldi commented 1 year ago

This led to discovery of two other issues:

dmgaldi commented 1 year ago

The issue of entity entries with multiple ancestors is currently waiting on @jbrestel to remove the offending studies and reload the megastudy.

dmgaldi commented 1 year ago

Files are in good shape now on yew.