AnIML XML schemas
Specify a new SeriesType for numeric lists and blob data #12

Open laeubi opened 1 year ago

laeubi commented 1 year ago

Currently, AnIML specifies three kinds of series types:

https://github.com/AnIML/schemas/blob/96d00c68725c61b7fa861cb18bea2ebd982d5a1e/animl-core.xsd#L606-L608

Apart from AutoIncrementedValueSet, they are all quite expensive in terms of storage space and quite limited. For example, EncodedValueSet requires a specific encoding (while it is not completely specified how the data is encoded), and IndividualValueSet always requires additional tags. In addition, the XSD cannot be used to effectively constrain the values to the correct data type.
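
To make the storage overhead of IndividualValueSet concrete, here is a rough Python sketch that serializes the same numbers once with one wrapper element per value and once as plain space-delimited text. The `<D>` wrapper name is my reading of the current schema, and the sample values are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up sample series: 1000 doubles.
values = [0.5 * i for i in range(1000)]

# IndividualValueSet style: every value gets its own wrapper element
# (assumed here to be <D> for doubles).
individual = ET.Element("IndividualValueSet")
for v in values:
    ET.SubElement(individual, "D").text = repr(v)
individual_xml = ET.tostring(individual, encoding="unicode")

# The same numbers as plain whitespace-delimited text.
plain_text = " ".join(repr(v) for v in values)

# Extra characters spent purely on markup.
overhead = len(individual_xml) - len(plain_text)
print(len(individual_xml), len(plain_text), overhead)
```

Each value costs seven characters of tags instead of a single separator, so the overhead grows linearly with the length of the series.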

Because of this I'd like to propose two new types of value sets:

  1. NumericValueSet, whose type is an xsd:list of numeric XSD types. This is much more space efficient (just a plain list of space-delimited numbers) but can still hold different data types: for a series of INT, one can simply specify that all values are to be interpreted as such instead of wrapping each value.
  2. BlobValueSet, which has an encoding attribute that specifies how the stored data is encoded. This may be a common type defined in the AnIML specification (e.g. gzip, RLE, delta-compressed-double, ...) or a custom type. It can hold either a <data> element (with base64-encoded data) or a <reference> element (holding a reference, e.g. to a file next to the AnIML document, a database ID, or similar) to offload the actual data bytes, and it allows xs:any content so that it can be extended with any required custom data.
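
As a rough illustration of how a reader might consume the two proposed elements, here is a minimal Python sketch. Nothing in it exists in AnIML today: the element layout, the helper names `read_numeric_value_set` and `read_blob_value_set`, the decoder registry, and the choice of little-endian doubles as the payload behind the "gzip" encoding are all assumptions made for this example.

```python
import base64
import gzip
import struct
import xml.etree.ElementTree as ET


def read_numeric_value_set(element, cast=float):
    """Parse a space-delimited list; the caller decides the data type."""
    return [cast(token) for token in (element.text or "").split()]


def read_blob_value_set(element, decoders):
    """Decode an inline <data> payload using the declared encoding."""
    encoding = element.get("encoding")
    data = element.find("data")
    if data is None:
        # A <reference> would need application-specific resolution.
        raise NotImplementedError("only inline <data> is handled here")
    if encoding not in decoders:
        raise ValueError("unknown encoding: %r" % encoding)
    raw = base64.b64decode(data.text or "")
    return decoders[encoding](raw)


def decode_gzip_doubles(raw):
    """Assumed payload: gzip-compressed little-endian 64-bit doubles."""
    payload = gzip.decompress(raw)
    return list(struct.unpack("<%dd" % (len(payload) // 8), payload))


# Hypothetical NumericValueSet holding a series of INT values.
numeric = ET.fromstring("<NumericValueSet>1 2 3 42</NumericValueSet>")
ints = read_numeric_value_set(numeric, cast=int)

# Hypothetical BlobValueSet with inline base64 data.
samples = [0.25, 0.5, 1.0]
packed = gzip.compress(struct.pack("<3d", *samples))
blob = ET.Element("BlobValueSet", {"encoding": "gzip"})
ET.SubElement(blob, "data").text = base64.b64encode(packed).decode("ascii")
doubles = read_blob_value_set(blob, {"gzip": decode_gzip_doubles})

print(ints, doubles)
```

Keeping the decoders in a plain dictionary keyed by the encoding attribute is one way custom types could be plugged in without touching the reader itself.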