eclipse-archived / unide

Eclipse Public License 1.0

$_time precision fixed to milliseconds - limits sampling rate to 1kHz #40

Closed alaendle closed 5 years ago

alaendle commented 5 years ago

Fixing the precision of $_time to milliseconds in the specification limits the sensor sampling rate to 1 kHz (in practice, and to avoid rounding effects, an even lower sampling rate might be advisable). I suggest removing this artificial limitation in an upcoming release of the spec.

For now I have no concrete suggestion for what a solution should look like: e.g. just a higher precision, allowing a floating-point number, or adding an optional scaling factor. A very similar discussion can be found here, which may offer some useful approaches: https://github.com/elastic/elasticsearch/issues/10005

ameinhardt commented 5 years ago

The limit is due to the precision available in a 32-bit number. Assuming a 32-bit value, we could choose integer or float. If my math is correct, a 32-bit float represents integer millisecond offsets exactly only up to 2^24 ms = ~4.7 hours. A signed 32-bit integer offset allows ~25 days. Shall we assume a 64-bit double for the offset?
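
The arithmetic behind these two ranges can be checked with a short Python sketch. It assumes offsets are counted in whole milliseconds and uses `struct` to emulate a 32-bit IEEE 754 float; the helper name `float32_roundtrip` is made up for this illustration and is not part of Unide.

```python
import struct

MS_PER_HOUR = 60 * 60 * 1000
MS_PER_DAY = 24 * MS_PER_HOUR


def float32_roundtrip(value: float) -> float:
    """Store a value as a 32-bit IEEE 754 float and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# A 32-bit float has a 24-bit significand, so every integer up to 2^24
# is exact; beyond that, some whole-millisecond offsets are lost.
float32_exact_ms = 2**24
float32_exact_hours = float32_exact_ms / MS_PER_HOUR

# Largest offset a signed 32-bit integer can hold.
int32_max_ms = 2**31 - 1
int32_max_days = int32_max_ms / MS_PER_DAY

# One millisecond past 2^24 does not survive the 32-bit float round trip.
lost = float32_roundtrip(float(float32_exact_ms + 1)) != float32_exact_ms + 1

print(f"float32 exact range: {float32_exact_hours:.2f} h")
print(f"int32 range: {int32_max_days:.2f} days")
print(f"2^24 + 1 ms lost in float32: {lost}")
```

This gives roughly 4.66 hours for the 32-bit float and roughly 24.86 days for the signed 32-bit integer.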

alaendle commented 5 years ago

Personally, I prefer integers over floating-point numbers in such cases, mainly because with floating-point numbers you always have to wonder about the number of significant digits, and decimal fractions like 0.1 can't be represented exactly. We use JSON as the interchange format, which doesn't specify any precision itself but refers to 64-bit IEEE 754 and suggests that interoperability is achieved by not demanding more precision than that (rfc7159). So we could (or should?) assume that 64-bit floating-point numbers (with about 15 significant decimal digits) or 64-bit integers are available. That means that, regardless of the base unit chosen for timestamps, we can cover a wide enough time range.

However, the more I think about it, the more I get the feeling that we should not treat the time series as special in any way. Maybe this is more a question of how we think about the Unide format: should it be a format that allows a common and standardized way to communicate with a 3rd-party system (like PPM) that evaluates and interprets the telegrams (by adding context information)? Or should a Unide telegram stand on its own, and so be useful without any further context? I really like the second option, if Unide sees itself as a generic format for industrial communication of machine events. I believe that this is the claim that Unide makes, but please correct me if I'm wrong (because if this is not the case, some other issues I've recorded might become obsolete).

So, in line with https://github.com/eclipse/unide/issues/44 and https://github.com/eclipse/unide/issues/41, I think (at least for now) that there shouldn't be a special series (like "$_time"), and so I would shift the discussion to the more general question of how units of series values should be represented in the Unide format. @ameinhardt: Please take this comment/input with a grain of salt, because I haven't read through your answers on the other issues yet; I promise I will take a look at them in the next few days.

alaendle commented 5 years ago

Addendum: We do indeed have a sensor in place that yields data at an 800 ns interval.

ameinhardt commented 5 years ago

So a general note in the spec should state that, according to rfc7159, we assume 64-bit double (IEEE 754) precision for all numbers. That includes time offsets, which would cover your initial concern, right? Note that JSON Schema doesn't enforce that precision. Since "multipleOf" and the other v7 keywords aren't sufficient to validate the double precision of a number, validation can't really distinguish it from decimal floating point. It's just a textual recommendation.

alaendle commented 5 years ago

I guess we have the same understanding of rfc7159 and its implications. To narrow down the discussion: would it be a possible solution to just widen the schema and allow floating-point values for the time series? My understanding is that this is the direction you (@ameinhardt) are pointing in, right? I think this might cover all our current requirements - we will be able to correctly express 800 ns sampling intervals :smirk:. Also, if we keep milliseconds as the base unit (which seems advisable for compatibility), we have a resolution down to roughly attoseconds for small offsets (which should be more than enough).
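
As a rough check of that resolution, the following Python sketch uses `math.ulp` (available since Python 3.9) to get the spacing between adjacent 64-bit doubles at a given millisecond offset. The helper name `resolution_ms` and the 25-day comparison point are assumptions chosen for illustration only.

```python
import math

NS_PER_MS = 1_000_000
MS_PER_DAY = 24 * 60 * 60 * 1000


def resolution_ms(offset_ms: float) -> float:
    """Spacing between adjacent 64-bit doubles at the given offset, in ms."""
    return math.ulp(offset_ms)


interval_ms = 800 / NS_PER_MS  # 800 ns expressed in milliseconds

# Resolution near a 1 ms offset and near a 25-day offset.
near_start = resolution_ms(1.0)
after_25_days = resolution_ms(25.0 * MS_PER_DAY)

print(f"resolution at 1 ms:    {near_start:.3e} ms")
print(f"resolution at 25 days: {after_25_days:.3e} ms")
print(f"800 ns interval:       {interval_ms:.3e} ms")
```

The resolution of a double depends on the magnitude of the value: near a 1 ms offset the spacing is on the order of 1e-16 ms, while at a 25-day offset it is coarser but still below one nanosecond, so an 800 ns interval stays resolvable over that range.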

ameinhardt commented 5 years ago

With https://github.com/eclipse/unide/commit/54482bfd23d5b677d94efd95f525f60f9a8e5cfc, time changed from Integer to Number, thus allowing sub-millisecond offsets. Any objections?
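
To illustrate the effect of that change, here is a minimal Python sketch. The payload is a hypothetical fragment written for this example (the `series` wrapper and the `pressure` series are assumptions, only the `$_time` name comes from this thread), and the two checks only approximate JSON Schema's "integer" and "number" types; this is not the actual Unide validation.

```python
import json

# Hypothetical fragment with 800 ns steps expressed as millisecond offsets.
payload = '{"series": {"$_time": [0, 0.0008, 0.0016], "pressure": [1.0, 1.1, 1.2]}}'


def is_number(value: object) -> bool:
    """Approximates JSON Schema type "number" (bool is excluded)."""
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def is_integer(value: object) -> bool:
    """Approximates JSON Schema type "integer" (no fractional part)."""
    return is_number(value) and float(value).is_integer()


times = json.loads(payload)["series"]["$_time"]

print("accepted as integer offsets:", all(is_integer(t) for t in times))
print("accepted as number offsets: ", all(is_number(t) for t in times))
```

Under an integer-only rule the fractional offsets are rejected, while a number rule accepts them, which is what permits sub-millisecond sampling intervals.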

alaendle commented 5 years ago

Perfect match :star2: !