flux-framework / flux-core

core services for the Flux resource management framework
GNU Lesser General Public License v3.0

Flux incorrectly using double to store timestamps #4574

Open morrone opened 2 years ago

morrone commented 2 years ago

In src/common/libeventlog, the code uses a C double to store timestamps. A double is not capable of representing all microsecond-resolution timestamp values, let alone the nanosecond-resolution timestamps from struct timespec. eventlog.c:get_timestamp_now() uses clock_gettime() to retrieve a nanosecond-resolution timestamp and converts it to a double. The resolution effectively becomes variable, because the nearest representable double is selected to approximate the precise value from the struct timespec.
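
For illustration, here is a minimal standalone sketch (not the flux source) of that conversion and the rounding error it introduces; compile with -lm:

```c
#include <math.h>
#include <stdio.h>
#include <time.h>

int main (void)
{
    struct timespec ts;
    clock_gettime (CLOCK_REALTIME, &ts);

    /* the conversion under discussion */
    double d = (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;

    /* round-trip the fractional part to observe the rounding error */
    long nsec = lround ((d - (double)ts.tv_sec) * 1e9);

    printf ("timespec: %lld.%09ld\n", (long long)ts.tv_sec, ts.tv_nsec);
    printf ("double:   %.9f (error: %ld nsec)\n", d, ts.tv_nsec - nsec);
    return 0;
}
```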

Flux needs to follow the example of the standard C libraries and keep timestamps in a timestamp data structure. I think we should just stick with struct timespec and nanosecond resolution. We probably don't want to use microsecond resolution and then go through the pain of another conversion when someone wants nanosecond resolution in the future.

morrone commented 2 years ago

I have not checked the rest of the flux code. We should probably check that timestamps are not incorrectly stored outside of libeventlog as well.

garlick commented 2 years ago

FWIW, libeventlog implements this specification, which requires only second-level precision. I think we assumed that the convenience of parsing a single JSON scalar value rather than a "time object" or whatever was friendlier here, and clearly we didn't anticipate the need for high precision.

What motivates the need for higher precision?

chu11 commented 2 years ago

> I have not checked the rest of the flux code. We should probably check that timestamps are not incorrectly stored outside of libeventlog as well.

A large swath of timestamps in flux are based on the values stored in the eventlog, so they are basically doubles everywhere.

The ultimate question: how do we store the timestamp in the eventlog, and how do we stay backwards compatible with the current format? From only the most casual googling, the suggestion seems to be to store the timestamp in ISO 8601 format with nanoseconds. (I haven't figured out exactly how to do this in C, but I figure one of the time formatting functions can do it.) I suppose a simple "if not in ISO format, try double" would be sufficient for backwards compatibility.
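
For what it's worth, a sketch of one way that formatting could plausibly be done in C, assuming UTC output and a hypothetical helper name; strftime() has no sub-second conversion, so the nanoseconds are appended by hand:

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: render a struct timespec as ISO 8601 with
 * nanoseconds, e.g. "2022-09-15T17:48:42.322440400Z". */
int timespec_to_iso8601 (const struct timespec *ts, char *buf, size_t len)
{
    struct tm tm;
    char sec[32];

    if (!gmtime_r (&ts->tv_sec, &tm))
        return -1;
    if (strftime (sec, sizeof (sec), "%Y-%m-%dT%H:%M:%S", &tm) == 0)
        return -1;
    return snprintf (buf, len, "%s.%09ldZ", sec, ts->tv_nsec);
}
```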

morrone commented 2 years ago

We are not talking about changing RFC18 as far as I can tell, nor are we talking about reducing the convenience by having more than one JSON scalar.

In JSON, a "number" type is represented as an arbitrarily long string of digits, a decimal point, and then another arbitrarily long string of digits (to oversimplify).

The flux code already produces JSON output that has seven digits after the decimal point some of the time. That is already more digits than would be needed to represent the C struct timeval (microseconds). So even if you chose microsecond precision rather than nanosecond, no change to the JSON presentation of the value would be needed. If you went with nanosecond, there would just be nine digits after the decimal point in the JSON, so again no real change to the JSON format, just more digits in the same existing field.

Here is an example:

{"timestamp":1663264122.3224404,"name":"submit","context":{"userid":8126,"urgency":16,"flags":0,"version":1}}
{"timestamp":1663264122.3368895,"name":"jobspec-update","context":{"attributes.system.duration":1800.0}}
{"timestamp":1663264122.3370321,"name":"validate"}
{"timestamp":1663264122.3503926,"name":"depend"}
{"timestamp":1663264122.3504636,"name":"priority","context":{"priority":10054881}}
{"timestamp":1663264122.3923786,"name":"alloc","context":{"annotations":{"sched":{"queue":"default"}}}}
{"timestamp":1663264122.3926599,"name":"prolog-start","context":{"description":"job-manager.prolog"}}
{"timestamp":1663264123.2060091,"name":"prolog-finish","context":{"description":"job-manager.prolog","status":0}}
{"timestamp":1663264123.2176478,"name":"start"}
{"timestamp":1663264623.477493,"name":"finish","context":{"status":0}}
{"timestamp":1663264623.4778001,"name":"epilog-start","context":{"description":"job-manager.epilog"}}
{"timestamp":1663264623.5022402,"name":"release","context":{"ranks":"all","final":true}}
{"timestamp":1663264646.9735,"name":"epilog-finish","context":{"description":"job-manager.epilog","status":0}}
{"timestamp":1663264646.9739602,"name":"free"}
{"timestamp":1663264646.9740002,"name":"clean"}

I understand that one might look at a JSON number like this:

1663264646.9740002

and think to themselves "that is a double". But that would be incorrect; that is a decimal number. How we choose to represent that in the C language is a completely different question.

> The ultimate question: how do we store the timestamp in the eventlog, and how do we stay backwards compatible with the current format? From only the most casual googling, the suggestion seems to be to store the timestamp in ISO 8601 format with nanoseconds. (I haven't figured out exactly how to do this in C, but I figure one of the time formatting functions can do it.) I suppose a simple "if not in ISO format, try double" would be sufficient for backwards compatibility.

I think the question about the format of the representation of timestamps in the eventlog is somewhat orthogonal to the one in this ticket. This ticket is really just about incorrectly using a C double to represent a timestamp in the C implementation and in the C API. One could imagine the code that parses and generates ISO 8601 still representing the timestamp in C as a double, but that would not address the root issue raised in this ticket.

The use of the C double is the root issue here, not the way the timestamp appears in JSON. It would be nice to have a fixed number of digits after the decimal point in the JSON, but that is currently just a side effect of the C implementation rather than a defect in flux RFC18.

chu11 commented 2 years ago

Thought I'd post an offline chat I had with @morrone.

After thinking about it, my suspicion is that double may have been chosen partly (or largely) because that's what the jansson library provides. And the reason my mind went to converting the eventlogs to use ISO 8601 is not because we couldn't represent nanoseconds correctly in a JSON number, but because jansson doesn't seem to provide an obvious mechanism to create/pack/unpack arbitrary JSON "numbers". Everything is based on C ints, reals, and strings. So my mind went to "should we store the timestamp as an ISO 8601 string instead?"

Looking at the documentation for json_pack() and json_unpack(), I'm not exactly sure how to deal with this without changing how we store eventlogs. Maybe there's a way that isn't obvious to me at the moment. What would be great is if some type of "raw" input to pack/unpack were available. Hypothetically (and hopefully I have the logic correct):

```c
// assume t.tv_sec = 1234 and t.tv_nsec = 1 (a struct timespec)
snprintf (timebuf, sizeof (timebuf), "%lld.%09ld",
          (long long)t.tv_sec, t.tv_nsec);
json_pack ("{s:r}", "timestamp", timebuf);
```

Where r is some hypothetical "raw" conversion, i.e. put this string in there but don't surround it with quotes, and it would result in {"timestamp":1234.000000001}

But that r conversion doesn't exist in jansson's pack/unpack format strings. So ... something to mull over.

Edit: I suppose "hand" creating JSON eventlog entries wouldn't be the end of the world if this becomes super critical. And that hand-creation could be isolated to just libeventlog, I think.
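
For example, a rough sketch of what that hand-creation might look like (the helper name is hypothetical, and a real version would need to escape name and splice in a context object):

```c
#include <stdio.h>
#include <time.h>

/* Hypothetical: emit an eventlog entry directly so the timestamp is
 * written as a JSON number with full nanosecond precision, without
 * ever passing through a C double. */
int eventlog_entry_encode (const struct timespec *ts, const char *name,
                           char *buf, size_t len)
{
    return snprintf (buf, len,
                     "{\"timestamp\":%lld.%09ld,\"name\":\"%s\"}",
                     (long long)ts->tv_sec, ts->tv_nsec, name);
}
```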

morrone commented 2 years ago

After our chat, I read the jansson docs a bit. It certainly does appear that jansson chose to limit JSON numbers in a way that the JSON spec itself does not (limiting them to C integers and doubles). It doesn't look like jansson makes that easy to change.

One of the more stream-oriented JSON parsers might make it easier to write out JSON-compliant (but jansson-inexpressible) numbers. But even then, anyone using jansson might get overflow errors when reading those numbers. That might be avoidable if jansson users increased the permitted precision for floats; jansson might then silently discard the excess precision rather than throwing an overflow error.

Which is all to say that while the JSON spec allows arbitrarily sized numbers, the proliferation of jansson may have made that a moot point in practice. So maybe moving to two integer fields (seconds and microseconds/nanoseconds) or a string field for the timestamp in flux may indeed be the only practical way to go.
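
For example, a hedged sketch of the string-field option using jansson's s conversion, so no double is ever involved (the helper and field names are just illustrative):

```c
#include <jansson.h>
#include <stdio.h>
#include <time.h>

/* Pack a timestamp as a JSON string, preserving nanoseconds. */
json_t *timestamp_pack_string (const struct timespec *ts)
{
    char buf[32];

    snprintf (buf, sizeof (buf), "%lld.%09ld",
              (long long)ts->tv_sec, ts->tv_nsec);
    return json_pack ("{s:s}", "timestamp", buf);
}
```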

garlick commented 2 years ago

If I understand this issue correctly, there are actually two: 1) flux encodes eventlog timestamps as a JSON number without controlling the number of digits, which is what communicates the timestamp's actual precision to a consumer, and 2) flux converts struct timespec timestamps with 1ns resolution to C doubles, which loses precision depending on the amount of time elapsed since the epoch (a double would have about 10us precision today, I think?).
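
A quick way to check the actual spacing of doubles near a current-era timestamp, for what it's worth:

```c
#include <math.h>
#include <stdio.h>

int main (void)
{
    /* spacing between adjacent doubles at a timestamp from the
     * example eventlog above */
    double t = 1663264122.0;
    printf ("ULP at %.0f: %.3g seconds\n", t, nextafter (t, INFINITY) - t);
    /* prints about 2.4e-07, i.e. roughly a quarter microsecond */
    return 0;
}
```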

The point I was trying to make earlier is that these timestamps are not correlated with the events themselves with a high degree of accuracy. They are generated at a single point, often upon receiving a message from another part of the system that something has occurred. If we cared about high-precision timestamps on these events, we would have put more effort/design into capturing them "upstream" (message-wise). As noted in the spec, the thought at the time was that 1s accuracy was fine for answering the kinds of questions we anticipated.

What I'm unclear on is whether this issue is driven by a real need for high-precision timestamps on certain things that we didn't anticipate and don't provide, or whether it's about the false advertising in the JSON encoding, or something else?

morrone commented 2 years ago

Fundamentally, flux's job event log timestamps are currently variable precision, varying between 15 and 17 significant digits (which in practice is currently 10us to 100ns).

I would like to see a fixed precision.

I don't think 1 second resolution would necessarily be reasonable. Job submissions and events can easily occur faster than that, and we would like to be able to use the timestamp to determine the order of events.

garlick commented 2 years ago

Since timestamps aren't guaranteed unique, using them to recreate the original event order is not reliable and could put events in an order that does not make sense. However, the aforementioned job manager journal does give you the events for all jobs in the order posted so adding a sequence number to that stream may help if that's the goal.

chu11 commented 2 years ago

> Which is all to say that while the JSON spec allows arbitrarily sized numbers, the proliferation of jansson may have made that a moot point in practice. So maybe moving to two integer fields (seconds and microseconds/nanoseconds) or a string field for the timestamp in flux may indeed be the only practical way to go.

An idea just came to me, although I dislike it. I'll just mention it.

Would it be the end of the world to update the RFC to add, let's call it, a "high precision timestamp" to the eventlog? The field could be optional and maybe could be enabled via some config in flux, e.g.:

{"timestamp":1234.1234, "timespec":{"seconds":1234, "nanoseconds":1234}, "name":"submit", "context":{........}}