ancapdev / LightBSON.jl

High performance encoding and decoding of BSON data in Julia
MIT License

Remove dependency on specific `Libc` features #17

Closed · Seelengrab closed 2 years ago

Seelengrab commented 2 years ago

`trunc(UInt32, time())` does exactly the same thing, since the extra nanosecond precision doesn't matter if we're only interested in seconds anyway.

I'm not familiar with BSON or its spec, so I didn't want to change this to UInt64. It may be advisable though, to prevent Y2106-style problems.

Ref https://github.com/JuliaLang/julia/pull/45023
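
Concretely, the portable fallback proposed here is tiny (a minimal sketch; `seconds_from_epoch_` mirrors the helper name used later in this thread):

```julia
# Proposed portable fallback: Base.time() returns Float64 seconds since the
# Unix epoch, so truncating is enough when only whole seconds are needed,
# and no Libc internals are touched.
seconds_from_epoch_() = trunc(UInt32, time())
```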

ancapdev commented 2 years ago

I see, so this is being deprecated.

I initially did the Linux implementation this way because it's faster than going through the Julia stdlib functions, same as I did in UnixTimes (https://github.com/ancapdev/UnixTimes.jl/blob/master/src/UnixTimes.jl#L91) -- I just didn't want to pull that dependency in here.

Think I'd prefer to keep the faster Linux path, and fall back on time() on other platforms.

Seelengrab commented 2 years ago

They're literally using the same function calls under the hood though (time() calls jl_clock_now, which just creates a TimeVal and stitches the values into a Float64), and the FFI alone will create higher latency than the resolution provided by timeval (or timespec, for that matter). The deprecation is just due to Unix's gettimeofday being deprecated itself and replaced with the higher-accuracy clock_gettime.
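
Roughly, both paths bottom out in the same kernel clock query, something like this (a sketch assuming glibc on 64-bit Linux, where CLOCK_REALTIME is 0 and struct timespec is two 64-bit fields; the helper name is made up for illustration):

```julia
# Direct clock_gettime(CLOCK_REALTIME, ...) call, roughly what both the Linux
# fast path and Base.time() (via gettimeofday) reduce to.
function clock_realtime_seconds()
    ts = Ref{NTuple{2,Int64}}((0, 0))  # struct timespec { time_t tv_sec; long tv_nsec; }
    rc = ccall(:clock_gettime, Cint, (Cint, Ptr{NTuple{2,Int64}}), 0, ts)
    rc == 0 || error("clock_gettime failed")  # the return-value / error check
    return ts[][1]                            # tv_sec: whole seconds since the epoch
end
```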

On top of this, the way it's written right now also falls back to the "regular" Julia version on macOS, which is Unix and should handle clock_gettime just as well as Linux.

How performance critical is this function? I thought it wasn't critical, since only seconds are returned anyway. If you see that much latency on this call, something else is going very wrong.

Seelengrab commented 2 years ago

I was actually about to open a PR on UnixTimes as well, since that seems to be the only registered package that's accessing TimeVal.usec directly :sweat_smile:

ancapdev commented 2 years ago

> How performance critical is this function? I thought it wasn't critical, since only seconds are returned anyway. If you see that much latency on this call, something else is going very wrong.

It's not that the latency is critical; it's the throughput, in case you want to generate a large number of IDs.

Regarding UnixTimes and timestamping in general, I was always surprised Julia used the older, lower-resolution API rather than clock_gettime, so I'm glad to see that changing. I'd still keep the native ccall path directly to clock_gettime in that package for performance reasons.

Seelengrab commented 2 years ago

Ah, you're talking about UnixTimes.jl? Yeah, I can see that being desirable there. Shouldn't make a difference for this package though, since the resolution returned here is in seconds anyway, right? How is seconds_from_epoch_ used in this package? If it's for ID generation, that seems kind of bad - naively that would lead to a lot of collisions, wouldn't it?

ancapdev commented 2 years ago

> Ah, you're talking about UnixTimes.jl? Yeah, I can see that being desirable there. Shouldn't make a difference for this package though, since the resolution returned here is in seconds anyway, right? How is seconds_from_epoch_ used in this package? If it's for ID generation, that seems kind of bad - naively that would lead to a lot of collisions, wouldn't it?

It's based on the BSON spec. There's a part that is the current second since epoch, a random part, and an incrementing part. See here: https://github.com/ancapdev/LightBSON.jl/blob/master/src/object_id.jl#L89. So it's not that the time part changes with every ID you generate, but if you want to know if it's changed you need to query the current time, so it helps if this call is fast.
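
To illustrate the layout (a rough sketch of the spec's three parts, not the actual code behind that link; names like make_object_id and RANDOM_PART are made up here):

```julia
# Rough sketch of the 12-byte ObjectId layout: 4-byte big-endian seconds since
# the epoch, 5 random bytes fixed per process, and a 3-byte incrementing counter.
const RANDOM_PART = rand(UInt8, 5)
const COUNTER = Ref(rand(UInt32) & 0x00ff_ffff)

function make_object_id(seconds::UInt32)
    counter = COUNTER[] = (COUNTER[] + one(UInt32)) & 0x00ff_ffff
    bytes = Vector{UInt8}(undef, 12)
    bytes[1] = UInt8(seconds >> 24)             # timestamp, big-endian
    bytes[2] = UInt8((seconds >> 16) & 0xff)
    bytes[3] = UInt8((seconds >> 8) & 0xff)
    bytes[4] = UInt8(seconds & 0xff)
    bytes[5:9] .= RANDOM_PART                   # per-process random part
    bytes[10] = UInt8(counter >> 16)            # counter, big-endian
    bytes[11] = UInt8((counter >> 8) & 0xff)
    bytes[12] = UInt8(counter & 0xff)
    return bytes
end
```

Only the timestamp bytes depend on the clock; the other two parts never need it, which is why the clock query only has to be cheap enough to notice when the second rolls over.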

I don't actually have a performance-sensitive use case for this myself, but since the code is already in there, I think it's sufficient to change the fallback to your suggestion. Going via a floating-point representation hits my OCD in a bad way though :joy:

Seelengrab commented 2 years ago

Well, as far as I can see, the version you have right now is fast because you ignore the return value of that ccall :) Checking that adds ~5ns on my machine vs not checking it.

some benchmarks:

```
julia> @benchmark seconds_from_epoch_()
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  18.004 ns … 38.492 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.316 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   18.936 ns ±  1.624 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 18 ns          Histogram: log(frequency) by time        23.2 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Libc.TimeSpec().sec # this checks the return value of `clock_gettime`, also uses `CLOCK_REALTIME`
BenchmarkTools.Trial: 10000 samples with 995 evaluations.
 Range (min … max):  23.590 ns … 52.797 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     23.620 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   24.548 ns ±  2.760 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 23.6 ns        Histogram: log(frequency) by time        39.1 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark trunc(UInt32, time()) # this has the check, packs it into a float and then retrieves the seconds
BenchmarkTools.Trial: 10000 samples with 991 evaluations.
 Range (min … max):  31.564 ns … 96.475 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     32.440 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   33.456 ns ±  3.669 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%
 31.6 ns        Histogram: log(frequency) by time        52.6 ns <
 Memory estimate: 0 bytes, allocs estimate: 0.
```

If you can manage to do useful work in ~5-13ns (isn't that already less than the overhead of a missed branch or the generation of the random component?), I'd be very impressed :)

Either way, the islinux branch isn't that important to me since it doesn't touch Libc anyway; only the else branch does, and that only by coincidence. I just thought that since both are using the same thing in the end, unifying them may be nice. Maybe it'd be better to do a PR in the future, once the TimeSpec PR is merged, to then change both branches to return Libc.TimeSpec().sec % UInt32? I'm kind of reluctant to suggest that, since it's kind of unclear whether that is supported API or not, but that's kind of hard to know anyway :sweat:

ancapdev commented 2 years ago

> If you can manage to do useful work in ~5-13ns (isn't that already less than the overhead of a missed branch or the generation of the random component?), I'd be very impressed :)

In some contexts it does matter. My use case is high frequency trading systems. Not so much for this package, but for UnixTimes. A correctly predicted branch (as error checking would be for this call) is substantially less overhead than 5ns, btw; it's usually speculated early in the pipeline and, depending on what else is executing around it, would have an inverse throughput below one cycle.

This call to clock_gettime is also using CLOCK_REALTIME, btw; its value is 0 in glibc.

> Maybe it'd be better to do a PR in the future, once the TimeSpec PR is merged […]

Could you make the non-Linux branch conditional on the Julia version and switch between TimeVal and TimeSpec?

Seelengrab commented 2 years ago

> In some contexts it does matter. My use case is high frequency trading systems. Not so much for this package, but for UnixTimes.

Do keep in mind that this will (and does right now) include error checking either way, so at the moment on non-Linux systems you are eating that error-checking penalty.

> Maybe it'd be better to do a PR in the future, once the TimeSpec PR is merged […]

> Could you make the non-Linux branch conditional on the Julia version and switch between TimeVal and TimeSpec?

Yes, that's what I had in mind. If that's fine with you, I'll close this PR and do that both here & in UnixTimes when the PR to main Julia hits.
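
For reference, the version-conditional fallback could look something like this (a sketch only; which Julia release ends up shipping Libc.TimeSpec, and its field names, are assumptions pending the upstream PR):

```julia
using Base: Libc

# Hypothetical non-Linux fallback, chosen at compile time by Julia version.
# v"1.9" is a placeholder for whichever release actually ships Libc.TimeSpec.
@static if VERSION >= v"1.9"
    seconds_from_epoch_() = Libc.TimeSpec().sec % UInt32  # assumes a `sec` field, per the pending PR
else
    seconds_from_epoch_() = Libc.TimeVal().sec % UInt32   # gettimeofday-based path that exists today
end
```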

ancapdev commented 2 years ago

> Yes, that's what I had in mind. If that's fine with you, I'll close this PR and do that both here & in UnixTimes when the PR to main Julia hits.

Sounds good to me, and thanks for proactively looking at dependents!