Closed: simonbyrne closed this 1 year ago
Would it make sense to provide this in two layers?
Basically we run the generator twice and produce two separate modules.
For the higher level API we may want to engage the locks for a series of low-level API calls rather than locking and unlocking for each API call.
We could even use a Preferences.jl mechanism to enable locks everywhere or not by default.
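A rough sketch of what that batching might look like. `API_LOCK`, `api_call`, and `read_many` below are hypothetical stand-ins, not names from the PR; the point is only that a `ReentrantLock` lets a high-level function take the lock once around a series of low-level wrappers that each also lock:

```julia
# Hypothetical sketch: `API_LOCK` and `api_call` stand in for the generated
# low-level wrappers; they are not HDF5.jl's actual names.
const API_LOCK = ReentrantLock()

# Low-level layer: each wrapper locks around its own call.
api_call(x) = lock(() -> x + 1, API_LOCK)

# High-level layer: take the lock once for a whole series of operations.
# Because the lock is reentrant, the inner per-call locks are cheap no-op
# re-entries rather than contended acquisitions.
function read_many(xs)
    lock(API_LOCK) do
        [api_call(x) for x in xs]
    end
end
```

With a two-layer design, the inner `api_call` locks could even be compiled away in the "already locked" path, but as noted above the reentrant acquisition may already be cheap enough that this is not worth the extra module.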
That's what I was originally thinking, but it seems like the overhead is small enough that it might not be worth it?
I've tried some more simple test cases (e.g. reading/writing single values in a loop) and I don't see any observable overhead from the lock.
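The kind of micro-test described above might look like the following sketch (names and the dummy workload are illustrative, not the PR's actual benchmark): a tight loop updating a single value with and without a per-iteration lock, timed with `@elapsed`:

```julia
# Illustrative overhead test, not the PR's actual benchmark.
const LK = ReentrantLock()

# Update a single value n times, taking the lock on every iteration,
# mimicking per-call locking in a tight read/write loop.
function locked_loop(n)
    acc = Ref(0)
    for _ in 1:n
        lock(LK) do
            acc[] += 1
        end
    end
    return acc[]
end

# Same loop without the lock, for comparison.
function plain_loop(n)
    acc = Ref(0)
    for _ in 1:n
        acc[] += 1
    end
    return acc[]
end

# Warm up, then compare e.g. @elapsed locked_loop(10^7) vs @elapsed plain_loop(10^7).
locked_loop(10); plain_loop(10)
```

Note that in the real package the lock wraps a `ccall` into libhdf5, whose own cost likely dwarfs an uncontended lock acquisition, which would explain why no overhead is observable.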
@simonbyrne What's your reading of https://github.com/JuliaLang/julia/issues/35689 ? It sounds to me like there may be an issue if finalizers acquire locks at the moment, with that issue still open.
I think that is addressed by https://github.com/JuliaLang/julia/pull/38487, which prevents finalizers from running on the current thread until the lock is released.
@mkitti I haven't fully investigated the Preferences.jl solution, but this seems a lot simpler and since there's no tradeoff with these calls, I don't see a strong reason to make it configurable. Are there any use cases for users to want to turn it off other than devs wanting to experiment?
The statement that "there is no tradeoff" still needs to be clearly established. Anecdotal testing does suggest that the effect is small, but we do not have a way of comprehensively establishing this for all applications. Thus, we need to provide a way for the user to do the experiment with their specific applications.
The result "there is no tradeoff" does seem surprising. A single HDF5.jl function usually makes several C API calls. Locking and unlocking several times within a high level function call does seem inefficient. Is there really no cost to this at all? If there were no cost, then why not enable thread safety within the C API by default?
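One way a Preferences.jl toggle could work is sketched below. In the real package the flag would come from `Preferences.@load_preference` at precompile time; here a plain constant stands in so the sketch is self-contained, and the preference name `"use_api_lock"` and helper `with_api_lock` are assumptions, not the PR's actual API:

```julia
# In the package this would be, e.g.:
#   using Preferences
#   const USE_API_LOCK = @load_preference("use_api_lock", true)
# A plain constant stands in here so the sketch runs standalone.
const USE_API_LOCK = true
const API_LOCK = ReentrantLock()

# Because USE_API_LOCK is a compile-time constant, the branch is eliminated
# during compilation, so the unlocked configuration pays no runtime cost.
function with_api_lock(f)
    if USE_API_LOCK
        lock(f, API_LOCK)
    else
        f()
    end
end
```

This would let users rerun their own workloads with the lock disabled, which addresses the "let the user do the experiment" point above without making the default unsafe.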
With lock:
Test Summary: | Pass Broken Total Time
HDF5.jl | 1085 5 1090 1m55.2s
plain | 151 1 152 23.1s
complex | 13 13 0.8s
undefined and null | 4 4 0.0s
abstract arrays | 2 2 0.6s
empty and 0-size arrays | 39 39 0.6s
generic read of native types | 17 17 0.6s
show | 51 51 2.8s
split1 | 13 13 0.3s
haskey | 18 18 0.5s
AbstractString | 51 51 4.3s
opaque data | 7 7 1.2s
FixedStrings and FixedArrays | 18 18 1.0s
Object Exists | 6 6 0.0s
HDF5 existance | 4 4 0.0s
bounds | 2 2 0.2s
h5a_iterate | 8 8 0.2s
h5l_iterate | 8 8 0.2s
compound | 10 10 3.6s
custom | 6 6 0.7s
reference | 6 6 0.2s
Dataspaces | 91 91 0.4s
Datatypes | 15 15 0.1s
BlockRange | 35 35 0.3s
hyperslab | 5 5 1.2s
read 0-length arrays: issue #859 | None 0.2s
attrs interface | 92 92 1.3s
readremote | 23 23 1.6s
extend | 25 25 1.7s
gc | 101 101 3.8s
external | 6 6 0.1s
swmr | 4 4 8.2s
mmap | 9 9 1.0s
properties | 42 2 44 2.0s
filter | 48 1 49 14.0s
Raw Chunk I/O | 52 52 4.6s
fileio | 6 6 1.8s
track order | 18 18 2.4s
h5f_get_dset_no_attrs_hint | 6 6 0.4s
non-allocating methods | 11 1 12 1.2s
Compression Filter Unit Tests | 6 6 0.2s
Object API | 38 38 0.8s
virtual dataset | 5 5 0.9s
API Lock Preference | 5 5 0.7s
Testing HDF5 tests passed
Without lock:
Test Summary: | Pass Broken Total Time
HDF5.jl | 1085 5 1090 1m40.0s
plain | 151 1 152 21.9s
complex | 13 13 0.8s
undefined and null | 4 4 0.0s
abstract arrays | 2 2 0.6s
empty and 0-size arrays | 39 39 0.6s
generic read of native types | 17 17 0.6s
show | 51 51 2.8s
split1 | 13 13 0.3s
haskey | 18 18 0.4s
AbstractString | 51 51 4.0s
opaque data | 7 7 1.2s
FixedStrings and FixedArrays | 18 18 0.9s
Object Exists | 6 6 0.0s
HDF5 existance | 4 4 0.0s
bounds | 2 2 0.2s
h5a_iterate | 8 8 0.2s
h5l_iterate | 8 8 0.2s
compound | 10 10 3.5s
custom | 6 6 0.7s
reference | 6 6 0.2s
Dataspaces | 91 91 0.4s
Datatypes | 15 15 0.0s
BlockRange | 35 35 0.3s
hyperslab | 5 5 1.2s
read 0-length arrays: issue #859 | None 0.2s
attrs interface | 92 92 1.3s
readremote | 23 23 1.6s
extend | 25 25 1.7s
gc | 101 101 3.5s
external | 6 6 0.0s
swmr | 4 4 7.7s
mmap | 9 9 1.0s
properties | 42 2 44 1.7s
filter | 48 1 49 13.7s
Raw Chunk I/O | 52 52 4.4s
fileio | 6 6 1.7s
track order | 18 18 2.4s
h5f_get_dset_no_attrs_hint | 6 6 0.3s
non-allocating methods | 11 1 12 1.2s
Compression Filter Unit Tests | 6 6 0.2s
Object API | 38 38 0.7s
virtual dataset | 5 5 0.9s
API Lock Preference | 5 5 0.6s
Testing HDF5 tests passed
That suggests the tests with the lock in place could take about 15% longer than without it. I think that's at the high end: over a number of runs the overhead looks like roughly 5%, with a range of 2% to 16%.
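For reference, the headline figure follows directly from the two total times in the summaries above (1m55.2s with the lock, 1m40.0s without):

```julia
with_lock    = 1 * 60 + 55.2   # 1m55.2s total, from the "With lock" summary
without_lock = 1 * 60 + 40.0   # 1m40.0s total, from the "Without lock" summary

# Relative overhead of the locked run, as a percentage.
overhead_pct = (with_lock - without_lock) / without_lock * 100
```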
Should we bump to 0.17 if we merge this? It does not look directly breaking to me, but it is a significant enough change that it could be.
I think we can probably keep things at 0.16, this PR should be a strict improvement (sans minor speed regression)
As a potential fix for #835 and #990, this adds a simple lock around all API calls. I was initially worried this would cause a performance bottleneck, but some simple examples didn't show any real difference.
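The mechanism can be sketched as follows. The wrapper name `h5a_close` and the stubbed-out `ccall` are stand-ins for the generated low-level bindings, not the PR's exact code; the idea is simply that every generated wrapper acquires one global `ReentrantLock` around its `ccall` into libhdf5:

```julia
# Sketch of the approach: one global lock, taken by every generated wrapper.
const API_LOCK = ReentrantLock()

# Hypothetical generated wrapper. In HDF5.jl the body would be a ccall into
# libhdf5, e.g.:
#   ccall((:H5Aclose, libhdf5), herr_t, (hid_t,), attr_id)
# A stub return value is used here so the sketch runs standalone.
function h5a_close(attr_id)
    lock(API_LOCK) do
        0  # stands in for the herr_t return of the real ccall
    end
end
```

Since the HDF5 C library is not thread-safe by default, serializing every call through one lock is the conservative way to make the Julia bindings safe to use from multiple tasks.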