JuliaIO / HDF5.jl

Save and load data in the HDF5 file format from Julia
https://juliaio.github.io/HDF5.jl
MIT License
386 stars 140 forks source link

add lock around all API calls #1021

Closed simonbyrne closed 1 year ago

simonbyrne commented 1 year ago

As a potential fix to #835 and #990: this adds a simple lock around all API calls: I was initially worried this would cause a performance bottleneck, but some simple examples didn't show any real difference.

mkitti commented 1 year ago

Would it make sense to provide this in two layers

  1. A raw API module without any locks
  2. An Thread-safe API module with locks like you've done here.

Basically we run the generator twice and produce two separate modules.

For the higher level API we may want to engage the locks for a series of low-level API calls rather than locking and unlocking for each API call.

mkitti commented 1 year ago

We could even use a Preferences.jl mechanism to enable locks everywhere or not by default.

simonbyrne commented 1 year ago

That's what I was originally thinking, but it seems like the overhead is small enough that it might not be worth it?

simonbyrne commented 1 year ago

I've tried some more simple test cases (e.g. reading/writing single values in a loop) and I don't see any observable overhead from the lock.

mkitti commented 1 year ago

@simonbyrne What's your reading of https://github.com/JuliaLang/julia/issues/35689 ? It sounds to me like there may be an issue if finalizers acquire locks at the moment, with that issue still open.

simonbyrne commented 1 year ago

@simonbyrne What's your reading of JuliaLang/julia#35689 ? It sounds to me like there may be an issue if finalizers acquire locks at the moment, with that issue still open.

I think that is addressed by https://github.com/JuliaLang/julia/pull/38487, which prevents finalizers from running on the current thread until the lock is released.

mkitti commented 1 year ago

That's strange that a pull request with a lower number resolves an issue with a higher number.

musm commented 1 year ago

@mkitti I haven't fully investigated the Preferences.jl solution, but this seems a lot simpler and since there's no tradeoff with these calls, I don't see a strong reason to make it configurable. Are there any use cases for users to want to turn it off other than devs wanting to experiment?

mkitti commented 1 year ago

The statement that "there is no tradeoff" still needs to be clearly established. Anecdotal testing does suggest that the effect is small, but we do not have a way of comprehensively establishing this for all applications. Thus, we need to provide a way for the user to do the experiment with their specific applications.

The result "there is no tradeoff" does seem surprising. A single HDF5.jl function usually makes several C API calls. Locking and unlocking several times within a high level function call does seem inefficient. Is there really no cost to this at all? If there were no cost, then why not enable thread safety within the C API by default?

mkitti commented 1 year ago

With lock:

Test Summary:                      | Pass  Broken  Total     Time
HDF5.jl                            | 1085       5   1090  1m55.2s
  plain                            |  151       1    152    23.1s
  complex                          |   13             13     0.8s
  undefined and null               |    4              4     0.0s
  abstract arrays                  |    2              2     0.6s
  empty and 0-size arrays          |   39             39     0.6s
  generic read of native types     |   17             17     0.6s
  show                             |   51             51     2.8s
  split1                           |   13             13     0.3s
  haskey                           |   18             18     0.5s
  AbstractString                   |   51             51     4.3s
  opaque data                      |    7              7     1.2s
  FixedStrings and FixedArrays     |   18             18     1.0s
  Object Exists                    |    6              6     0.0s
  HDF5 existance                   |    4              4     0.0s
  bounds                           |    2              2     0.2s
  h5a_iterate                      |    8              8     0.2s
  h5l_iterate                      |    8              8     0.2s
  compound                         |   10             10     3.6s
  custom                           |    6              6     0.7s
  reference                        |    6              6     0.2s
  Dataspaces                       |   91             91     0.4s
  Datatypes                        |   15             15     0.1s
  BlockRange                       |   35             35     0.3s
  hyperslab                        |    5              5     1.2s
  read 0-length arrays: issue #859 |                None     0.2s
  attrs interface                  |   92             92     1.3s
  readremote                       |   23             23     1.6s
  extend                           |   25             25     1.7s
  gc                               |  101            101     3.8s
  external                         |    6              6     0.1s
  swmr                             |    4              4     8.2s
  mmap                             |    9              9     1.0s
  properties                       |   42       2     44     2.0s
  filter                           |   48       1     49    14.0s
  Raw Chunk I/O                    |   52             52     4.6s
  fileio                           |    6              6     1.8s
  track order                      |   18             18     2.4s
  h5f_get_dset_no_attrs_hint       |    6              6     0.4s
  non-allocating methods           |   11       1     12     1.2s
  Compression Filter Unit Tests    |    6              6     0.2s
  Object API                       |   38             38     0.8s
  virtual dataset                  |    5              5     0.9s
  API Lock Preference              |    5              5     0.7s
     Testing HDF5 tests passed 

Without lock:

Test Summary:                      | Pass  Broken  Total     Time
HDF5.jl                            | 1085       5   1090  1m40.0s
  plain                            |  151       1    152    21.9s
  complex                          |   13             13     0.8s
  undefined and null               |    4              4     0.0s
  abstract arrays                  |    2              2     0.6s
  empty and 0-size arrays          |   39             39     0.6s
  generic read of native types     |   17             17     0.6s
  show                             |   51             51     2.8s
  split1                           |   13             13     0.3s
  haskey                           |   18             18     0.4s
  AbstractString                   |   51             51     4.0s
  opaque data                      |    7              7     1.2s
  FixedStrings and FixedArrays     |   18             18     0.9s
  Object Exists                    |    6              6     0.0s
  HDF5 existance                   |    4              4     0.0s
  bounds                           |    2              2     0.2s
  h5a_iterate                      |    8              8     0.2s
  h5l_iterate                      |    8              8     0.2s
  compound                         |   10             10     3.5s
  custom                           |    6              6     0.7s
  reference                        |    6              6     0.2s
  Dataspaces                       |   91             91     0.4s
  Datatypes                        |   15             15     0.0s
  BlockRange                       |   35             35     0.3s
  hyperslab                        |    5              5     1.2s
  read 0-length arrays: issue #859 |                None     0.2s
  attrs interface                  |   92             92     1.3s
  readremote                       |   23             23     1.6s
  extend                           |   25             25     1.7s
  gc                               |  101            101     3.5s
  external                         |    6              6     0.0s
  swmr                             |    4              4     7.7s
  mmap                             |    9              9     1.0s
  properties                       |   42       2     44     1.7s
  filter                           |   48       1     49    13.7s
  Raw Chunk I/O                    |   52             52     4.4s
  fileio                           |    6              6     1.7s
  track order                      |   18             18     2.4s
  h5f_get_dset_no_attrs_hint       |    6              6     0.3s
  non-allocating methods           |   11       1     12     1.2s
  Compression Filter Unit Tests    |    6              6     0.2s
  Object API                       |   38             38     0.7s
  virtual dataset                  |    5              5     0.9s
  API Lock Preference              |    5              5     0.6s
     Testing HDF5 tests passed 

That suggests the tests with the lock in place could take 15% longer than without the lock in place. I think that's at the high end. Over a number of tests it looks like 5% with a range of 2% to 16%.

mkitti commented 1 year ago

Should we bump to 0.17 if we merge this? It does not look directly breaking to me, but it is a significant change that it could be.

musm commented 1 year ago

I think we can probably keep things at 0.16, this PR should be a strict improvement (sans minor speed regression)