JuliaIO / JLD2.jl

HDF5-compatible file format in pure Julia

Support for single-file multithreading #403

Closed: mbauman closed this issue 8 months ago

mbauman commented 2 years ago

#230 implemented thread safety at the file level. However, if I try to read a single file from multiple threads, I hit segfaults. I understand that multithreaded writing to a single file is fraught, but would it be a major challenge to support multithreaded reads?

JonasIsensee commented 2 years ago

Hi @mbauman,

this really should be possible. First, the reason it doesn't work: JLD2 keeps a global record of all open files. This ensures, for example, that you don't accidentally open the same file in read mode and write mode at the same time.
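The kind of global bookkeeping described above can be sketched as a lock-protected dictionary from path to the modes of its open handles. This is a hypothetical illustration, not JLD2's code. The names `OPEN_FILES`, `register_open!` and `unregister_open!` are invented here. The sketch also assumes a relaxed rule in which several read-only handles may coexist, while a writer must have the file to itself.

```julia
# Hypothetical sketch, not JLD2's actual implementation.
# Global registry: path => modes (:read / :write) of the currently open handles.
const OPEN_FILES = Dict{String,Vector{Symbol}}()
const OPEN_FILES_LOCK = ReentrantLock()   # guards OPEN_FILES across threads

function register_open!(path::AbstractString, mode::Symbol)
    lock(OPEN_FILES_LOCK) do
        modes = get!(OPEN_FILES, String(path), Symbol[])
        if mode === :write && !isempty(modes)
            # a writer must be the only handle on the file
            error("$path is already open; cannot open it for writing")
        elseif mode === :read && :write in modes
            # readers may share a file with each other, but not with a writer
            error("$path is open for writing; cannot open it for reading")
        end
        push!(modes, mode)
        return nothing
    end
end

function unregister_open!(path::AbstractString, mode::Symbol)
    lock(OPEN_FILES_LOCK) do
        modes = OPEN_FILES[String(path)]
        # remove one handle of this mode (throws if no such handle is registered)
        deleteat!(modes, findfirst(==(mode), modes))
        isempty(modes) && delete!(OPEN_FILES, String(path))
        return nothing
    end
end
```

Under this rule, two `register_open!("data.jld2", :read)` calls succeed, and a following `:write` registration throws. Note that this only covers the bookkeeping. It does nothing about races inside a shared handle or cache.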

With the current logic this fails because all reads go through the same file handle. In addition, the JLDFile object does a lot of internal caching that is prone to race conditions.
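The shared-handle problem is generic. A file position is mutable state, so if two threads each `seek` and then `read` on one handle, the calls can interleave and return the wrong bytes. The sketch below uses plain Julia I/O rather than JLD2, with invented names (`read_shared`, `read_private`). It shows two common remedies: making each seek-plus-read atomic with a lock, or giving every read its own handle. Neither addresses the JLDFile caches mentioned above.

```julia
using Base.Threads: @spawn

# Hypothetical illustration with plain Julia I/O, not JLD2 code.
path = tempname()
write(path, collect(UInt8, 0:255))   # the byte at offset k has value k

# Remedy 1: one shared handle; the lock makes seek + read a single atomic step.
const SHARED_IO = open(path, "r")
const IO_LOCK = ReentrantLock()
function read_shared(offset, n)
    lock(IO_LOCK) do
        seek(SHARED_IO, offset)
        read(SHARED_IO, n)
    end
end

# Remedy 2: each call opens its own handle, so no file position is shared.
function read_private(offset, n)
    open(path, "r") do io
        seek(io, offset)
        read(io, n)
    end
end

offsets = 0:16:240
shared  = fetch.([@spawn read_shared(o, 16) for o in offsets])
private = fetch.([@spawn read_private(o, 16) for o in offsets])
close(SHARED_IO)
```

The lock serializes I/O on the shared handle. Private handles let reads proceed in parallel at the cost of extra open file descriptors.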

ejmeitz commented 1 year ago

How hard would it be for a non-JLD expert to implement this? I have an application where everything threads perfectly, but an HDF5 read is the one thing preventing me from multithreading the program. I can restructure my program to work around this, but it would make my code much hackier.

JonasIsensee commented 1 year ago

This wouldn't be very hard and does not require any real JLD2 knowledge.

This is the relevant function definition: https://github.com/JuliaIO/JLD2.jl/blob/e746c89202e6e831fc89363640edc5d4af66cb1c/src/JLD2.jl#L254 Currently, there is a "global" list that keeps track of all open files and ensures that only a single JLDFile can exist per file.

This condition could be relaxed:

JonasIsensee commented 8 months ago

Implemented by #477.