ami-iit / matio-cpp

A C++ wrapper of the matio library, with memory ownership handling, to read and write .mat files.
https://ami-iit.github.io/matio-cpp/
BSD 2-Clause "Simplified" License

Thread Safety #72

Open pascalreinhold opened 1 year ago

pascalreinhold commented 1 year ago

Hello there,

is it possible to read multiple large structs (1-3 GB) from a MAT-file concurrently? I found nothing about thread safety on this GitHub page.

If it is not supported out of the box then how would one go about it?

S-Dafarra commented 1 year ago

Hi @pascalreinhold, thanks for opening the issue!

matio-cpp is a C++ interface to the matio library, which takes care of dealing with the mat file. When opening a mat file, matio loads its entire content in memory. Hence, when reading and writing variables, it always accesses the same portion of memory. Thus, there are possible concurrency issues, and by extension, also matio-cpp is not thread-safe.

If your goal is to speed up the reading of the file, I would suggest splitting it into separate files, or using a format that supports reading in chunks, like HDF5 (see for example https://docs.hdfgroup.org/hdf5/v1_12/group___h5_d.html#gac1092a63b718ec949d6539590a914b60). Recent mat files are compatible with HDF5, but unfortunately mat files on their own do not support this option.
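The file-splitting suggestion can be sketched as follows, assuming the data has already been saved as several independent files. `loadFile` and `loadAll` are hypothetical names, and `loadFile` is only a placeholder for "open one file and read it" (it is not the matio-cpp API). The point is that each task creates and uses its own reader, so no reader object is shared between threads. Whether separate matio handles can safely be used from different threads is not established in this thread, so treat that as an assumption to verify.

```cpp
#include <cstddef>
#include <future>
#include <string>
#include <vector>

// Hypothetical stand-in for "open one file and read its content".
// In real code this would create its own reader object inside the call,
// so that nothing is shared with the other tasks.
std::size_t loadFile(const std::string& path)
{
    return path.size(); // placeholder result standing in for the loaded data
}

// Launch one asynchronous task per file and collect the results in order.
std::vector<std::size_t> loadAll(const std::vector<std::string>& paths)
{
    std::vector<std::future<std::size_t>> tasks;
    tasks.reserve(paths.size());
    for (const std::string& path : paths)
    {
        // std::async copies `path`, so each task owns its argument.
        tasks.push_back(std::async(std::launch::async, loadFile, path));
    }

    std::vector<std::size_t> results;
    results.reserve(tasks.size());
    for (auto& task : tasks)
    {
        results.push_back(task.get()); // blocks until that file is done
    }
    return results;
}
```

Results come back in the same order as `paths`, regardless of which task finishes first.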

pascalreinhold commented 1 year ago

Hey, thank you for the fast reply.

Does this also apply to me, because I'm just reading the file and not writing?

> Hence, when reading and writing variables [...] there are possible concurrency issues, and by extension, also matio-cpp is not thread-safe.

Not sure, but I think you are mistaken. In matio there are the Mat_VarReadInfo() and Mat_VarRead() functions to avoid loading a variable into memory until you need it.

> When opening a 'mat' file, 'matio' loads its entire content in memory

S-Dafarra commented 1 year ago

> Not sure, but I think you are mistaken. In matio there are the Mat_VarReadInfo() and Mat_VarRead() functions to avoid loading a variable into memory until you need it.
>
> > When opening a 'mat' file, 'matio' loads its entire content in memory

Both of those functions require opening the mat file first, i.e. loading it into memory. See:

Btw, Mat_VarRead is the exact function that matio-cpp uses to read a variable: https://github.com/ami-iit/matio-cpp/blob/a0daf0691d492b2ed50910ea984f97bc2f945b80/src/File.cpp#L322

Note that Mat_VarRead requires a non-const pointer to a mat_t object. This means that even a read can potentially modify this object. Hence, concurrent reads could turn into concurrent reads and writes on the same object. So to answer your question,

> Does this also apply to me, because I'm just reading the file and not writing?

unfortunately, yes.
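If a single open file has to be shared between threads, one conservative option is to serialize every read behind a mutex. The sketch below assumes a hypothetical `Reader` type standing in for the non-thread-safe file object (it is not the matio-cpp API). Its `read` is non-const and updates internal state, mirroring the non-const `mat_t` pointer discussed above. `LockedReader` and `readConcurrently` are likewise illustrative names.

```cpp
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a reader that is NOT thread-safe:
// read() is non-const because it updates internal bookkeeping.
class Reader
{
public:
    std::size_t read(const std::string& name)
    {
        ++m_reads;          // internal state mutated on every read
        return name.size(); // placeholder for the variable's content
    }
    std::size_t reads() const { return m_reads; }

private:
    std::size_t m_reads{0};
};

// Wrapper that lets only one thread at a time touch the reader.
class LockedReader
{
public:
    std::size_t read(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_reader.read(name);
    }
    std::size_t reads()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_reader.reads();
    }

private:
    std::mutex m_mutex;
    Reader m_reader;
};

// Read each name from its own thread through the shared, locked reader.
std::vector<std::size_t> readConcurrently(LockedReader& reader,
                                          const std::vector<std::string>& names)
{
    std::vector<std::size_t> results(names.size());
    std::vector<std::thread> workers;
    workers.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        // Each thread writes only to its own slot of `results`.
        workers.emplace_back([&reader, &names, &results, i] {
            results[i] = reader.read(names[i]);
        });
    }
    for (std::thread& worker : workers)
    {
        worker.join();
    }
    return results;
}
```

This removes the data race but also any parallelism in the reads themselves, so it only pays off when the threads have other work to do between reads.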