Closed ronytesler closed 2 years ago
Hi @ronytesler,
I think (as also mentioned in the README file) shared memory and memory-mapped file performance should be the same. For small files there is no need for the operating system to write changes back to disk, and unless certain flags are set or you explicitly flush the content of the mapped area, the operating system won't write changes back, AFAIK. Hence, there should be no overhead for using a memory-mapped file. (Granted, mapping the file might be more costly, but we are not looking at initialization overhead in this benchmark.)
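To illustrate the point, here is a minimal Python sketch (using a temporary file as a stand-in for the benchmark's backing file): writes to the mapped region are plain memory stores into the page cache, and nothing forces them to disk unless you flush explicitly.

```python
import mmap
import os
import tempfile

# Create a small backing file (temporary path, for illustration only).
fd, path = tempfile.mkstemp()
os.write(fd, b"\x00" * 4096)

# Map the file; writes modify in-memory pages of the OS page cache.
with mmap.mmap(fd, 4096) as m:
    m[0:5] = b"hello"  # plain memory write, no syscall per access
    # The dirty pages are not forced to disk here; the kernel may
    # write them back lazily. An explicit m.flush() (msync) would
    # force write-back -- the cost that small, unflushed files avoid.
    assert m[0:5] == b"hello"

os.close(fd)
os.unlink(path)
print("mapped write visible in memory without explicit flush")
```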
Why are memory-mapped files faster in our experiment? If I recall correctly, there is no warm-up phase in the shared memory implementation, hence memory-mapped files win in this benchmark. This should definitely be fixed, but for now we only have the note in the README.
Hope this explanation helps :)
Best, Alex
"there is no warm up phase in the shared memory implementation, hence memory mapped files win in this benchmark" - You mean that because there is no warm up phase, the warm up is done after you start measuring the time and hence it's slower?
Yes, that would be my assumption. Because, as you said, it should be at least as fast. But someone would need to try it out to be certain.
@alexandervanrenen And maybe you know about these:
- If I don't want the data to be backed by a real file, is there a reason to use a mapped file over shared memory?
I think there are some considerations to take into account. One thing that comes to mind is access control: with a memory-mapped file you can use Linux file permissions (users/groups) to restrict access.
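As a quick sketch of that idea (POSIX permission semantics assumed; the backing file here is just a throwaway temp file): you restrict who can map the region by restricting the file itself.

```python
import os
import stat
import tempfile

# Hypothetical backing file for a memory-mapped IPC area.
fd, path = tempfile.mkstemp()
os.close(fd)

# Restrict access: owner read/write only. Other users on the
# machine cannot open -- and therefore cannot map -- the file.
os.chmod(path, 0o600)

mode = stat.S_IMODE(os.stat(path).st_mode)
assert mode == 0o600

os.unlink(path)
```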
- Is there a size of data which is small enough so I can write it in one process while other processes read it, and I won't have a need to use a lock to prevent them from reading dirty data? Or should I always use a Mutex, Semaphore etc. when I have concurrent write and reads?
Technically you can read and write 8 aligned bytes safely on Intel x86. But it is still highly recommended to use atomic operations (e.g., CAS) when accessing this shared data, to prevent the compiler from assuming that the value is constant (the compiled code will still use regular reads/writes).
Thanks!
nw :)
FYI: We built a small library that can send/receive data over shared memory: L5. The algorithms are simple (to be efficient), but there is a detailed evaluation and a talk from ICDE 2020.
@alexandervanrenen Thanks! I have a Java process and a Python process on Windows that need to pass data between them, about 50 bytes per message. I guess I'll use shared memory / mapped files with a Mutex or Semaphore. Thanks! : )
from https://www.boost.org/doc/libs/1_58_0/doc/html/interprocess/sharedmemorybetweenprocesses.html: "We can say that memory-mapped files offer the same interprocess communication services as shared memory with the addition of filesystem persistence. However, as the operating system has to synchronize the file contents with the memory contents, memory-mapped files are not as fast as shared memory."
So how is the performance of mapped files still better than shared memory?