emscripten-core / emscripten

Emscripten: An LLVM-to-WebAssembly Compiler

mmap doesn't work as expected when mapping multiple pointers with MAP_SHARED to the same fd #21706

Open hly2019 opened 6 months ago

hly2019 commented 6 months ago

Please include the following in your bug report:

Version of emscripten/emsdk: emcc version: 3.1.54; node version: v21.7.0

Failing command line in full: emcc test_mmap.cpp -o main.js; node main.js

Full link command and output with -v appended:

Hi, I tried to use mmap with MAP_SHARED to map two pointers using the same file descriptor obtained from shm_open, expecting the two pointers to share the same memory. However, when I write through one of the two pointers and read through the other, the result is not what I expected. Here's a code example:

// test_mmap.cpp
#include <iostream>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main() {
    const char* name = "/shared_file";
    int len = 10 * sizeof(int);
    int offset = 0;
    int fd = shm_open(name, O_RDWR | O_CREAT, S_IRUSR | S_IWUSR);
    if (fd == -1) {
        return 1;  // shm_open failed; report failure instead of exiting with 0
    }
    ftruncate(fd, off_t(len));
    int* ptr_1 = (int*) mmap(nullptr, len, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, offset);
    int* ptr_2 = (int*) mmap(nullptr, len, (PROT_READ | PROT_WRITE), MAP_SHARED, fd, offset);

    ptr_2[0] = 10;

    std::cout << "ptr1[0] is: " << ptr_1[0] << ", ptr2[0] is: " << ptr_2[0] << std::endl;
    shm_unlink(name);
    return 0;
}

If I compile it natively with g++ (g++ test_mmap.cpp -o main; ./main), the output is:

ptr1[0] is: 10, ptr2[0] is: 10

which is the expected result.

However, if I compile and run it with emcc test_mmap.cpp -o main.js; node main.js, I get:

ptr1[0] is: 0, ptr2[0] is: 10

This means the sharing doesn't work as expected.

I found a related issue, https://github.com/emscripten-core/emscripten/issues/5928, but I believe that case and mine are different. Issue #5928 uses shm_open to obtain two file descriptors for one file and maps each of them with mmap. I tried the code from that issue and found it is already fixed in emcc version 3.1.54. In my case, however, I use mmap to map two pointers to a single file descriptor, so it is a different issue, and I believe it is worth discussing.

Thank you very much!

sbc100 commented 5 months ago

From my reading of the man page for mmap, I think you might be misunderstanding MAP_SHARED. That flag seems to be for creating mappings that are shared between processes. Two or more mappings of the same file should not require that flag.

Having said that, you can't expect to be able to map the same file twice in emscripten and see updates to one region appear in the other. That would require access to the MMU, which Wasm/emscripten does not have.

sbc100 commented 5 months ago

Honestly, I would advise against all use of shm_open and mmap in emscripten if you can avoid them, since the support we have for these APIs is basically fake.
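One way to follow that advice inside a single process is to allocate the storage once and hand out pointers to it, rather than asking for two mappings. The sketch below is only an illustration under that assumption: `SharedInts`, `view`, and `write_then_read_through_other_view` are hypothetical names that do not appear in the issue, and the approach does not help when the data really has to live in a file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the issue): owns one buffer and hands out
// aliasing pointers to it instead of creating a second mapping.
class SharedInts {
public:
    explicit SharedInts(std::size_t count) : storage_(count, 0) {}

    // Every call returns a pointer into the same storage, so a write made
    // through one pointer is visible through all the others.
    int* view() { return storage_.data(); }

    std::size_t size() const { return storage_.size(); }

private:
    std::vector<int> storage_;
};

// Writes `value` through one view and reads it back through another.
int write_then_read_through_other_view(int value) {
    SharedInts shared(10);
    int* ptr_1 = shared.view();
    int* ptr_2 = shared.view();
    assert(ptr_1 == ptr_2);  // both views alias the same memory
    ptr_2[0] = value;
    return ptr_1[0];
}
```

Calling `write_then_read_through_other_view(10)` returns 10, because no second copy of the data is ever created. It relies only on ordinary pointer aliasing, so it needs no MMU support.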

hly2019 commented 5 months ago

Thank you very much for answering! Your point makes sense: MAP_SHARED is more likely to be used in a multi-process scenario. But in our case, we do have situations that use MAP_SHARED within a single process. Also, #5928 raised a similar question (also seemingly within one process). I tried that issue's example, and it seems to have been fixed as of version 3.1.54. Since our case is similar, I wonder if it could be considered as well. Anyway, thank you very much! I really appreciate your help!

hly2019 commented 5 months ago

Thanks so much for your advice!

sbc100 commented 5 months ago

While it's true that multiple mappings of the same file will result in two separate copies, I think the MAP_SHARED flag is not needed here and doesn't really have any bearing on the issue. You would get the same issue without MAP_SHARED. Unless I'm misunderstanding, you don't need MAP_SHARED within a single process.

However, the fact that multiple mappings don't work is just a fundamental limitation of emscripten. Perhaps we should assert when trying to map the same file twice, but that sounds like it might be tricky.
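The "separate copies" behavior described above can be modeled with a small toy. This is not Emscripten's actual implementation. It is a minimal sketch, assuming that each mapping request simply returns its own copy of the file's contents, and `FakeFile`, `map_by_copy`, `Observed`, and `write_through_second_mapping` are hypothetical names introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a file's contents; not an Emscripten type.
struct FakeFile {
    std::vector<int> contents;
};

// Toy "mmap": with no MMU to alias pages, this model returns a fresh copy
// of the first `count` elements of the file for every mapping request.
std::vector<int> map_by_copy(const FakeFile& file, std::size_t count) {
    assert(count <= file.contents.size());
    return std::vector<int>(
        file.contents.begin(),
        file.contents.begin() + static_cast<std::ptrdiff_t>(count));
}

// Values read at index 0 through each of the two mappings.
struct Observed {
    int via_mapping_1;
    int via_mapping_2;
};

// Maps the same fake file twice, writes through the second mapping only,
// and reports what each mapping sees afterwards.
Observed write_through_second_mapping(int value) {
    FakeFile file{std::vector<int>(10, 0)};
    std::vector<int> mapping_1 = map_by_copy(file, 10);
    std::vector<int> mapping_2 = map_by_copy(file, 10);
    mapping_2[0] = value;  // only the second copy changes
    return {mapping_1[0], mapping_2[0]};
}
```

With `value` set to 10, the model reports 0 through the first mapping and 10 through the second, which matches the `ptr1[0] is: 0, ptr2[0] is: 10` output from the report.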

hly2019 commented 5 months ago

Thank you very much! I think I get your point.

Here I just want to further clarify the case of using MAP_SHARED in a single process. It is true that other mapping types such as MAP_PRIVATE do not share the file content (in the example above, using MAP_PRIVATE also gives ptr1[0] is: 0, ptr2[0] is: 10), but that behavior is the same when compiling and running with gcc, so it is as expected. What we expect is, in brief, that the two pointers share the content: if we write something through ptr_1, we expect to read the same thing through ptr_2 in the same process. This works natively in C/C++ with MAP_SHARED as shown above, although it might not be a common use case.
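The native difference between the two flags can be checked side by side with a small helper. This is a sketch for a native POSIX build (not Emscripten), and it differs from the report in two ways that are assumptions of the sketch: it uses a temporary file from `mkstemp` instead of `shm_open`, and the helper name `value_seen_by_first_mapping` and the `/tmp` path template are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <sys/mman.h>
#include <unistd.h>

// Maps the same descriptor twice with `flags`, writes 10 through the second
// mapping, and returns what the first mapping then reads at index 0.
// Returns -1 if any system call fails.
int value_seen_by_first_mapping(int flags) {
    assert(flags == MAP_SHARED || flags == MAP_PRIVATE);
    const size_t len = 10 * sizeof(int);

    char path[] = "/tmp/mmap_demo_XXXXXX";  // hypothetical temp-file template
    int fd = mkstemp(path);
    if (fd == -1) return -1;
    unlink(path);  // the file stays usable until fd is closed

    // Growing the file with ftruncate fills it with zeros.
    if (ftruncate(fd, static_cast<off_t>(len)) == -1) {
        close(fd);
        return -1;
    }

    void* m1 = mmap(nullptr, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    void* m2 = mmap(nullptr, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (m1 == MAP_FAILED || m2 == MAP_FAILED) {
        if (m1 != MAP_FAILED) munmap(m1, len);
        if (m2 != MAP_FAILED) munmap(m2, len);
        close(fd);
        return -1;
    }
    int* ptr_1 = static_cast<int*>(m1);
    int* ptr_2 = static_cast<int*>(m2);

    ptr_2[0] = 10;        // write through the second mapping
    int seen = ptr_1[0];  // read through the first mapping

    munmap(m1, len);
    munmap(m2, len);
    close(fd);
    return seen;
}
```

On a native Linux build, `value_seen_by_first_mapping(MAP_SHARED)` should return 10 and `value_seen_by_first_mapping(MAP_PRIVATE)` should return 0, consistent with the outputs quoted above.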

Thank you!

sbc100 commented 5 months ago

Sorry, you are correct; I didn't read the MAP_SHARED man page correctly. You do indeed need MAP_SHARED to get a double mapping of the same file, but there is no way to make that work in emscripten.