Closed — dennisklein closed this issue 1 year ago
Actually, I suspect the issue comes from
https://github.com/boostorg/interprocess/commit/140b50efb3281fa3898f3a4cf939cfbda174718f
which got introduced in 1.76. Maybe you could adapt the title.
Did you upgrade Boost from 1.75?
yes
Ciao, Giulio
My attempt at making the session id unique did not produce the expected results. It still fails in:
https://ali-ci.cern.ch/alice-build-logs/alisw/alidist/5242/f41bcbf4c32a1081142763d2fc5ddb3d805a6453/build_O2_alidist-dataflow-cs8/pretty.html
I wonder if something changed with respect to containers.
Some additional insight from https://github.com/alisw/alidist/pull/5242:
Reproducer for the issue:
#include <fairmq/Tools.h>
#include <fairmq/ProgOptions.h>
#include <fairmq/TransportFactory.h>

int main(int argc, char** argv) {
    size_t session{(size_t)getpid() * 1000 + 0};
    fair::mq::ProgOptions config;
    config.SetProperty<std::string>("session", std::to_string(session));
    auto factoryZMQ = fair::mq::TransportFactory::CreateTransportFactory("zeromq");
    auto factorySHM = fair::mq::TransportFactory::CreateTransportFactory("shmem");
}
Compile e.g. with g++ -o reproducer -lfairmq -lboost_program_options -O3 -std=c++20 reproducer.cpp
Another code snippet useful in debugging:
#include <boost/interprocess/managed_shared_memory.hpp>
#include <string>
int main() {
std::string name = "blubb";
// boost::interprocess::shared_memory_object::remove(name.c_str());
boost::interprocess::managed_shared_memory mngSegment(boost::interprocess::open_or_create, name.c_str(), 4096);
// boost::interprocess::shared_memory_object::remove(name.c_str());
return 0;
}
Compile e.g. with g++ -o bipc -Og -g -std=c++20 bipc.cpp.
Another, more direct reproducer (shm_open args extracted from strace bipc):
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
int fd;
fd = shm_open(argv[1], O_RDWR|O_CREAT|O_EXCL|O_NOFOLLOW|O_CLOEXEC, 0644);
if (fd == -1) {
perror("shm_open");
return EXIT_FAILURE;
}
int rc = posix_fallocate(fd, 0, 4194304 /* 4 MiB */);
if (rc && rc != EOPNOTSUPP) {
perror("posix_fallocate");
return EXIT_FAILURE;
}
if (ftruncate(fd, 4194304 /* 4 MiB */) == -1) {
perror("ftruncate");
return EXIT_FAILURE;
}
return EXIT_SUCCESS;
}
Compile with gcc -o shm_open -Og -g shm_open.c and run via ./shm_open blubb or ./shm_open /blubb.
With the reproducers I observe:
- with a /dev/shm tmpfs mounted, they create the shm file as unprivileged user
- without a /dev/shm tmpfs mounted (after sudo umount /dev/shm) I see "Permission denied"; as root the programs succeed (/dev is of type devtmpfs, which is writable for root and appears to accept shm files)
- without a /dev/shm directory (after sudo rmdir /dev/shm) I see "No such file or directory"
- passing blubb or /blubb to shm_open(3) seems to behave the same on Linux (Fedora 38, 6.5.11-200.fc38.x86_64, glibc 2.37)

Note: remount via e.g. sudo mkdir /dev/shm and sudo mount -t tmpfs -o size=20g tmpfs /dev/shm
In the CI I see "File exists" (boost::interprocess::interprocess_exception::what()) when running builds/test suites in parallel containerized environments, but here the failure is that the file does not exist!?

"File exists" -> sounds like the parallel containers share their /dev/shm tmpfs mountpoint!?

Could you check whether the containers have a /dev/shm directory and/or a /dev/shm tmpfs mounted (and with which permissions with regard to the user that runs the failing programs)?

I now fully understand the issue:
Boost.Interprocess changed the call that sizes the shared memory file from ftruncate to posix_fallocate (cf. the if (ftruncate(fd, 4194304 /* 4 MiB */) == -1) { line in the reproducer above, which corresponds to the old behavior). The latter has the implication that enough memory must actually be available (which it is not, for our tests). This is a bit of a bummer, because, while we can adjust the tests, AFAICT it basically means we cannot over-provision shared memory buffers for their peak usage. I am also worried about the accounting implications (e.g. on the Grid).
For the record, this has been posix_fallocate since Boost 1.76.0.
AFAICS, Boost.Interprocess does not provide any API to configure this, so I guess there is nothing to be done for now on the FairMQ side. Closing therefore; let's revisit once you have a decision on how to proceed.
Agreed, thank you for your time.