Open KrisThielemans opened 4 years ago
I have seen this error message in Travis logs many times, but the reported error never ever happened locally, so impossible to investigate, I am afraid.
I get this one, locally, from time to time. However, I cannot reproduce it.
hmmm. this is going to be tough then. Any ideas for writing some debugging checks and doing a special test-run with 1000 tests and see when it fails?
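One way such a special test run could look is sketched below. It repeats a test many times and records the iterations at which it throws, so that an intermittent failure at least leaves a trace. Every name here (`Failure`, `stress_run`, `demo_failure_count`) is hypothetical and not part of SIRF, and the "flaky" test is a deterministic stand-in that throws on every 250th call.

```cpp
#include <cstddef>
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical record of one failed iteration.
struct Failure {
    unsigned iteration;
    std::string message;
};

// Runs `test` n_runs times and collects every iteration that threw.
std::vector<Failure> stress_run(const std::function<void()>& test, unsigned n_runs) {
    std::vector<Failure> failures;
    for (unsigned i = 0; i < n_runs; ++i) {
        try {
            test();
        }
        catch (const std::exception& e) {
            failures.push_back({i, e.what()});
        }
        catch (...) {
            failures.push_back({i, "unknown exception"});
        }
    }
    return failures;
}

// Deterministic stand-in for a flaky test: throws on every 250th call.
std::size_t demo_failure_count() {
    unsigned calls = 0;
    auto flaky = [&calls] {
        if (++calls % 250 == 0)
            throw std::runtime_error("bad file descriptor");
    };
    return stress_run(flaky, 1000).size();
}
```

In a real run the lambda would wrap the actual MR test, and the recorded iteration numbers and messages could be printed next to whatever debugging checks are added.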
@evgueni-ovtchinnikov I haven't looked through the source code, but if this pertains to file writing, could you put it in a for loop, similar to what you already do when trying to connect to the Gadgetron server?
bool success = false;
const unsigned num_attempts = 5;
for (unsigned i = 0; i < num_attempts; ++i) {
    try {
        success = do_the_thing_that_causes_the_error();
    }
    catch (...) {} // swallow the error and try again
    if (success) break;
}
if (!success)
    throw std::runtime_error("bad file descriptor");
@johannesmayer: if you get this error when running your mrtest.cpp, then one possible culprit is your MRAcquisitionData::read, where you create ISMRMRD::Dataset and call its methods readHeader, getNumberOfAcquisitions and readAcquisition without Mutex locking/unlocking.
I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.
@rijobro: what you suggest looks like papering over the crack, I am afraid. I would try to investigate a bit more before resorting to your fallback.
added missing mutex locks/unlocks, HTH
> I have very little idea what Mutex does - something to do with multithreading - but I noticed Gadgetron was using it, so I just followed suit, see e.g. AcquisitionsFile::get_acquisition.
A mutex is used to stop multiple threads from accessing the same files/variables simultaneously, which would otherwise lead to data races, etc.
So it could well be that adding the missing mutexes solves the problem. Thanks.
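As a minimal sketch of what the locking does, the example below uses the standard library's `std::mutex` and `std::lock_guard`, not Gadgetron's `Mutex` class, and a hypothetical `SharedLog` struct as a stand-in for a shared, non-thread-safe resource such as an open dataset. Each writer thread takes the lock before touching the shared container, so the appends never overlap.

```cpp
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a shared resource that is not thread-safe.
struct SharedLog {
    std::vector<int> entries;
    std::mutex mtx;

    void append(int value) {
        // The lock is held until `lock` goes out of scope, so only one
        // thread at a time can modify `entries`.
        std::lock_guard<std::mutex> lock(mtx);
        entries.push_back(value);
    }
};

// Starts n_threads writers that each append n_items values and
// returns the number of entries stored once all threads have finished.
std::size_t run_writers(unsigned n_threads, unsigned n_items) {
    SharedLog log;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&log, n_items, t] {
            for (unsigned i = 0; i < n_items; ++i)
                log.append(static_cast<int>(t * n_items + i));
        });
    }
    for (auto& w : workers)
        w.join();
    return log.entries.size();
}
```

Without the `std::lock_guard` line, concurrent `push_back` calls would be a data race with undefined behaviour, which tends to show up as exactly the kind of failure that is rare and hard to reproduce.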
Bug still persisting (PR from today): https://travis-ci.org/github/SyneRBI/SIRF/jobs/703951360#L28836
This job https://travis-ci.org/github/SyneRBI/SIRF-SuperBuild/jobs/679959452#L16334 from https://github.com/SyneRBI/SIRF-SuperBuild/pull/377 (which is a DEVEL build) fails, while others are fine. The error is in the MR test. I'll rerun the job, as I guess this won't happen again, but it is worrying nevertheless.
@evgueni-ovtchinnikov any ideas?