Closed isdneuroimaging closed 9 years ago
OK, this is a new one... Bus errors are really difficult to get a proper handle on; I've never quite figured out exactly what they refer to. The only other context in which I've ever had such issues was when trying to implement incremental reading of a track file while the file was being actively written to. At that time, I'd convinced myself that the bus error actually referred to attempts to read parts of a file that should theoretically be readable, but just happen not to have been initialised by the kernel yet - i.e. trying to read right after the end-of-file pointer is updated, but before the memory backing the new file segment is available.
But I can't see how this would relate to what you're getting - especially since we've never seen this kind of problem in any recent version of the code (by recent, I mean within the last ~8 years). My first question would relate to the filesystem that you're writing to - maybe you're using a network-based filesystem or something, and there's a funny bug in it? The other question is whether this is reproducible. It looks like it should be, given that it fails right at the first track. Maybe you can try adding the -nthread 0 option to disable multi-threading? Maybe you can try writing to an output in a different location? If any of these change the outcome, that would be useful information...
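As an aside on the bus-error hypothesis above: one well-known way to get a bus error from file access is to touch a page of a memory-mapped file that lies beyond the end of the file's backing storage. The sketch below assumes a POSIX system and a writable /tmp; the function name and temporary path are made up for illustration, and nothing here is taken from the MRtrix code. It maps two pages over a one-page file, reads from the second page, and reports whether SIGBUS was raised.

```cpp
#include <csetjmp>
#include <csignal>
#include <cstdlib>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Jump target used to recover from the signal (illustrative only).
static sigjmp_buf g_recover;

extern "C" void on_sigbus(int) { siglongjmp(g_recover, 1); }

// Hypothetical helper: returns true if reading a mapped page that has no
// file data behind it raised SIGBUS.
bool read_past_backing_raises_sigbus() {
  char path[] = "/tmp/sigbus_demo_XXXXXX";  // assumed writable location
  int fd = mkstemp(path);
  if (fd < 0) return false;
  unlink(path);  // file disappears once the descriptor is closed

  const long page = sysconf(_SC_PAGESIZE);
  // The file is exactly one page long...
  if (ftruncate(fd, page) != 0) { close(fd); return false; }
  // ...but the mapping covers two pages.
  void* map = mmap(nullptr, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
  if (map == MAP_FAILED) { close(fd); return false; }

  struct sigaction sa{}, old{};
  sa.sa_handler = on_sigbus;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGBUS, &sa, &old);

  volatile bool raised = false;
  if (sigsetjmp(g_recover, 1) == 0) {
    // First byte of the second page: mapped, but not backed by the file.
    volatile char c = static_cast<volatile char*>(map)[page];
    (void)c;
  } else {
    raised = true;  // we arrived here from the SIGBUS handler
  }

  sigaction(SIGBUS, &old, nullptr);  // restore the previous handler
  munmap(map, 2 * page);
  close(fd);
  return raised;
}
```

Calling `read_past_backing_raises_sigbus()` from a small `main` is enough to try it. Whether this mechanism has anything to do with the crash reported here is a separate question.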
Another variable to experiment with is using a standard mask-image-based seeding rather than dynamic seeding, just in case that's doing something weird.
Good point, that's actually not unlikely to be the issue. The dynamic seeding relies on updating the TDI (TOD?) map in real time while all the other threads are reading from it, so there could be some subtle race conditions there. And if the TDI/TOD map is stored using some kind of memory-mapped file (is it?), that could explain the bus error. In fact, it seems strongly related to a recent email exchange we had about using the sparse format in multi-threaded applications... Could it have something to do with that...?
Fixel TDI. It's not memory-mapped, all internal in RAM, nothing to do with the sparse format.
Track density in each fixel is a std::atomic, and I use a std::atomic_flag to prevent concurrent updates, so shouldn't be a problem. But Murphy hasn't been particularly kind to me lately, so it's worth testing.
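For readers unfamiliar with that pattern, here is a minimal sketch of the general idea: per-element atomic counters that any thread can read at any time, plus a std::atomic_flag used as a spin lock so that only one thread updates the map at once. The names `FixelDensity`, `DensityMap`, `add_streamline` and `run_demo` are invented for this example and are not the dynamic seeder's actual code.

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the track density of one fixel.
struct FixelDensity {
  std::atomic<unsigned> count{0};
};

class DensityMap {
 public:
  explicit DensityMap(std::size_t n) : fixels_(n) {}

  // Writers: take the spin lock, bump every visited fixel, release.
  void add_streamline(const std::vector<std::size_t>& visited) {
    while (busy_.test_and_set(std::memory_order_acquire))
      std::this_thread::yield();  // another update is in progress
    for (std::size_t i : visited)
      fixels_[i].count.fetch_add(1, std::memory_order_relaxed);
    busy_.clear(std::memory_order_release);
  }

  // Readers: no lock needed, each counter is atomic on its own.
  unsigned density(std::size_t i) const {
    return fixels_[i].count.load(std::memory_order_relaxed);
  }

 private:
  std::vector<FixelDensity> fixels_;
  std::atomic_flag busy_ = ATOMIC_FLAG_INIT;
};

// Several threads each record the same two fixels `per_thread` times;
// returns the final density of fixel 0.
unsigned run_demo(unsigned n_threads, unsigned per_thread) {
  DensityMap map(4);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < n_threads; ++t)
    workers.emplace_back([&map, per_thread] {
      for (unsigned k = 0; k < per_thread; ++k) map.add_streamline({0, 2});
    });
  for (auto& w : workers) w.join();
  return map.density(0);
}
```

With this arrangement no increment is lost, so `run_demo(4, 1000)` should return 4000. Readers may still see a map that is partway through an update, which is presumably acceptable for seeding purposes.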
Great suggestions!
I can confirm that this occurs only with the dynamic seeding. Mask-image-based seeding works, at least with 10M tracks; with 100M it stalls at 0 for some, probably unrelated, reason. Because of this observation, I also tried dynamic seeding with fewer tracks (1000), again with a bus error.
Single-threaded (-nthread 0): Uh, resampling and segmenting was really slow ;) Interestingly: Different error:
tckgen: [100%] resampling ACT 5TT image to fixel image space...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 1 selected
Segmentation fault: 11
Filesystem: This was on the local SSD. I tried again on another local hard drive: Bus error.
Let me know if I can test something else! Happy to help.
Yuck... looks like this is going to be a nasty one to hunt down. Once we're done we should also look into why it's stalling when requesting 100M tracks: it may be related, it might not, but it's still not something that should be causing you problems.
Next step is to run the command in debug mode. You will need to configure and compile the code in debug mode (this won't affect your existing binaries):
./configure -debug debug
./build debug
You will also need to install GDB through whichever avenue is appropriate. Then, re-run the command, but replace tckgen with gdb --args tckgen__debug.
Probably best to run it in single-threaded mode too: it will make things easier to navigate once we get to the crash point, though it will take a long time to get there. When GDB starts, hit 'r' to start the program running. Once it crashes, type 'bt' (backtrace), and post the output here. If possible, leave the terminal in that state after the fact, just in case there are additional things we can query using GDB to hunt down the problem.
Can you also let us know your compiler version: I know 4.7 has incomplete support of C++11, there's a remote chance the use of atomic functions in the dynamic seeder is causing problems.
This is on MacOSX, so compiler would be clang, with complete C++11 support - that shouldn't be the problem. gdb is no longer available on MacOSX, use lldb instead. Other than that, the instructions above should hopefully work...
Sorry guys, never worked with lldb before and it looks complicated from what I found on the web ;) Could you please help me run tckgen__debug with lldb? Substituting gdb with lldb and keeping the arguments provided above does not work.
Never used it myself either. But looking at the documentation, I think it's just a matter of replacing:
gdb --args tckgen__debug ...
with:
lldb -- tckgen__debug ...
OK, I omitted the -act option and used a lower resolution dataset, because it took ages to calculate within lldb. Of course first I checked whether I get the same error with this other dataset, which was the case.
Running lldb, I got the following:
tckgen: [100%] Creating homogeneous processing mask...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 0 selected
Process 30975 stopped
MR::DWI::Tractography::Streamline<float>::operator=(MR::DWI::Tractography::Streamline<float>&&) + 26, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5f3ffff8)
frame #0: 0x00000001000b443a tckgen MR::DWI::Tractography::Streamline
Sorry, this looks messy. Is there a better way to send you the output?
Are you definitely using the most up-to-date code? I did fix an error with MR::DWI::Tractography::Streamline<float>::operator=(MR::DWI::Tractography::Streamline<float>&&) a couple of weeks ago.
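The thread does not say what that fix actually was, so the following is only a generic illustration of how a move-assignment operator on a class derived from a container can go wrong. One classic mistake is writing `*this = std::move(other);` inside the operator itself, which calls the same operator again and recurses until the stack is exhausted; a bad-access fault at a stack-like address would be consistent with that, but this is a guess, not a diagnosis. `Track` below is a hypothetical stand-in, not MRtrix's Streamline class, and its `index` and `weight` members are assumptions made for the example.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a streamline: a vector of points plus metadata.
template <typename T>
class Track : public std::vector<T> {
 public:
  Track() = default;
  Track(const Track&) = default;
  Track(Track&&) = default;
  Track& operator=(const Track&) = default;

  Track& operator=(Track&& other) noexcept {
    if (this != &other) {
      // Forward explicitly to the base-class operator. Writing
      // `*this = std::move(other);` here instead would select this very
      // function again and never terminate.
      std::vector<T>::operator=(std::move(other));
      // Only the vector part was moved from; these members are still valid.
      index = other.index;
      weight = other.weight;
    }
    return *this;
  }

  std::size_t index = 0;  // assumed metadata, for illustration
  float weight = 1.0f;    // assumed metadata, for illustration
};
```

A quick check is to fill one `Track<float>`, move-assign it into another, and confirm that the destination holds the points and the metadata.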
MRtrix 0.3.12-692-g2cbfa942 tckgen Jul 20 2015
That file was updated on Jul 22. Can you please git pull and recompile, see if that's enough to fix it?
MRtrix 0.3.12-748-gdcace536 tckgen Aug 7 2015
is working :+1: Sorry for opening an issue on a not fully up-to-date version :-/
No worries; I'm just glad it's not something major :-D
Hi, I was trying to run the HCP demonstration as outlined here: https://github.com/MRtrix3/mrtrix3/wiki/ISMRM2015-HCP-demonstration Everything worked until the connectome generation.
tckgen output:
tckgen: [100%] resampling ACT 5TT image to fixel image space...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 0 selected
Bus error: 10
Bug? Any idea?
MRtrix 0.3.12-692-g2cbfa942 Compiled on Mac OS X 10.10.4