tckgen selectedBus error

isdneuroimaging commented 9 years ago

Hi, I was trying to run the HCP demonstration as outlined here: https://github.com/MRtrix3/mrtrix3/wiki/ISMRM2015-HCP-demonstration Everything worked until the connectome generation step.

tckgen output:

tckgen: [100%] resampling ACT 5TT image to fixel image space...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 0 selected
Bus error: 10

Bug? Any idea?

MRtrix 0.3.12-692-g2cbfa942 Compiled on Mac OS X 10.10.4

jdtournier commented 9 years ago

OK, this is a new one... Bus errors are really difficult to get a proper handle on; I've never quite figured out exactly what they refer to. The only other context in which I've ever had such issues was when trying to implement incremental reading of a track file while the file was being actively written to. At that time, I'd convinced myself that the bus error actually referred to attempts to read parts of a file that should theoretically be readable, but just happened not to have been initialised by the kernel yet, i.e. trying to read right after the end-of-file pointer is updated, but before the memory backing the new file segment is available.
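
To make the memory-mapping scenario above a little more concrete, here is a minimal POSIX sketch of the one case where a file mapping reliably produces SIGBUS: the POSIX mmap specification says that touching whole mapped pages lying beyond the end of the underlying file delivers SIGBUS. It assumes a POSIX system (Linux or macOS) with a writable /tmp; the function name and temporary-file path are made up for illustration, and it does not reproduce the tckgen crash itself.

```cpp
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>

// Jump target used to recover from the SIGBUS raised below.
static sigjmp_buf g_recover;

static void on_sigbus(int) { siglongjmp(g_recover, 1); }

// Hypothetical helper: maps a two-page temporary file, shrinks the file to
// one page, then reads the second (still mapped) page. Returns true if that
// read raised SIGBUS.
bool read_past_eof_raises_sigbus() {
  const long page = sysconf(_SC_PAGESIZE);
  const std::size_t len = static_cast<std::size_t>(2 * page);
  char path[] = "/tmp/sigbus_demo_XXXXXX";
  const int fd = mkstemp(path);
  if (fd < 0) return false;
  unlink(path);  // file is removed once fd is closed
  if (ftruncate(fd, 2 * page) != 0) { close(fd); return false; }

  void* mem = mmap(nullptr, len, PROT_READ, MAP_SHARED, fd, 0);
  if (mem == MAP_FAILED) { close(fd); return false; }
  const volatile char* bytes = static_cast<const volatile char*>(mem);

  // Shrink the file: the second page stays mapped but is no longer backed.
  if (ftruncate(fd, page) != 0) { munmap(mem, len); close(fd); return false; }

  struct sigaction sa{}, old_sa{};
  sa.sa_handler = on_sigbus;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGBUS, &sa, &old_sa);

  bool faulted = false;
  if (sigsetjmp(g_recover, 1) == 0) {
    (void)bytes[page];  // volatile read of the first byte of page two
  } else {
    faulted = true;     // reached via siglongjmp from on_sigbus
  }

  sigaction(SIGBUS, &old_sa, nullptr);  // restore the previous handler
  munmap(mem, len);
  close(fd);
  return faulted;
}
```

Calling `read_past_eof_raises_sigbus()` should return `true`; a program that touched the same page without installing a handler would instead be killed, with the shell printing a "Bus error" message like the one in the output above.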

But I can't see how this would relate to what you're getting, especially since we've never seen this kind of problem in any recent version of the code (by recent, I mean within the last ~8 years). My first question relates to the filesystem you're writing to: maybe you're using a network-based filesystem or something, and there's a funny bug in it? The other question is whether this is reproducible. It looks like it should be, given that it fails right at the first track. Could you try adding the -nthread 0 option to disable multi-threading? Could you also try writing the output to a different location? If any of these change the outcome, that would be useful information...

Lestropie commented 9 years ago

Another variable to experiment with is using a standard mask-image-based seeding rather than dynamic seeding, just in case that's doing something weird.

jdtournier commented 9 years ago

Good point, that's actually not unlikely to be the issue. The dynamic seeding relies on updating the TDI (TOD?) map in real time while all the other threads are reading from it, so there could be some subtle race conditions there. And if the TDI/TOD map is stored using some kind of memory-mapped file (is it?), that could explain the bus error. In fact, this seems strongly related to a recent email exchange we had about using the sparse format in multi-threaded applications... Could it have something to do with that...?

Lestropie commented 9 years ago

Fixel TDI. It's not memory-mapped, all internal in RAM, nothing to do with the sparse format.

Track density in each fixel is a std::atomic, and I use a std::atomic_flag to prevent concurrent updates, so shouldn't be a problem. But Murphy hasn't been particularly kind to me lately, so it's worth testing.
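
The scheme described here (a per-fixel std::atomic density plus a std::atomic_flag guarding concurrent updates) can be sketched roughly as follows. This is a hypothetical illustration, not the MRtrix3 dynamic seeder: `FixelSketch`, `add_contribution`, `simulate_concurrent_updates` and the seed-probability rule are all invented for the example. The flag acts as a small spinlock so that the density and a value derived from it change together, while keeping the density atomic lets other threads read it without taking the flag.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical per-fixel record; names and the seeding rule are
// illustrative only and are not taken from MRtrix3.
struct FixelSketch {
  std::atomic<float> track_density{0.0f};
  std::atomic_flag busy = ATOMIC_FLAG_INIT;  // guards the compound update
  float seed_probability = 1.0f;             // only accessed while busy is held
};

// Adds one streamline's contribution to a fixel. The flag is used as a
// spinlock so the density and the derived seed probability change together.
void add_contribution(FixelSketch& fixel, float length) {
  while (fixel.busy.test_and_set(std::memory_order_acquire)) {
    // spin until the thread holding the flag clears it
  }
  const float density =
      fixel.track_density.load(std::memory_order_relaxed) + length;
  fixel.track_density.store(density, std::memory_order_relaxed);
  fixel.seed_probability = 1.0f / (1.0f + density);  // hypothetical rule
  fixel.busy.clear(std::memory_order_release);
}

// Updates a single fixel from several threads and returns the final density.
float simulate_concurrent_updates(int num_threads, int updates_per_thread) {
  FixelSketch fixel;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&fixel, updates_per_thread] {
      for (int i = 0; i < updates_per_thread; ++i) add_contribution(fixel, 1.0f);
    });
  }
  for (auto& w : workers) w.join();
  return fixel.track_density.load();
}
```

With this scheme, `simulate_concurrent_updates(4, 1000)` returns exactly 4000: no increments are lost even though four threads write to the same fixel.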

isdneuroimaging commented 9 years ago

Great suggestions!

I can confirm that this occurs only with dynamic seeding. Mask-image-based seeding works, at least with 10M tracks; with 100M it stalls at 0 for some (probably unrelated) reason. Because of this observation, I also tried dynamic seeding with fewer tracks (1000), and again got a bus error.

Single-threaded (-nthread 0): Uh, resampling and segmenting were really slow ;) Interestingly, I got a different error:

tckgen: [100%] resampling ACT 5TT image to fixel image space...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 1 selected
Segmentation fault: 11

Filesystem: This was on the local SSD. I tried again on another local hard drive: Bus error.

Let me know if I can test something else! Happy to help.

Lestropie commented 9 years ago

Yuck... looks like this is going to be a nasty one to hunt down. Once we're done we should also look into why it's stalling when requesting 100M tracks: it may be related, it might not, but it's still not something that should be causing you problems.

Next step is to run the command in debug mode. You will need to configure and compile the code in debug mode (this won't affect your existing binaries):

./configure -debug debug
./build debug

You will also need to install GDB through whichever avenue is appropriate. Then, re-run the command, but replace:

tckgen

with:

gdb --args tckgen__debug

Probably best to run it in single-threaded mode too: it will make things easier to navigate once we get to the crash point, though it will take a long time to get there. When GDB starts, hit 'r' to start the program running. Once it crashes, type 'bt' (backtrace), and post the output here. If possible, leave the terminal in that state after the fact, just in case there are additional things we can query using GDB to hunt down the problem.

Can you also let us know your compiler version? I know GCC 4.7 has incomplete support for C++11; there's a remote chance that the use of atomic functions in the dynamic seeder is causing problems.
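
If it's unclear which compiler and language standard a build actually used, the predefined macros can report it from inside the code. The sketch below is illustrative only (the function names are hypothetical); for a quick check, running the compiler with `--version` is simpler. Note that Apple's clang reports Apple's own version numbers rather than upstream LLVM's.

```cpp
#include <atomic>
#include <string>

// Reports which compiler built this translation unit.
std::string compiler_description() {
#if defined(__clang__)
  return "clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__);
#elif defined(__GNUC__)
  return "gcc " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__);
#else
  return "unknown compiler";
#endif
}

// __cplusplus is at least 201103L when compiling as C++11 or later.
long language_standard() { return __cplusplus; }

// Whether atomic<float> operations are lock-free on this platform.
bool atomic_float_is_lock_free() {
  std::atomic<float> probe{0.0f};
  return probe.is_lock_free();
}
```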

jdtournier commented 9 years ago

This is on MacOSX, so the compiler would be clang, with complete C++11 support; that shouldn't be the problem. gdb is no longer available on MacOSX; use lldb instead. Other than that, the instructions above should hopefully work...

isdneuroimaging commented 9 years ago

Sorry guys, I've never worked with lldb before and it looks complicated from what I found on the web ;) Could you please show me how to run tckgen__debug with lldb? Substituting lldb for gdb with the arguments provided above does not work.

Lestropie commented 9 years ago

Never used it myself either. But looking at the documentation, I think it's just a matter of replacing:

gdb --args tckgen__debug ...

with:

lldb -- tckgen__debug ...

isdneuroimaging commented 9 years ago

OK, I omitted the -act option and used a lower-resolution dataset, because it took ages to run within lldb. Of course, I first checked whether I got the same error with this other dataset, which was the case.

Running lldb, I got the following:

tckgen: [100%] Creating homogeneous processing mask...
tckgen: [100%] Segmenting FODs...
tckgen: [100%] Segmenting FODs...
tckgen: [ 0%] 1 generated, 0 selected
Process 30975 stopped

thread #1: tid = 0x2e5700, 0x00000001000b443a tckgen`MR::DWI::Tractography::Streamline<float>::operator=(MR::DWI::Tractography::Streamline<float>&&) + 26, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=2, address=0x7fff5f3ffff8)
    frame #0: 0x00000001000b443a tckgen`MR::DWI::Tractography::Streamline<float>::operator=(MR::DWI::Tractography::Streamline<float>&&) + 26
tckgen`MR::DWI::Tractography::Streamline<float>::operator=:
->  0x1000b443a <+26>: callq 0x1000c50b0 ; symbol stub for: std::__1::vector<MR::Point<float>, std::__1::allocator<MR::Point<float> > >::vector(std::__1::vector<MR::Point<float>, std::__1::allocator<MR::Point<float> > > const&)
    0x1000b443f <+31>: leaq -0x40(%rbp), %rdi
    0x1000b4443 <+35>: movq %r15, %rsi
    0x1000b4446 <+38>: callq 0x1000c50b0 ; symbol stub for: std::__1::vector<MR::Point<float>, std::__1::allocator<MR::Point<float> > >::vector(std::__1::vector<MR::Point<float>, std::__1::allocator<MR::Point<float> > > const&)

Sorry, this looks messy. Is there a better way to send you the output?

Lestropie commented 9 years ago

Are you definitely using the most up-to-date code? I did fix an error with MR::DWI::Tractography::Streamline<float>::operator=(MR::DWI::Tractography::Streamline<float>&&) a couple of weeks ago.
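
For context on the backtrace: the crash happened inside the move assignment operator `Streamline<float>::operator=(Streamline<float>&&)`, and an EXC_BAD_ACCESS at an address just below a page boundary in the 0x7fff... range (where 64-bit OS X typically placed the main thread's stack) is often a sign that the stack ran out. The thread doesn't say what the July fix changed, so the following is only a hypothetical sketch: `StreamlineSketch` and `Point3` are made-up stand-ins, not the MRtrix3 types. It shows a vector-derived class whose move assignment delegates to the base class, and points out one easy way to get it wrong.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

struct Point3 { float x, y, z; };  // hypothetical stand-in for MR::Point<float>

// Hypothetical stand-in for a streamline: a vector of points plus metadata.
// This is neither the MRtrix3 class nor the fix that was committed.
class StreamlineSketch : public std::vector<Point3> {
 public:
  StreamlineSketch() = default;
  StreamlineSketch(const StreamlineSketch&) = default;
  StreamlineSketch(StreamlineSketch&&) = default;
  StreamlineSketch& operator=(const StreamlineSketch&) = default;

  // Move assignment: hand the points to the base-class move assignment.
  // Writing `*this = std::move(that);` in this body would instead call this
  // same operator again, recursing until the stack is exhausted.
  StreamlineSketch& operator=(StreamlineSketch&& that) noexcept {
    std::vector<Point3>::operator=(std::move(that));
    index = that.index;
    weight = that.weight;
    that.index = static_cast<std::size_t>(-1);  // reset moved-from metadata
    that.weight = 1.0f;
    return *this;
  }

  std::size_t index = static_cast<std::size_t>(-1);
  float weight = 1.0f;
};
```

Declaring the operator `= default` would produce the same member-wise move of the points, index and weight, but without resetting the moved-from metadata.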

isdneuroimaging commented 9 years ago

MRtrix 0.3.12-692-g2cbfa942 tckgen Jul 20 2015

Lestropie commented 9 years ago

That file was updated on Jul 22. Can you please git pull and recompile, and see if that's enough to fix it?

isdneuroimaging commented 9 years ago

MRtrix 0.3.12-748-gdcace536 tckgen Aug 7 2015 is working :+1: Sorry for opening an issue on a not fully up-to-date version :-/

Lestropie commented 9 years ago

No worries; I'm just glad it's not something major :-D

MRtrix3/mrtrix3 issue #318: tckgen selectedBus error