daducci / COMMIT

Linear framework to combine tractography and tissue micro-structure estimation with diffusion MRI

Files in the dictionary bigger than 2GB #3

Closed daducci closed 2 years ago

daducci commented 9 years ago

Below is an excerpt from an email I received from a user (Stefan Sommer) about an issue, and a possible fix, with files larger than 2GB:

Hi Ale,

I figured out that my trk2dictionary on Windows (the Python-compiled C code) has trouble with track files >2 GB. The problem is buried in the fseek function and an internally used offset variable (I think a signed 32-bit int, which would explain the 2 GB limit). On a different system with another compiler this might not be an issue (32/64-bit; I am on a 64-bit system, but maybe it's compiled as 32-bit? I think I am using the Visual Studio compiler on Windows), but it might be worth checking whether it is an issue on other systems.

Anyway here is my quick fix using a mostly undocumented modified fseek:

  • add this line near the top of trk2dictionary_c.cpp: extern "C" int __cdecl _fseeki64(FILE *, __int64, int);
  • replace the two fseek calls in the read_fiber() function at the end of the file with _fseeki64(), e.g.: _fseeki64(fp,4*ns,SEEK_CUR);

Just wanted to let you know, in case somebody has run into similar trouble or will in the future.

trk2dictionary just hung for "no reason", but I couldn't identify a malicious fiber. There is a while loop in fiberForwardModel() where the segments are checked and split for voxel intersection. After 2GB the file-offset variable probably wrapped around to -2^31, so nonsense data was read and those values were interpreted as segment coordinates, typically almost +/- inf or 0. In the extreme cases, the segments were just split forever and the while loop never terminated. There's a break condition for segment lengths <1e-3; maybe it's a good idea to also put an upper limit on the segment length, in case of some very wrong fiber coordinate. In the >2GB case this leads to a segmentation fault, so "at least" Python crashes. Not sure which solution is "better" in that case though ;-)
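To make the wrap-around concrete, here is a minimal sketch of what happens when a byte offset larger than 2^31 - 1 is squeezed into a signed 32-bit integer, together with one possible form of the upper-bound check suggested above. Both function names (`wrap_to_int32`, `segment_length_is_plausible`) and the `max_len` parameter are hypothetical and not part of COMMIT; the actual variable types used inside fseek depend on the compiler.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: emulate storing a 64-bit byte offset in a signed
// 32-bit integer (two's complement), as a 32-bit `long` offset would.
inline std::int32_t wrap_to_int32(std::int64_t offset)
{
    // Conversion to unsigned is modular (well defined); then reinterpret bits.
    const std::uint32_t low = static_cast<std::uint32_t>(offset);
    std::int32_t wrapped;
    std::memcpy(&wrapped, &low, sizeof wrapped);
    return wrapped;
}

// Hypothetical guard: reject segment lengths that are not finite or that
// exceed a caller-chosen upper bound (max_len is an assumed parameter).
inline bool segment_length_is_plausible(double len, double max_len)
{
    assert(max_len > 0.0);
    return std::isfinite(len) && len >= 0.0 && len <= max_len;
}
```

For example, with ns = 600,000,000 the expression 4*ns equals 2,400,000,000, which no longer fits in 32 signed bits and becomes a negative offset, so the seek moves backwards instead of forwards.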

daducci commented 9 years ago

Thanks a lot Stefan!

I'm not an expert in C, but I found here a solution that may be more portable. It seems sufficient to tell the compiler to automatically use the 64-bit versions of the file functions (fopen64/fseeko64 etc.) by specifying the flag -D_FILE_OFFSET_BITS=64. I need to test it and see if this solves the problem.
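A minimal sketch of how the two suggestions might be combined, assuming a POSIX system provides fseeko() and the Microsoft C runtime provides _fseeki64(). Note that _FILE_OFFSET_BITS=64 widens off_t (used by fseeko/ftello), while plain fseek() still takes a long, so the calls would need to go through something like the wrapper below. The names `seek64` and `skip_values` are hypothetical and are not taken from trk2dictionary_c.cpp.

```cpp
// Must be defined before any system header; equivalent to -D_FILE_OFFSET_BITS=64.
#ifndef _FILE_OFFSET_BITS
#define _FILE_OFFSET_BITS 64
#endif

#include <cassert>
#include <cstdint>
#include <cstdio>

#if !defined(_WIN32)
#include <sys/types.h>  // off_t
#endif

// Hypothetical helper: seek using a 64-bit offset on both platforms.
inline int seek64(std::FILE* fp, std::int64_t offset, int whence)
{
#if defined(_WIN32)
    // `long` is 32 bits on Windows, even in 64-bit builds.
    return _fseeki64(fp, offset, whence);
#else
    static_assert(sizeof(off_t) >= 8, "off_t is not 64-bit on this platform");
    return fseeko(fp, static_cast<off_t>(offset), whence);
#endif
}

// Hypothetical helper: skip `count` 4-byte values from the current position,
// mirroring the fseek(fp, 4*ns, SEEK_CUR) pattern without 32-bit overflow.
inline int skip_values(std::FILE* fp, std::int64_t count)
{
    assert(count >= 0);
    return seek64(fp, 4 * count, SEEK_CUR);
}
```

Doing the multiplication in std::int64_t matters as much as the seek call itself: if ns were a 32-bit int, 4*ns could overflow before the offset ever reached the seek function.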

By the way, the 2GB limit is not directly related to the size of the input tractogram, but to the dictionary_*.dict files generated while building the linear operator. A tractogram can be bigger than 2GB while the generated files are smaller than 2GB, and the other way around can also happen.