Closed IdaLundholm closed 7 years ago
I'm guessing you're running out of memory.
But isn't it strange that the error occurs both on c and a nodes then?
If that's the case, it's still somewhat surprising that it happens at the same iteration on a and c nodes. If it happens in iteration 17, it would be some kind of gradual leak and the different memory sizes would make it more likely to happen at different points in time. (My original guess would have been a memory issue nonetheless.)
Looking at it more carefully, this has to do with the limit on the CUDA grid size, which is 65535 per dimension. We're using a grid size of x*y*z/256, so it blows up at a side of 256 (256^3/256 = 65536).
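To double-check the numbers, here is a small host-side sketch of the block count. It assumes 256 threads per block and a rounded-up division, both inferred from the x*y*z/256 expression rather than read from the spimage source; `blocks_needed`, `kMaxGridDim` and `kThreadsPerBlock` are hypothetical names.

```cpp
#include <cstdint>

// Legacy per-dimension limit on the CUDA grid size, as quoted in this thread.
constexpr std::uint64_t kMaxGridDim = 65535;
// Assumed threads per block, inferred from the x*y*z/256 expression.
constexpr std::uint64_t kThreadsPerBlock = 256;

// Blocks needed by a 1D launch over a cube with the given side,
// rounded up so that every voxel is covered by a thread.
constexpr std::uint64_t blocks_needed(std::uint64_t side) {
    return (side * side * side + kThreadsPerBlock - 1) / kThreadsPerBlock;
}

// A side of 254 stays under the limit, a side of 256 is one block over it.
static_assert(blocks_needed(254) == 64012, "254^3 voxels need 64012 blocks");
static_assert(blocks_needed(256) == 65536, "256^3 voxels need 65536 blocks");
static_assert(blocks_needed(254) <= kMaxGridDim, "254 fits in a 1D grid");
static_assert(blocks_needed(256) > kMaxGridDim, "256 exceeds the 1D limit");
```

This matches the report: 254 works, 256 fails, and the failure does not depend on how much memory the node has.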
It should be possible to fix this by going to 2D grids in the kernel call.
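A rough sketch of what the 2D split could look like on the host side, not the actual spimage fix. `GridDims` is a stand-in for CUDA's `dim3` so that this compiles without the toolkit, and `make_2d_grid` and `linear_block` are hypothetical helpers.

```cpp
#include <cstdint>

// Stand-in for CUDA's dim3 (x and y only) so the sketch builds without CUDA.
struct GridDims {
    std::uint32_t x;
    std::uint32_t y;
};

// Legacy per-dimension limit on the CUDA grid size.
constexpr std::uint64_t kMaxGridDim = 65535;

// Integer division rounded up; b must be non-zero.
constexpr std::uint64_t ceil_div(std::uint64_t a, std::uint64_t b) {
    return (a + b - 1) / b;
}

// Split a block count into a 2D grid with both dimensions within the limit.
// Precondition: 1 <= blocks <= kMaxGridDim * kMaxGridDim.
// The grid can hold slightly more blocks than requested, so the kernel
// still has to bounds-check its global index.
constexpr GridDims make_2d_grid(std::uint64_t blocks) {
    return GridDims{
        static_cast<std::uint32_t>(ceil_div(blocks, ceil_div(blocks, kMaxGridDim))),
        static_cast<std::uint32_t>(ceil_div(blocks, kMaxGridDim))};
}

// Linear block index, mirroring what a kernel would compute from
// blockIdx.x, blockIdx.y and gridDim.x.
constexpr std::uint64_t linear_block(std::uint32_t bx, std::uint32_t by,
                                     std::uint32_t grid_x) {
    return static_cast<std::uint64_t>(by) * grid_x + bx;
}
```

For the 65536 blocks of the 256 cube this gives a 32768 x 2 grid. Inside the kernel, the voxel index would then be `(blockIdx.y * gridDim.x + blockIdx.x) * blockDim.x + threadIdx.x`, followed by a check that it is smaller than x*y*z.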
Makes sense. For compute capability 3.0 and up, I believe the maximum x dimension is 2^31 - 1 (roughly 2 billion). It might be possible to just tweak the build settings and leave anything older than Kepler behind...
Hi, there is an issue when trying to phase a 256x256x256 model with support update. I get a "CUDA error "invalid argument" at image_filter_cuda.cu:54" (at sp_gaussian_blur_cuda) at iteration 17 (spimage.sp_support_array_init(spimage.sp_support_area_alloc(blur_radius, support_area), 20)) on both a and c nodes. Running with static support works without problems. Reducing the input from a cube of side 256 to one of side 254 also makes the error go away.