CUDA error for 256x256x256 models

FXIhub / libspimage

CUDA error for 256x256x256 models #11

Closed IdaLundholm closed 7 years ago

IdaLundholm commented 7 years ago

Hi, there is an issue when trying to phase a 256x256x256 model with support update. I get the error: CUDA error "invalid argument" at image_filter_cuda.cu:54 (in sp_gaussian_blur_cuda) at iteration 17 (spimage.sp_support_array_init(spimage.sp_support_area_alloc(blur_radius, support_area), 20)), on both a and c nodes. Running with static support works without problems. Reducing the input from a cube of side 256 to one of side 254 also makes the error go away.

FilipeMaia commented 7 years ago

I'm guessing you're running out of memory.

IdaLundholm commented 7 years ago

But isn't it strange that the error occurs both on c and a nodes then?

cnettel commented 7 years ago

If that's the case, it's still somewhat surprising that it happens at the same iteration on both a and c nodes. Running out of memory at iteration 17 would suggest some kind of gradual leak, and with different memory sizes on the two node types you would expect it to fail at different points in time. (My original guess would have been a memory issue nonetheless.)

FilipeMaia commented 7 years ago

Looking at it more carefully, this has to do with the limit on the CUDA grid size, which is 65535 per dimension. We're using a grid size of x*y*z/256, which for a side of 256 comes to 65536 blocks, one over the limit, so the kernel launch fails. It should be possible to fix this by going to 2D grids in the kernel call.
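
To make the arithmetic concrete, here is a minimal sketch in Python of the block count and of one possible way to split it into a 2D grid. It is not the libspimage code: the helper names blocks_needed and split_2d are hypothetical, and the 256 threads per block is an assumption read off the x*y*z/256 expression above.

```python
# Sketch only: host-side arithmetic for the kernel launch configuration.
# MAX_GRID_DIM is the per-dimension limit quoted in the thread;
# THREADS_PER_BLOCK is assumed from the x*y*z/256 expression.
MAX_GRID_DIM = 65535
THREADS_PER_BLOCK = 256


def blocks_needed(side, threads=THREADS_PER_BLOCK):
    """Blocks needed to cover a side**3 volume, rounded up."""
    n = side ** 3
    return (n + threads - 1) // threads


def split_2d(n_blocks, max_dim=MAX_GRID_DIM):
    """Split a 1D block count into (grid_x, grid_y), each <= max_dim.

    Valid as long as n_blocks <= max_dim**2. The product grid_x * grid_y
    may slightly exceed n_blocks, so a kernel using this layout would
    compute block = blockIdx.y * gridDim.x + blockIdx.x and return early
    for any thread whose global index is past the end of the data.
    """
    grid_y = (n_blocks + max_dim - 1) // max_dim
    grid_x = (n_blocks + grid_y - 1) // grid_y
    return grid_x, grid_y


for side in (254, 256):
    n_blocks = blocks_needed(side)
    print(side, n_blocks, n_blocks <= MAX_GRID_DIM, split_2d(n_blocks))
```

With these assumptions a side of 254 needs 64012 blocks and fits in a 1D grid, while a side of 256 needs 65536 blocks and would be launched as a 32768 x 2 grid instead.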

cnettel commented 7 years ago

Makes sense. For compute capability 3.0 and up, I believe the maximum x dimension is about 2 billion. It might be possible to just tweak the build settings and leave anything older than Kepler behind...
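
As a rough check of that alternative, the sketch below compares the 1D block count against the x-dimension limit for each generation. The helper names max_grid_x and fits_1d are hypothetical, the 256 threads per block is again an assumption, and the limits used are 65535 before compute capability 3.0 and 2^31 - 1 from 3.0 onward.

```python
# Sketch only: does a 1D grid over a side**3 volume fit the x-dimension limit?
def max_grid_x(cc_major):
    """Maximum grid size in x for a given compute capability major version."""
    return 2**31 - 1 if cc_major >= 3 else 65535


def fits_1d(side, cc_major, threads=256):
    """True if ceil(side**3 / threads) blocks fit in a 1D grid."""
    n_blocks = (side ** 3 + threads - 1) // threads
    return n_blocks <= max_grid_x(cc_major)


for cc_major in (2, 3):
    print(cc_major, fits_1d(254, cc_major), fits_1d(256, cc_major))
```

Under these assumptions the 256 cube fails only on pre-Kepler hardware, so building for compute capability 3.0 or later would sidestep the limit without touching the kernel, at the cost of dropping older GPUs.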