Closed rmjarvis closed 1 year ago
Had a look at this today but I think I will have to look into it more deeply next week. I fixed a few minor problems (a stray comma in line 1134, missing targetData
pointer in addDelta
) and was able to get it to build for GPU, but it's crashing when running the test suite. I'm seeing a case (specifically, the 4th time initialize
gets called during test_silicon
) where the pointer returned by getMaxPtr()
is less than the one returned by getData()
, so nmem
is negative and GPU memory allocation fails. I will need to rethink how the allocation is handled and come up with a method that works for all cases (including non-contiguous cases).
OK, I have this working on GPU now. Essentially the entire memory range of the image (at least the part that is used in the current view) needs to be mapped to GPU, so it's necessary to work out the minimum and maximum pointers that can ever be accessed, then map the range between them. The same range needs to be used when updating the target image on the host from the GPU, and also when removing the mappings at the end.
The working code is here: https://github.com/jamesp-epcc/GalSim/tree/silicon_segfault . I have modified Silicon.cpp
and Silicon.h
but nothing else.
Those changes look fine to me. If the unit tests are passing on the GPU, you should merge them into this branch.
Yes, the full sensor test suite passes on GPU with these changes. I have merged them into this branch.
@jamesp-epcc I'm assuming you already looked through the code here to make your changes, but if you could do a code review of this, that would be great!
I'd looked through most of it in the course of my debugging, but I just had a look over the whole thing. The only thing I noticed is that subtractDelta
no longer copies the data back from the GPU like addData
does. This doesn't cause any test failures just now so it may not strictly be needed, but it seems sensible to keep the behaviour consistent in case it makes a difference later on.
Good catch. Thanks!
@cwwalter was getting occasional segmentation faults using GalSim main to do some imSim work, stemming from the Silicon class, which this PR partially fixes. (It fixes them for CPU runs, but not completely for GPU runs.)
It turns out there were two serious problems with the current implementation related to calculating the number of pixels to work on:
int imageDataSize = _delta.getMaxPtr() - _delta.getData();
. This is sometimes correct, but it's not reliable, since_delta
gets resized from time to time. When the size decreases, we don't deallocate and reallocate. SogetMaxPtr()
can be much larger than the number of pixels currently being worked on. This was the case in what Chris was doing, so some loops ended up writing many more values than the image stored, overrunning the allocated data. Hence SIGSEGV.I fixed both problems for CPU execution. But I'm pretty confident that some of the GPU syncing is still not correct when the target image is not contiguous. I marked these with FIXME comments.
@jamesp-epcc Can you please fix those remaining GPU commands to make them work when step!=1? And especially when step or stride < 0, which seems particularly tricky to get right. I don't have easy access to a GPU machine to test on.