Open PhilipFackler opened 3 months ago
@PhilipFackler does this work for your use case?
Test this please
@williamfgc It worked for the smallish test case I was using for development (one that caused the failure in the first place). I still want to test it out on iguazu with a bigger case.
@PhilipFackler thanks, let me know when this is ready to merge.
@williamfgc I added functors to change the behavior of getting `i` and `j` for the kernel function. This solves the problem for my case. However, it would be nice to implement a `BlockIndexer` that uses sub-ranges when the problem size would cause the number of blocks to exceed the maximum.
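For reference, the sub-range idea could look roughly like the host-side sketch below. This is not code from this PR: `SubRange`, `split_into_subranges`, and `global_index` are hypothetical names, and the sketch covers one dimension only (a 2D version would presumably apply the same split per dimension). It assumes each launch is capped at `max_blocks` blocks and carries an offset, so the index computed in the kernel is shifted into the right part of the full range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper type (not from the PR): one kernel launch's share
// of the full index range [0, n).
struct SubRange {
    std::int64_t offset;  // first global index covered by this launch
    std::int64_t count;   // number of indices covered by this launch
    std::int64_t blocks;  // blocks needed for this launch (<= max_blocks)
};

// Split [0, n) into consecutive sub-ranges so that no single launch
// needs more than max_blocks blocks of threads_per_block threads.
std::vector<SubRange> split_into_subranges(std::int64_t n,
                                           std::int64_t threads_per_block,
                                           std::int64_t max_blocks) {
    assert(n >= 0 && threads_per_block > 0 && max_blocks > 0);
    std::vector<SubRange> ranges;
    const std::int64_t max_per_launch = threads_per_block * max_blocks;
    for (std::int64_t offset = 0; offset < n; offset += max_per_launch) {
        const std::int64_t count = std::min(max_per_launch, n - offset);
        // Ceiling division: the last block of a launch may be partly idle.
        const std::int64_t blocks =
            (count + threads_per_block - 1) / threads_per_block;
        ranges.push_back({offset, count, blocks});
    }
    return ranges;
}

// Index a thread would compute inside the kernel for a given launch:
// the usual block/thread index, shifted by the sub-range offset.
inline std::int64_t global_index(const SubRange& r,
                                 std::int64_t block_idx,
                                 std::int64_t thread_idx,
                                 std::int64_t threads_per_block) {
    return r.offset + block_idx * threads_per_block + thread_idx;
}
```

Each launch would still need the usual bounds check in the kernel (here, `index < r.offset + r.count`), because the last block of a sub-range can contain threads past the end. A grid-stride loop inside the kernel would be an alternative that avoids multiple launches.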
@williamfgc I believe this is ready to merge. You'll want to request all the tests again.
@williamfgc can you merge this?
Similar to #76 but for 2D. This only applies to the CUDA version.