I'm using nnp_convolution_algorithm_implicit_gemm. Steps:
1. Call nnp_convolution_inference with nnp_convolution_transform_strategy_precompute to compute how much memory the workspace requires. The result code is success.
2. Allocate that amount of memory for the workspace.
3. Call nnp_convolution_inference again with nnp_convolution_transform_strategy_precompute and a pointer to the workspace. The result code is success.
4. Call nnp_convolution_inference with nnp_convolution_transform_strategy_reuse and a pointer to the workspace.
On step 4 I sometimes receive nnp_status_insufficient_buffer. I believe this has something to do with the specific layer configuration I'm passing in. Some layers work fine. I've verified that the workspace sizes I'm passing in match those allocated and requested by the earlier precompute calls.
I think the problem is that nnp_convolution_transform_strategy_precompute allocates and checks against packed_kernel_size, while nnp_convolution_transform_strategy_reuse checks against packed_kernel_size + packed_input_size. Additionally, packed_kernel_size is computed differently in the two code paths.
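To make the suspected mismatch concrete, here is a minimal toy model in C. It is not NNPACK source: the struct and all three function names are hypothetical, and it only encodes the first half of the hypothesis above (precompute sizes the workspace from packed_kernel_size alone, reuse checks against the sum). It does not model the differing packed_kernel_size computation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two sizes named in the report (not NNPACK code). */
struct packed_sizes {
    size_t packed_kernel_size;
    size_t packed_input_size;
};

/* Size the precompute path is suspected to request and check against. */
static size_t precompute_required(struct packed_sizes s) {
    return s.packed_kernel_size;
}

/* Size the reuse path is suspected to check against. */
static size_t reuse_required(struct packed_sizes s) {
    return s.packed_kernel_size + s.packed_input_size;
}

/* True when a workspace sized by the precompute query would fail the reuse check. */
static bool reuse_would_reject(struct packed_sizes s) {
    return precompute_required(s) < reuse_required(s);
}
```

Under this model, any configuration with a non-zero input packing term is rejected by the reuse check even though the workspace matches exactly what precompute asked for, which would produce an insufficient-buffer result of the kind reported.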
After the second call to nnp_convolution_inference() computes the transformed kernel, that transformed kernel should be passed as the kernel argument in the third call to nnp_convolution_inference().
The second call needs the transformed kernel size passed as a pointer, but the third call does not.
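The call sequence described in this reply can be sketched with a small mock. Everything below is hypothetical (mock_inference, its reduced parameter list, and the status and strategy enums); it imitates only the buffer hand-off between the three calls, not the real nnp_convolution_inference() signature or any actual kernel transform.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum mock_status { mock_ok, mock_insufficient_buffer, mock_invalid };
enum mock_strategy { mock_precompute, mock_reuse };

/* Hypothetical mock of the three-call protocol; not the NNPACK API. */
static enum mock_status mock_inference(enum mock_strategy strategy,
                                       const float *kernel, size_t kernel_elements,
                                       float *output,
                                       void *workspace, size_t *workspace_size) {
    const size_t needed = kernel_elements * sizeof(float);
    if (strategy == mock_precompute) {
        if (kernel == NULL || workspace_size == NULL) return mock_invalid;
        if (workspace == NULL) {            /* call 1: size query only */
            *workspace_size = needed;
            return mock_ok;
        }
        if (*workspace_size < needed) return mock_insufficient_buffer;
        memcpy(workspace, kernel, needed);  /* call 2: stand-in for the transform */
        return mock_ok;
    }
    /* call 3 (reuse): kernel already points at transformed data, no size pointer. */
    if (kernel == NULL || output == NULL) return mock_invalid;
    float sum = 0.0f;
    for (size_t i = 0; i < kernel_elements; i++) sum += kernel[i];
    *output = sum;                          /* stand-in for the convolution */
    return mock_ok;
}

/* Runs the sequence: query size, precompute into a buffer, reuse that buffer as kernel. */
static enum mock_status run_three_calls(const float *kernel, size_t n, float *output) {
    size_t size = 0;
    enum mock_status st = mock_inference(mock_precompute, kernel, n, NULL, NULL, &size);
    if (st != mock_ok) return st;
    void *transformed = malloc(size);
    if (transformed == NULL) return mock_invalid;
    st = mock_inference(mock_precompute, kernel, n, NULL, transformed, &size);
    if (st == mock_ok) {
        /* The transformed buffer goes in the kernel slot; workspace and size are NULL. */
        st = mock_inference(mock_reuse, (const float *)transformed, n, output, NULL, NULL);
    }
    free(transformed);
    return st;
}
```

The point of the sketch is the last call: the buffer filled by the second call is handed over in the kernel position, and no workspace size is supplied for the reuse step.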