NumPower / numpower

PHP extension for efficient scientific computing and array manipulation with GPU support
https://numpower.org
Other
185 stars 4 forks source link

NDArray_Broadcast implicitly requires GPU / HAVE_CUBLAS #25

Closed markkimsal closed 1 year ago

markkimsal commented 1 year ago

Everytime NDArray_VMEMCPY_D2D is referenced in ndarray.c it is wrapped in ifdef HAVE_CUBLAS check except in broadcast.

This results in an implicit function declaration warning

/app/numpower/numpower-main/src/ndarray.c: In function 'NDArray_Broadcast':
/app/numpower/numpower-main/src/ndarray.c:968:21: warning: implicit declaration of function 'NDArray_VMEMCPY_D2D' [-Wimplicit-function-declaration]
  968 |                     NDArray_VMEMCPY_D2D(NDArray_DATA(src), rtn_p,
      |                     ^~~~~~~~~~~~~~~~~~~

Is it correct to simply wrap this call in a check for HAVE_CUBLAS?

    if (!NDArray_IsBroadcastable(src, dst)) {
        zend_throw_error(NULL, "Broadcast shape mismatch.");
        return NULL;
    }
    rtn = NDArray_Copy(dst, NDArray_DEVICE(dst));
    char *rtn_p = NDArray_DATA(rtn);
    if (NDArray_NDIM(src) == 1 && NDArray_NDIM(dst) > 1) {
        if (NDArray_SHAPE(src)[0] == NDArray_SHAPE(dst)[NDArray_NDIM(dst) - 2]) {
            if (NDArray_DEVICE(dst) == NDARRAY_DEVICE_CPU) {
                for (i = 0; i < NDArray_SHAPE(dst)[NDArray_NDIM(dst) - 2]; i++) {
                    memcpy(rtn_p,
                           NDArray_FDATA(src), sizeof(float) * NDArray_SHAPE(dst)[NDArray_NDIM(dst) - 1]);
                    rtn_p = rtn_p + (sizeof(float) * NDArray_SHAPE(src)[0]);
                }
            }
#ifdef HAVE_CUBLAS
            if (NDArray_DEVICE(dst) == NDARRAY_DEVICE_GPU) {
                for (i = 0; i < NDArray_SHAPE(dst)[NDArray_NDIM(dst) - 2]; i++) {
                    NDArray_VMEMCPY_D2D(NDArray_DATA(src), rtn_p,
                                        sizeof(float) * NDArray_SHAPE(dst)[NDArray_NDIM(dst) - 1]);
                    rtn_p = rtn_p + (sizeof(float) * NDArray_SHAPE(src)[0]);
                }
            }
#endif
        }
henrique-borba commented 1 year ago

Hey @markkimsal, yes it is. That way we avoid these GPU checks when it will never be used.

I hope a CI for the Pull Requests will help me avoid these mistakes.

Thanks for the report ✌🏼

henrique-borba commented 1 year ago

I'm closing the issue now that the MR has been sent. Thanks for the contribution.