Quuxplusone / LLVMBugzillaTest

0 stars 0 forks source link

Target region failure when using is_device_ptr with class member pointer #41161

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link PR42191
Status NEW
Importance P normal
Reported by Christopher Daley (csdaley@lbl.gov)
Reported on 2019-06-07 16:36:19 -0700
Last modified on 2019-07-02 15:00:48 -0700
Version unspecified
Hardware PC Linux
CC a.bataev@hotmail.com, csdaley@lbl.gov, jdoerfert@anl.gov, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments
Blocks
Blocked by
See also
Hello all,

I encounter a target region failure when using a class member pointer in the
is_device_ptr clause.

> clang++ -UFUNC_DECL -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
is_device_ptr.cpp -o is_device_ptr && srun -n 1 ./is_device_ptr
Libomptarget fatal error 1: failure of target construct while offloading is
mandatory
srun: error: cgpu01: task 0: Exited with exit code 1
srun: Terminating job step 156071.85

The target region is successful when I declare the pointer inside the C++
function instead (using the macro FUNC_DECL).

> clang++ -DFUNC_DECL -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda
is_device_ptr.cpp -o is_device_ptr && srun -n 1 ./is_device_ptr
The correct answer is 2....2

I think this is a bug but I am happy to be corrected. The code is as follows.

> cat is_device_ptr.cpp
#include <vector>
#include <iostream>
#include <omp.h>

class Widget {
public:
#ifndef FUNC_DECL
  int _size;
  int _hnum;
  int _dnum;
  double* d_ptr;
  std::vector<double> data;
#endif

  Widget(int size) {
#ifdef FUNC_DECL
    int _size;
    int _hnum;
    int _dnum;
    double* d_ptr;
    std::vector<double> data;
#endif
    _size = size;
    _dnum = omp_get_default_device();
    _hnum = omp_get_initial_device();
    d_ptr = static_cast<double*>(omp_target_alloc(size*sizeof(double), _dnum));
    data.resize(size, 1.0);

    omp_target_memcpy(d_ptr, data.data(), size, 0, 0, _dnum, _hnum);
    double a = 2.0;

#pragma omp target is_device_ptr(d_ptr)
#pragma omp teams distribute parallel for
    for(int i=0; i<size; i++) {
      d_ptr[i] *= a;
    }

    omp_target_memcpy(data.data(), d_ptr, size, 0, 0, _hnum, _dnum);
    omp_target_free(d_ptr, _dnum);
    std::cout << "The correct answer is 2...." << data[0] << '\n';
  }
};

int main() {
  Widget widget(100000);
  return 0;
}

I am using the following version of Clang
> clang++ -v
clang version 9.0.0 (https://github.com/llvm-mirror/clang
c65fb39b929fc1a0ed7bbb93bab40fe53086cbd7) (https://github.com/llvm-mirror/llvm
891c55ccc57a9dd7db74e105bf02e70b3151b9f6)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /project/projectdirs/m1759/csdaley/software/cgpu/llvm/9.0.0-
git_20190606/bin
Found candidate GCC installation: /opt/gcc/7.3.0/snos/lib/gcc/x86_64-suse-
linux/7.3.0
Selected GCC installation: /opt/gcc/7.3.0/snos/lib/gcc/x86_64-suse-linux/7.3.0
Candidate multilib: .;@m64
Selected multilib: .;@m64
Found CUDA installation: /global/common/cori/software/cuda/9.2, version 9.2

Thanks,
Chris
Quuxplusone commented 5 years ago
I think, there is a problem in your code when d_ptr is a member data. You need
to map the whole object and transfer it to the device or allocate memory for it
on the device.
Without it, when you're trying to access d_ptr on the GPU, you're dereferencing
this pointer, which is nullptr (since d_ptr is a member, you need at first to
dereference this pointer to get access to d_ptr).
Quuxplusone commented 5 years ago
(In reply to Alexey Bataev from comment #1)
> I think, there is a problem in your code when d_ptr is a member data. You
> need to map the whole object and transfer it to the device or allocate
> memory for it on the device.
> Without it, when you're trying to access d_ptr on the GPU, you're
> dereferencing this pointer, which is nullptr (since d_ptr is a member, you
> need at first to dereference this pointer to get access to d_ptr).

Thanks Alexey,

Can you please show me the code that should be used when d_ptr is member data?

Is this what you mean?

#pragma omp target enter data map(to:this[0:1])
#pragma omp target is_device_ptr(this->d_ptr)
#pragma omp teams distribute parallel for
    for(int i=0; i<size; i++) {
      d_ptr[i] *= a;
    }

- This code fails in the same way.

Thanks,
Chris
Quuxplusone commented 5 years ago
(In reply to Christopher Daley from comment #2)
> (In reply to Alexey Bataev from comment #1)
> > I think, there is a problem in your code when d_ptr is a member data. You
> > need to map the whole object and transfer it to the device or allocate
> > memory for it on the device.
> > Without it, when you're trying to access d_ptr on the GPU, you're
> > dereferencing this pointer, which is nullptr (since d_ptr is a member, you
> > need at first to dereference this pointer to get access to d_ptr).
>
> Thanks Alexey,
>
> Can you please show me the code that should be used when d_ptr is member
> data?
>
> Is this what you mean?
>
> #pragma omp target enter data map(to:this[0:1])
> #pragma omp target is_device_ptr(this->d_ptr)
> #pragma omp teams distribute parallel for
>     for(int i=0; i<size; i++) {
>       d_ptr[i] *= a;
>     }
>
> - This code fails in the same way.
>
> Thanks,
> Chris

Just this would be enough:
#pragma omp target map(to:this[0:1])
#pragma omp teams distribute parallel for
    for(int i=0; i<size; i++) {
      d_ptr[i] *= a;
    }
Quuxplusone commented 5 years ago
(In reply to Alexey Bataev from comment #3)
> (In reply to Christopher Daley from comment #2)
> > (In reply to Alexey Bataev from comment #1)
> > > I think, there is a problem in your code when d_ptr is a member data. You
> > > need to map the whole object and transfer it to the device or allocate
> > > memory for it on the device.
> > > Without it, when you're trying to access d_ptr on the GPU, you're
> > > dereferencing this pointer, which is nullptr (since d_ptr is a member, you
> > > need at first to dereference this pointer to get access to d_ptr).
> >
> > Thanks Alexey,
> >
> > Can you please show me the code that should be used when d_ptr is member
> > data?
> >
> > Is this what you mean?
> >
> > #pragma omp target enter data map(to:this[0:1])
> > #pragma omp target is_device_ptr(this->d_ptr)
> > #pragma omp teams distribute parallel for
> >     for(int i=0; i<size; i++) {
> >       d_ptr[i] *= a;
> >     }
> >
> > - This code fails in the same way.
> >
> > Thanks,
> > Chris
>
> Just this would be enough:
> #pragma omp target map(to:this[0:1])
> #pragma omp teams distribute parallel for
>     for(int i=0; i<size; i++) {
>       d_ptr[i] *= a;
>     }

Thank you. This works as we expect. Please close this bug report if you are
confident that "is_device_ptr" should not work on member data pointers.

Thanks,
Chris