facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

CUDA 12.3/PT nightly incompatibility #1680

Closed d4l3k closed 9 months ago

d4l3k commented 11 months ago

🐛 Bugs / Unexpected behaviors

I'm getting "call of overloaded function is ambiguous" errors for make_float3 when building. It's unclear whether this comes from upgrading CUDA to 12.3 or from PyTorch nightly. Commenting out make_float3 in pulsar/global.h solves the issue.

Instructions To Reproduce the Issue:

Install CUDA 12.3 and PyTorch nightly.

  1. Any changes you made (git diff) or code you wrote: no code changes

  2. The exact command(s) you ran:

    pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu121
    python setup.py develop
  3. What you observed (including the full logs): https://gist.github.com/d4l3k/488b170becacbb8510644152347c1382

      In file included from /tmp/pip-req-build-y8fv0d6c/pytorch3d/csrc/pulsar/host/../include/./renderer.norm_sphere_gradients.device.h:14:
      /tmp/pip-req-build-y8fv0d6c/pytorch3d/csrc/pulsar/host/../include/././math.h: In function ‘float3 outer_product_sum(const float3&)’:
      /tmp/pip-req-build-y8fv0d6c/pytorch3d/csrc/pulsar/host/../include/././math.h:42:21: error: call of overloaded ‘make_float3(float, float, float)’ is ambiguous
         42 |   return make_float3(
            |          ~~~~~~~~~~~^
         43 |       a.x * a.x + a.x * a.y + a.x * a.z,
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         44 |       a.x * a.y + a.y * a.y + a.y * a.z,
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
         45 |       a.x * a.z + a.y * a.z + a.z * a.z);
            |       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      In file included from /opt/cuda/include/vector_functions.h:176,
                       from /opt/cuda/include/cuda_fp16.h:131,
                       from /opt/cuda/include/cusparse.h:59,
                       from /home/rice/venvs/torchdrive3.11/lib/python3.11/site-packages/torch/include/ATen/cuda/CUDAContext.h:6,
                       from /tmp/pip-req-build-y8fv0d6c/pytorch3d/csrc/pulsar/host/../include/./../global.h:51:
      /opt/cuda/include/vector_functions.hpp:243:34: note: candidate: ‘float3 make_float3(float, float, float)’
        243 | __VECTOR_FUNCTIONS_DECL__ float3 make_float3(float x, float y, float z)
            |                                  ^~~~~~~~~~~
      /tmp/pip-req-build-y8fv0d6c/pytorch3d/csrc/pulsar/host/../include/./../global.h:70:15: note: candidate: ‘float3 make_float3(const float&, const float&, const float&)’
         70 | inline float3 make_float3(const float& x, const float& y, const float& z) {
            |               ^~~~~~~~~~~

bottler commented 11 months ago

I think it must be the new cuda because our internal builds use PyTorch close to the bleeding edge and are OK.

If you can easily send a PR which keeps it working both ways (e.g. rename our function or move it to a namespace), that would be great.
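
For illustration, here is a minimal sketch of the "move it to a namespace" option. It is not the PyTorch3D code: the `float3` struct and the global `make_float3` below are stand-ins for what CUDA's headers provide, so the snippet compiles without the CUDA toolkit, and the namespace name `pulsar_compat` is hypothetical. The point it tries to show is that a qualified call resolves to exactly one function whether or not CUDA's by-value overload is also visible.

```cpp
#include <cassert>

// Stand-in for CUDA's float3 so this sketch builds without the toolkit.
struct float3 {
  float x, y, z;
};

// Stand-in for the by-value overload that CUDA's vector_functions.hpp declares.
inline float3 make_float3(float x, float y, float z) {
  return float3{x, y, z};
}

// Hypothetical namespace (name is an assumption, not from the PyTorch3D source).
// Left in the global namespace, this by-reference overload would make the call
// make_float3(float, float, float) ambiguous, as in the log above.
namespace pulsar_compat {
inline float3 make_float3(const float& x, const float& y, const float& z) {
  return float3{x, y, z};
}
} // namespace pulsar_compat

// Same arithmetic as the function named in the error message.
inline float3 outer_product_sum(const float3& a) {
  // Qualified lookup only considers pulsar_compat::make_float3, so the call
  // is unambiguous even though ::make_float3 is also declared.
  return pulsar_compat::make_float3(
      a.x * a.x + a.x * a.y + a.x * a.z,
      a.x * a.y + a.y * a.y + a.y * a.z,
      a.x * a.z + a.y * a.z + a.z * a.z);
}

// Small self-check that can be called from any test driver.
inline void self_check() {
  const float3 r = outer_product_sum(float3{1.0f, 2.0f, 3.0f});
  assert(r.x == 6.0f && r.y == 12.0f && r.z == 18.0f);
}
```

The cost of this approach is that every call site has to be qualified (or the helper renamed), which is why a rename was suggested as the alternative.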

kynk94 commented 10 months ago

If you comment out the definition of make_float3 in pytorch3d/csrc/pulsar/global.h, you can install pytorch3d normally.

bottler commented 9 months ago

Now fixed.

zxhuang97 commented 7 months ago

Looks like this function is added back in commit 3621a36, and I'm experiencing the same issue again with CUDA 12.3

sjuxax commented 6 months ago

+1, also getting this with CUDA 12.3.

EDIT: 6b8766080d2c331a05abbddbf3c7332dbb9df791 builds OK.

bottler commented 6 months ago

> Looks like this function is added back in commit 3621a36, and I'm experiencing the same issue again with CUDA 12.3

It was added back inside a block that is only compiled when WITH_CUDA is not defined, i.e. for CPU-only builds. I don't know exactly how it's causing problems for you in CUDA 12.3 builds. Could someone suggest the right fix and check that it works on both a CUDA 12.3 build and a CPU-only build?
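
As a rough sketch of the guard described above (an illustration under stated assumptions, not the actual contents of global.h): the fallback definition is compiled only when WITH_CUDA is not defined, and otherwise CUDA's own header supplies `float3` and `make_float3`. The fallback `float3` struct and the `sum_components` helper are hypothetical additions so the snippet is self-contained in a CPU-only build.

```cpp
#include <cassert>

#if defined(WITH_CUDA)
// CUDA build: take float3 and make_float3 from the toolkit header that
// appears in the include chain of the log above.
#include <vector_functions.h>
#else
// CPU-only build: no CUDA headers, so provide stand-ins.
// This struct is an assumption made for the sketch.
struct float3 {
  float x, y, z;
};
inline float3 make_float3(const float& x, const float& y, const float& z) {
  return float3{x, y, z};
}
#endif

// Hypothetical helper that works with whichever definition is active.
inline float sum_components(const float3& v) {
  return v.x + v.y + v.z;
}

// Small self-check that can be called from any test driver.
inline void self_check() {
  const float3 v = make_float3(1.0f, 2.0f, 3.0f);
  assert(sum_components(v) == 6.0f);
}
```

A guard like this only prevents the clash if every translation unit that pulls in CUDA headers is also compiled with WITH_CUDA defined. If some file includes CUDA headers without that macro, both overloads are visible again and the call is ambiguous. Whether that is what happens in the reported CUDA 12.3 builds is not established in this thread.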