facebookresearch / pytorch3d

PyTorch3D is FAIR's library of reusable components for deep learning with 3D data
https://pytorch3d.org/

Faces list creation does not work on MacOS #1802

Open mxrcooo opened 5 months ago

mxrcooo commented 5 months ago

🐛 Bugs / Unexpected behaviors

I'm trying to load an obj file (on MacOS) using load_obj and create a Meshes instance:

mesh_pytorch3d = Meshes(verts=[verts], faces=[faces], textures=textures)

The code hangs at this call. After profiling, I found that this line is where it gets stuck: https://github.com/facebookresearch/pytorch3d/blob/main/pytorch3d/structures/meshes.py#L347

The same object loads fine on an Ubuntu server with a GPU, but not on my MacBook (M3 Pro Max). I installed PyTorch3D following the INSTALL.md steps with:

MACOSX_DEPLOYMENT_TARGET=10.14 CC=clang CXX=clang++ pip install "git+https://github.com/facebookresearch/pytorch3d.git"

The following debug print also times out:

print("test:", faces[0][faces[0].gt(-1).all(1)])
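For reference, here is a minimal sketch of what that expression computes, using a small made-up faces tensor rather than the FAUST data (the tensor values are an assumption for illustration only). It keeps the rows whose indices are all greater than -1, first with the boolean mask and then with explicit row indices. Whether the index-based form avoids the hang on the affected machine is untested.

```python
import torch

# Hypothetical stand-in for a faces tensor; rows containing -1 are padding.
faces = torch.tensor([[0, 1, 2], [2, 3, -1], [-1, -1, -1], [1, 2, 3]])

# True for rows in which every index is greater than -1.
mask = faces.gt(-1).all(1)

# Boolean-mask indexing, the same pattern as in the debug print.
valid = faces[mask]

# The same selection written with explicit row indices instead of a mask.
rows = torch.nonzero(mask).squeeze(1)
valid_by_index = faces.index_select(0, rows)

print(valid)
```

Both forms select the same rows here, so either can be used to check the result of the other on a small input.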

The obj file is tr_reg_000_for_illustration from the FAUST dataset.

I'm not sure whether this issue is PyTorch3D-related or torch-related. I think the problem is the boolean-mask indexing f[mask]. When I rewrite the logic as follows, it works:

self._faces_list = []

for f in faces:
    if len(f) > 0:
        valid_rows = []
        for row in f:
            if torch.all(row > -1):
                valid_rows.append(row)
        if valid_rows:
            filtered_f = torch.stack(valid_rows).to(torch.int64)
        else:
            filtered_f = f.new_empty((0, f.shape[1]), dtype=torch.int64)  # keep the 2D shape and int64 dtype when no row is valid
    else:
        filtered_f = f

    self._faces_list.append(filtered_f)

bottler commented 5 months ago

Could you be running out of memory on the MacBook? I think your new code uses less temporary memory than the original. It's great that you are unblocked, and thanks for posting the workaround.

mxrcooo commented 5 months ago

That doesn't seem to be the case: I have plenty of RAM left, and the CPU is mostly idle. I've checked, and the tensors are definitely on the CPU. The weird thing is that if I use the same tensor content and perform the same operation in a separate script, it works just fine. I'm not sure what could cause this.
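One way to narrow down where the call is stuck is to have Python dump every thread's stack if the call does not return in time. The sketch below uses only the standard-library faulthandler module. The helper name run_with_hang_dump and the 30-second default are assumptions for illustration, not part of PyTorch3D or torch.

```python
import faulthandler
import sys


def run_with_hang_dump(fn, timeout_s=30.0):
    """Call fn(); if it is still running after timeout_s seconds,
    write the Python stack of every thread to stderr and keep waiting."""
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.stderr)
    try:
        return fn()
    finally:
        # Disarm the watchdog once fn() has returned or raised.
        faulthandler.cancel_dump_traceback_later()


# A quick call that returns well before the timeout, so nothing is dumped.
result = run_with_hang_dump(lambda: sum(range(10)), timeout_s=5.0)
print(result)
```

In the failing script, the Meshes construction could be wrapped the same way, for example run_with_hang_dump(lambda: Meshes(verts=[verts], faces=[faces], textures=textures)). The dump shows Python-level frames only, so it can confirm which Python line is blocked but not what the native code underneath is waiting on.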