NVIDIA / MinkowskiEngine

Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
https://nvidia.github.io/MinkowskiEngine

Support for H100? (TORCH_CUDA_ARCH_LIST=9.0) #588

Closed: jkim50104 closed this issue 1 month ago

jkim50104 commented 4 months ago

I'm trying to run Minkowski Engine on an H100 GPU. The H100 needs TORCH_CUDA_ARCH_LIST=9.0, which is only supported from CUDA 11.8 onward. I tried every PyTorch version >= 2.0.0 with CUDA 11.8. I've seen environments working with these PyTorch and CUDA versions, so I assume it is an H100 compatibility problem. Has anyone succeeded with this setup and GPU?
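For anyone comparing setups, here is a minimal sketch of how one might check whether a given TORCH_CUDA_ARCH_LIST value covers the GPU's compute capability. The helper names (`parse_arch_list`, `arch_is_covered`, `detected_capability`) are hypothetical and not part of PyTorch or Minkowski Engine. The coverage rule is a simplification: it accepts an exact match, or a `+PTX` entry at or below the device's capability, and it only understands plain `major.minor` entries.

```python
import re


def parse_arch_list(arch_list):
    """Split a TORCH_CUDA_ARCH_LIST-style string into (major, minor, has_ptx) tuples.

    Entries may be separated by spaces or semicolons, e.g. "8.0;8.6+PTX".
    """
    entries = []
    for token in re.split(r"[;\s]+", arch_list.strip()):
        if not token:
            continue
        has_ptx = token.endswith("+PTX")
        version = token[:-4] if has_ptx else token
        major, minor = version.split(".")
        entries.append((int(major), int(minor), has_ptx))
    return entries


def arch_is_covered(capability, arch_list):
    """Return True if `capability` (major, minor) is covered by `arch_list`.

    Simplified rule (an assumption of this sketch): exact match, or a +PTX
    entry whose version is not newer than the device capability.
    """
    capability = tuple(capability)
    for major, minor, has_ptx in parse_arch_list(arch_list):
        if (major, minor) == capability:
            return True
        if has_ptx and (major, minor) <= capability:
            return True
    return False


def detected_capability():
    """Return the compute capability of GPU 0, or None if it cannot be queried."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


if __name__ == "__main__":
    print(arch_is_covered((9, 0), "8.0;8.6"))  # False: no 9.0 entry and no PTX fallback
    print(arch_is_covered((9, 0), "8.0 9.0"))  # True: exact match
    print(detected_capability())
```

This only tells you whether the arch list you built with matches the device. It says nothing about whether a successfully compiled extension behaves correctly at run time.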

It compiles fine, but the forward pass always fails at run time with this assertion:

assertion (in_feat.size(1) == kernel.size(1)) failed. Input feature size and kernel size mismatch

It can be reproduced on an H100 in any environment, with any code that runs a forward pass.
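To make the assertion concrete, the sketch below mirrors the condition in the error message, `in_feat.size(1) == kernel.size(1)`, using plain shape tuples. The function `check_feature_kernel_shapes` is a hypothetical helper for illustration, not a Minkowski Engine API. The example shapes assume features laid out as (num_points, in_channels) and a kernel laid out as (kernel_volume, in_channels, out_channels), which is an assumption about the layout rather than something stated in this thread.

```python
def check_feature_kernel_shapes(in_feat_shape, kernel_shape):
    """Raise ValueError if dimension 1 of the feature and kernel shapes differ.

    Mirrors the condition in the reported assertion:
        in_feat.size(1) == kernel.size(1)
    """
    if len(in_feat_shape) < 2 or len(kernel_shape) < 2:
        raise ValueError("both shapes need at least two dimensions")
    if in_feat_shape[1] != kernel_shape[1]:
        raise ValueError(
            "Input feature size and kernel size mismatch: "
            f"in_feat.size(1)={in_feat_shape[1]}, kernel.size(1)={kernel_shape[1]}"
        )


if __name__ == "__main__":
    # Assumed layouts: features (num_points, in_channels),
    # kernel (kernel_volume, in_channels, out_channels).
    check_feature_kernel_shapes((1000, 3), (27, 3, 64))  # passes silently

    try:
        check_feature_kernel_shapes((1000, 4), (27, 3, 64))
    except ValueError as err:
        print(err)
```

Printing the actual shapes of the input features and the layer's kernel just before the failing call is one way to see which side of the comparison is unexpected.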

zjwzcx commented 1 month ago

Same here. Did you ever solve this issue?

jkim50104 commented 1 month ago

Hello, I did solve this issue, but it was a long time ago and I have forgotten exactly what was causing it, sorry. I do remember that it was not an H100 compatibility problem. It still works in my H100 environment.

D3xter1922 commented 2 weeks ago

@jkim50104 Did you find a solution for this issue? I am facing the same problem.