CUDA/PyTorch Version Issues

rlee3359 commented 7 months ago

Hi, thanks for the great work and sharing the code!

I'm trying to run the demos but am having trouble with the installation. I followed the instructions to install with conda, but I get a CUFFT_INTERNAL_ERROR when extracting the features:

x_freq = fft.fftn(x, dim=(-2, -1))

RuntimeError: cuFFT error: CUFFT_INTERNAL_ERROR

This seems to be caused by a bug in CUDA 11.6, as described here: https://github.com/pytorch/pytorch/issues/88038

Upgrading to PyTorch 2.0 doesn't seem like an option since it's not compatible with detectron2, but PyTorch 1.13.1 isn't compatible with CUDA 11.8 either.

Any ideas on how to resolve this?
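
For anyone comparing setups, here is a minimal sketch of how the versions discussed above could be collected and the failing call reproduced in isolation. The helper name collect_env and the dummy tensor shape are my own choices for illustration, not part of the repo; the only line taken from the report is the fft.fftn call.

```python
import importlib.util


def collect_env():
    """Hypothetical helper: summarize the local PyTorch/CUDA setup."""
    info = {"torch": None, "cuda_build": None, "gpu": None, "fft_ok": None}
    if importlib.util.find_spec("torch") is None:
        return info  # PyTorch is not installed in this environment

    import torch

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        try:
            # Same call as in the report, run on a small dummy tensor
            # (the shape is an arbitrary assumption).
            x = torch.randn(1, 4, 8, 8, device="cuda")
            torch.fft.fftn(x, dim=(-2, -1))
            torch.cuda.synchronize()
            info["fft_ok"] = True
        except RuntimeError as err:
            info["fft_ok"] = False
            info["error"] = str(err)
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

If fft_ok comes back False with the same cuFFT message, the problem is reproducible outside the demo code, which would point at the PyTorch/CUDA/GPU combination rather than at the repo.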

Also, it might be worth adding a few lines to the installation instructions that cover installing Mask2Former in the third_party directory.

Thanks again, and I appreciate any help you can provide!

Junyi42 commented 7 months ago

Hi,

Sorry for the late reply, and thanks for your suggestions. I tested the installation process on my machine with RTX 3090s and it went fine. Could you please elaborate a bit more on the problem you ran into?

Based on the information you shared, I would suggest checking where the line x_freq = fft.fftn(x, dim=(-2, -1)) is called and seeing if there is a way to bypass it.

Alternatively, you could use the DIFT implementation of the SD feature extractor in place of ODISE's version. This approach might give slightly worse results, but it avoids installing ODISE (and with it Mask2Former and detectron2).

Best, Junyi

thanhnguyentung95 commented 7 months ago

Installing torch==2.0.0 and then rebuilding detectron2 and Mask2Former worked for me.

feixue94 commented 2 months ago

Thanks for the update. I am curious why DIFT's implementation of SD feature extraction gives worse performance. DIFT uses SDv2, which could be a more powerful backbone than SDv1.5.

Junyi42 commented 2 months ago

Hi,

Thanks for the inquiry. The rough performance comparison is based on the numbers reported in the DIFT and tale-of-two-features papers. I assume the improvement is mainly due to two things: 1) sd-dino features are extracted at a resolution of 960^2, which is larger than the one DIFT uses; 2) sd-dino's SD feature is extracted with the implicit captioner from ODISE, whereas DIFT uses the NULL text embedding.

anas-zafar commented 2 months ago

@thanhnguyentung95 which CUDA version did you use? @feixue94 were you able to fix this issue?

Junyi42 commented 2 months ago

Hi @anas-zafar,

I did a further check and found that I had mistakenly included a testing script in the repo.

Please change the line https://github.com/Junyi42/GeoAware-SC/blob/master/third_party/ODISE/odise/modeling/meta_arch/ldm.py#L436 from self.freeu = True to self.freeu = False, then rebuild the repo with setup.py at the root directory, and the problem should be fixed.

Please let me know if that works, thanks!
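
If you prefer to script the change rather than edit by hand, a minimal sketch could look like the following. The helper names disable_freeu and patch_file are hypothetical, and the sketch assumes it is run from the repo root and that the assignment appears in the file as self.freeu = True; the rebuild with setup.py still has to be done afterwards.

```python
import re
from pathlib import Path

# Path taken from the link above, relative to the repo root (assumed cwd).
LDM_PATH = Path("third_party/ODISE/odise/modeling/meta_arch/ldm.py")

# Matches the assignment regardless of spacing around "=".
FREEU_ON = re.compile(r"(self\.freeu\s*=\s*)True\b")


def disable_freeu(source: str) -> str:
    """Hypothetical helper: switch every `self.freeu = True` to False."""
    return FREEU_ON.sub(r"\g<1>False", source)


def patch_file(path: Path = LDM_PATH) -> bool:
    """Rewrite the file in place; return True if anything changed."""
    original = path.read_text()
    patched = disable_freeu(original)
    if patched != original:
        path.write_text(patched)
    return patched != original
```

Calling patch_file() from the repo root returns True when the flag was flipped and False when the file already had it disabled.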

anas-zafar commented 1 month ago

Thanks @Junyi42, it worked.
