EPFL-VILAB / omnidata

A Scalable Pipeline for Making Steerable Multi-Task Mid-Level Vision Datasets from 3D Scans [ICCV 2021]

normal v2 pretrained model not working #53

Open yuyingyeh opened 1 year ago

yuyingyeh commented 1 year ago

Hi! I have tried your code to predict normals using the latest v2 model checkpoint, but the outputs are all NaN. I have tried both the model downloaded by the script below and the one from Google Drive.

sh ./tools/download_surface_normal_models.sh

The results can be reproduced with this command:

python demo.py --task normal --img_path assets/demo/test1.png --output_path assets/

I have uncommented the lines below to use the v1 model, and there is no issue with it. Could you check your released weights? Thank you! https://github.com/EPFL-VILAB/omnidata/blob/b927c4189ab80077025706ec2308d465171cb417/omnidata_tools/torch/demo.py#L50-L51

jens-nau commented 1 year ago

I have the same problem. Loading the model to the CPU instead of the GPU seems to work, but is very slow. After some testing, I found that the problem only seems to occur on GPUs with certain or perhaps old architectures. On my Ampere-based RTX 3080 everything works fine, but when running the same code on a Pascal-based GTX 1050 Ti the model predicts NaN values.
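
Until the checkpoint itself is fixed, one workaround consistent with the reports in this thread is to check the prediction for NaNs and rerun on CPU only when needed. This is a hypothetical helper sketch, not part of the omnidata codebase; the name `predict_with_cpu_fallback` and its signature are assumptions for illustration:

```python
import torch


def predict_with_cpu_fallback(model, img, device="cuda"):
    """Run `model` on `device`; if the output contains NaNs (as reported
    on some Pascal/Volta GPUs in this thread), retry once on CPU."""
    model = model.to(device).eval()
    with torch.no_grad():
        out = model(img.to(device))
    if torch.isnan(out).any():
        # NaN output on this GPU: fall back to CPU (slow, but correct
        # per the reports above).
        model = model.to("cpu")
        with torch.no_grad():
            out = model(img.cpu())
    return out
```

The CPU pass is only paid on the affected GPUs, so machines where the v2 weights work normally keep full GPU speed.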

alexsax commented 1 year ago

Hi! I’ve successfully used these weights on GPU before, after downloading them from Google Drive.

How are you using the weights? Have you tried running the demo first and does that work?


yuyingyeh commented 1 year ago

> I have the same problem. Loading the model to the CPU instead of the GPU seems to work, but is very slow. After some testing, I found that the problem only seems to occur on GPUs with certain or perhaps old architectures. On my Ampere-based RTX 3080 everything works fine, but when running the same code on a Pascal-based GTX 1050 Ti the model predicts NaN values.

Thanks for finding out the problem! I have also tested on another machine and it works!

The command used to test:

cd omnidata/omnidata_tools/torch
python demo.py --task normal --img_path assets/demo/test1.png --output_path assets/

What I have tried:

  1. [NOT WORKING] Ubuntu docker + Turing-based RTX 2080 Ti
  2. [WORKING] Windows 11 + Ada Lovelace-based RTX 4090 + Anaconda

zzt76 commented 10 months ago

I face the same problem when using the normal model:

  1. [WORKING] Win11 + RTX 4060
  2. [NOT WORKING] Ubuntu + V100

Totoro97 commented 7 months ago

I face the same problem:

[WORKING] Ubuntu + RTX 3090, PyTorch 2.0.1, CUDA 11.8
[NOT WORKING] Ubuntu + V100, PyTorch 2.0.1, CUDA 11.8

haotongl commented 3 months ago

[NOT WORKING] Ubuntu + V100, PyTorch 2.0.1, CUDA 11.8