facebookresearch / NSVF

Open source code for the paper of Neural Sparse Voxel Fields.
MIT License

About apex and the args "--fp16" #33

Closed: yumi-cn closed this issue 3 years ago

yumi-cn commented 3 years ago

I have already installed the nvidia/apex module in my env (which your project README says is optional).

When I try to add the arg "--fp16" to the train script:

python -u train.py ${DATASET} \
    ... \
    --fp16 \
    ... \
    --tensorboard-logdir ${SAVE}/tensorboard \
    | tee -a $SAVE/train.log

it raises some errors; the main error report is about a c10::Error:

...
terminate called after throwing an instance of 'c10::Error'
...

This looks similar to fairseq issue #1683, which was closed with no response.

I tried to find ways to solve this, like adding the arg "--ddp-backend=no_c10d", but this just causes the same error.

I haven't read all the main code of the project, but I guess you are probably more familiar with this problem, so I'm posting this issue.

Thanks for replying.

BTW: training without "--fp16" is always fine, and the env is almost the same as the requirements file in the README.

MultiPath commented 3 years ago

Hi, I am sorry for replying late, as I was busy with other things. --fp16 (mixed-precision training) only works on certain GPUs such as the Nvidia V100. It helps reduce GPU memory usage. Maybe your GPU does not support it?
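
A quick way to check this (a minimal sketch, assuming a standard PyTorch install) is to look at the compute capability of each device; Tensor Cores, which fp16 relies on for speed, need capability 7.0 or newer:

import torch

# Tensor Cores (needed for fast fp16) require compute capability >= 7.0
# (Volta, Turing and newer architectures).
for i in range(torch.cuda.device_count()):
    name = torch.cuda.get_device_name(i)
    major, minor = torch.cuda.get_device_capability(i)
    supported = "yes" if major >= 7 else "no"
    print(f"GPU {i}: {name}, compute capability {major}.{minor}, fp16 Tensor Cores: {supported}")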

yumi-cn commented 3 years ago

> Hi, I am sorry for replying late, as I was busy with other things. --fp16 (mixed-precision training) only works on certain GPUs such as the Nvidia V100. It helps reduce GPU memory usage. Maybe your GPU does not support it?

My GPUs are 4x RTX 2080 Ti (11GB) in the server Docker env. I checked: they are Turing architecture and have Tensor Core support.

Maybe I'm using --fp16 the wrong way in the command? Or is it some other env setting problem? Confusing.
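
One more thing I can check on my side is whether apex was actually built with its CUDA extensions, since a pure-Python install can behave differently. A rough sketch (the extension module names here are my assumption based on the apex build options, not something from this repo):

import importlib

# These extension modules should only exist when apex is installed with
# its --cpp_ext --cuda_ext options; a pure-Python apex install lacks them.
for mod in ("apex", "amp_C", "fused_layer_norm_cuda"):
    try:
        importlib.import_module(mod)
        print(f"{mod}: OK")
    except ImportError as err:
        print(f"{mod}: missing ({err})")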

MultiPath commented 3 years ago

I will check --fp16 soon. I think it should work, as I always used fp16 in my early experiments. However, I am afraid it may cause inaccurate rendering results, so I usually turned it off later.
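
For reference, a minimal sketch of what apex-style mixed precision usually looks like (illustrative only; the model and optimizer here are placeholders, and fairseq's --fp16 trainer wires this up internally rather than through this exact code):

import torch
from apex import amp

# Placeholder model/optimizer, just to show the amp calls.
model = torch.nn.Linear(64, 64).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# O1 patches most ops to run in fp16 while keeping fp32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 64).cuda()
loss = model(x).pow(2).mean()

# Loss scaling keeps small fp16 gradients from underflowing to zero.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()

The reduced precision is also why the rendered output can look slightly less accurate than a full fp32 run.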