ShichenLiu / SoftRas

Project page of paper "Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning"
MIT License

Backward Loss function stops the program without any error or warning log on Windows #48

Closed. MdotO closed this issue 4 years ago.

MdotO commented 4 years ago

Hi, I wanted to point out a rather strange issue that occurs when the program executes line 124, i.e. loss.backward(). The program sits there for about 0.5 seconds and then exits without any prompt, error, or warning, as if it had completed successfully, even though it has not finished a single iteration.

I tried this on my custom dataset and then on the mesh reconstruction.zip dataset, i.e. the dataset described in the README, and got the same result: the program stops. I have no idea what could be causing it. I checked whether backward() works on a random sample tensor, and it did. In fact, I have just used it in a different machine learning project that uses PyTorch. Since backward() is a method of the Tensor class, I doubt there is any link to the soft_renderer modules that were installed.

I searched online, and the closest I found was a forum post where someone described this and replied to himself a while later, saying the behavior was specific to Windows. I would really appreciate any insight into this matter. Has anyone using this repo tried it on Windows? Many thanks!

MdotO commented 4 years ago

For anyone else who runs into this strange but probably rare issue: I solved it by installing a PyTorch build for CUDA 10+ (the version numbers below are the CUDA versions of the torch build). My previous 9.1 build failed whenever backward() was computed on CUDA, although it worked fine on CPU. As for the other project in which I claimed it worked, I found that I was actually using the 10.1 build there.

So on Windows I had to use the 10.1 build. When I pip install the latest build (10.2), it automatically gets downgraded to 9.1/9.2, which may have compatibility issues with my current GPU drivers and thus caused this problem. Installing 10.1 works because pip does not downgrade 10.1, even though it downgrades 10.2.

Finally, you also need to install the NVIDIA GPU Computing Toolkit alongside this and make sure its version matches the torch build, i.e. 10.1, not 10.2.
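For anyone who wants to check their own setup, here is a minimal diagnostic sketch. It assumes the 9.x/10.x numbers above refer to the CUDA version the installed PyTorch build targets, which PyTorch exposes as `torch.version.cuda`. The helper names `cuda_build_report` and `backward_smoke_test` are made up for illustration and are not part of SoftRas. The script prints a flushed message before each backward pass, so if the process dies silently the last printed line shows where it stopped.

```python
import importlib.util


def cuda_build_report():
    """Report the installed PyTorch version and the CUDA version it was built for."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    import torch
    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


def backward_smoke_test(device):
    """Run a tiny backward pass on `device` and check the gradient of sum(x^2)."""
    import torch
    x = torch.randn(8, 3, device=device, requires_grad=True)
    loss = (x ** 2).sum()
    loss.backward()
    # d/dx sum(x^2) = 2x, so the gradient should match 2 * x.
    return torch.allclose(x.grad, 2 * x.detach())


if __name__ == "__main__":
    report = cuda_build_report()
    print(report, flush=True)
    if report["torch"] is not None:
        print("running CPU backward...", flush=True)
        print("CPU backward ok:", backward_smoke_test("cpu"), flush=True)
        if report["cuda_available"]:
            print("running CUDA backward...", flush=True)
            print("CUDA backward ok:", backward_smoke_test("cuda"), flush=True)
```

If the output ends at "running CUDA backward..." with no result line, the process died inside the CUDA backward pass. In that case, compare the reported `cuda_build` value with the installed toolkit and driver versions.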