bmild / nerf

Code release for NeRF (Neural Radiance Fields)
http://tancik.com/nerf
MIT License

Hanging on TF operation #94

Open rlangefe opened 3 years ago

rlangefe commented 3 years ago

I was running this on a cluster with a V100 GPU and 16 CPUs. The fern demo ran all the way through, but when we generated poses from our own images as directed and tried to run on that data, execution seems to hang here: https://github.com/bmild/nerf/blob/20a91e764a28816ee2234fcadb73bd59a613a44c/run_nerf.py#L606-L609 It stops once it hits the first tf.reduce_min statement. Is something off with the files, or is this a software-side issue? Any help would be appreciated.

kondela commented 3 years ago

I had a similar problem, although with a different implementation; see https://github.com/kwea123/nerf_pl/issues/49

Most likely it’s due to your poses.

rlangefe commented 3 years ago

Can you elaborate on that? I was also wondering whether I might have compiled something incorrectly: even though CUDA is loaded and I thought I had compiled COLMAP with CUDA support, it tells me SiftGPU isn't supported by my hardware.

kondela commented 3 years ago

You can check whether you compiled COLMAP with CUDA support by typing colmap in a terminal. You should see something along these lines:

COLMAP 3.7 -- Structure-from-Motion and Multi-View Stereo
              (Commit 0aea04c on 2020-12-11 with CUDA)

However, there shouldn't be a large difference in the quality of the results (as opposed to execution speed) between SiftGPU and regular SIFT.

Try debugging the code to pinpoint the variable that hangs the execution. In my case, that variable had an inf value, which caused the hang. There are multiple possible reasons for an inf; most probably there's a problem with your camera extrinsics and intrinsics.
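
As a rough illustration of the kind of check I mean, here is a minimal sketch that scans rows of numbers (for example, pose or bounds values you have already loaded) for inf or NaN entries. The helper name find_nonfinite and the example data are made up for illustration; this is not code from the NeRF repo, and how you load your own poses is up to you.

```python
import math


def find_nonfinite(rows):
    """Return (row, col, value) for every entry that is NaN or +/-inf.

    `rows` is assumed to be a sequence of sequences of numbers
    (hypothetical layout: one row per image).
    """
    bad = []
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if not math.isfinite(value):
                bad.append((i, j, value))
    return bad


# Made-up example data: the second row contains an inf, the third a NaN.
example = [
    [0.1, 0.2, 1.5],
    [0.3, float("inf"), 2.0],
    [float("nan"), 0.4, 2.5],
]

bad_entries = find_nonfinite(example)
for i, j, value in bad_entries:
    print(f"row {i}, col {j}: {value}")
```

If your data is in a NumPy array, you can pass arr.tolist() for a 2-D array. Any row reported here would be a good place to start looking at the corresponding image's camera parameters.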