samhodge-aiml opened 1 year ago
Options attached. options.zip
After 2 hours it had not completed 10 iterations; what am I doing wrong?
@samhodge @samhodge-aiml I'm not super confident whether BARF would work well on your data, as the viewpoint coverage is not as dense as what we had been experimenting with before. My estimate of the runtime on a 3090 would be 8-10 hours, but I don't have one to benchmark with, so I cannot say for sure (also, it has been quite a while since I developed this project). The training shouldn't get stuck at 10 iterations though -- could you share the training log?
That is the thing: the GPU was loaded (RTX 3090, 24 GB, 39% GPU compute, <5% GPU memory), but nothing is really being logged at all.
I will try running it again and see if I can get something to share with you.
There was no error and no TensorBoard logs to speak of, but there was a file in the output directory, so write permission was OK. I turned off visdom.
Let me give you everything I have so far and we can get to the bottom of it.
Thanks a million for the response.
Here is the stdout:
python3 train.py --group=samh --model=barf --yaml=barf_iphone --name=bakerst006 --data.scene=bakerst --barf_c2f=[0.1,0.5] --visdom!
Process ID: 18377
[train.py] (PyTorch code for training NeRF/BARF)
setting configurations...
loading options/base.yaml...
loading options/nerf_llff.yaml...
loading options/barf_llff.yaml...
loading options/barf_iphone.yaml...
* H: 480
* W: 640
* arch:
* density_activ: softplus
* layers_feat: [None, 256, 256, 256, 256, 256, 256, 256, 256]
* layers_rgb: [None, 128, 3]
* posenc:
* L_3D: 10
* L_view: 4
* skip: [4]
* tf_init: True
* barf_c2f: [0.1, 0.5]
* batch_size: None
* camera:
* model: perspective
* ndc: False
* noise: None
* cpu: False
* data:
* augment:
* center_crop: None
* dataset: iphone
* image_size: [480, 640]
* num_workers: 4
* preload: True
* root: None
* scene: bakerst
* train_sub: None
* val_on_test: False
* val_ratio: 0.1
* val_sub: None
* device: cuda:0
* freq:
* ckpt: 5000
* scalar: 200
* val: 2000
* vis: 1000
* gpu: 0
* group: samh
* load: None
* loss_weight:
* render: 0
* render_fine: None
* max_epoch: None
* max_iter: 10
* model: barf
* name: bakerst006
* nerf:
* density_noise_reg: None
* depth:
* param: inverse
* range: [1, 0]
* fine_sampling: False
* rand_rays: 2048
* sample_intvs: 128
* sample_intvs_fine: None
* sample_stratified: True
* setbg_opaque: None
* view_dep: True
* optim:
* algo: Adam
* lr: 0.001
* lr_end: 0.0001
* lr_pose: 0.003
* lr_pose_end: 1e-05
* sched:
* gamma: None
* type: ExponentialLR
* sched_pose:
* gamma: None
* type: ExponentialLR
* test_iter: 100
* test_photo: True
* warmup_pose: None
* output_path: output/samh/bakerst006
* output_root: output
* resume: False
* seed: 0
* tb:
* num_images: [4, 8]
* visdom: False
* yaml: barf_iphone
(creating new options file...)
Setting up [LPIPS] perceptual loss: trunk [alex], v[0.1], spatial [off]
/media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
warnings.warn(
/media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=AlexNet_Weights.IMAGENET1K_V1`. You can also use `weights=AlexNet_Weights.DEFAULT` to get the most up-to-date weights.
warnings.warn(msg)
Loading model from: /media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/lpips/weights/v0.1/alex.pth
loading training data...
number of samples: 75
loading test data...
number of samples: 8
building networks...
setting up optimizers...
initializing weights from scratch...
setting up visualizers...
TRAINING START
validating: 0%| | 0/8 [00:00<?, ?it/s]/media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /home/conda/feedstock_root/build_artifacts/pytorch-recipe_1680557665316/work/aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
It is just sitting at this point.
nvidia-smi
Wed Aug 30 18:37:26 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.86.05 Driver Version: 535.86.05 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:06:00.0 On | N/A |
| 66% 68C P2 243W / 350W | 1047MiB / 24576MiB | 39% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
| 1 NVIDIA GeForce RTX 3090 Off | 00000000:0A:00.0 Off | N/A |
| 32% 43C P8 23W / 350W | 15MiB / 24576MiB | 1% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2400 G /usr/lib/xorg/Xorg 134MiB |
| 0 N/A N/A 4361 G /usr/bin/gnome-shell 95MiB |
| 0 N/A N/A 14240 G ...sion,SpareRendererForSitePerProcess 53MiB |
| 0 N/A N/A 18377 C python3 666MiB |
| 0 N/A N/A 18794 G ...4151621,13186568319809438527,262144 77MiB |
| 1 N/A N/A 2400 G /usr/lib/xorg/Xorg 4MiB |
+---------------------------------------------------------------------------------------+
One hour later there is no progress. I will leave it running overnight and see if anything happens.
It has been running for over 10 hours now with no progress, so I am going to save the electricity.
This shouldn't happen. Could you help pinpoint which line it hangs at?
I can certainly keyboard-interrupt the job and give you the stack trace.
Traceback (most recent call last):
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/train.py", line 32, in <module>
main()
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/train.py", line 29, in main
m.train(opt)
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/nerf.py", line 54, in train
if self.iter_start==0: self.validate(opt,0)
^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/barf.py", line 66, in validate
super().validate(opt,ep=ep)
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/env/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/base.py", line 152, in validate
var = self.graph.forward(opt,var,mode="val")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/nerf.py", line 210, in forward
ret = self.render_by_slices(opt,pose,intr=var.intr,mode=mode) if opt.nerf.rand_rays else \
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/nerf.py", line 267, in render_by_slices
ret = self.render(opt,pose,intr=intr,ray_idx=ray_idx,mode=mode) # [B,R,3],[B,R,1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/model/nerf.py", line 236, in render
center,ray = camera.get_center_and_ray(opt,pose,intr=intr) # [B,HW,3]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/camera.py", line 241, in get_center_and_ray
grid_3D = cam2world(grid_3D,pose) # [B,HW,3]
^^^^^^^^^^^^^^^^^^^^^^^
File "/media/sam/aimlwork/github/bundle-adjusting-NeRF/camera.py", line 213, in cam2world
return X_hom@pose_inv.transpose(-1,-2)
^^^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
Could it be that the focal length for the camera is causing an unsolvable matrix?
I might try this tomorrow: https://camp-nerf.github.io/
Yes, it is likely stuck in the loop as in #76. If you use batch size 1 the issue will likely go away -- I have not been able to figure out exactly where the bug was. CamP should be a quite decent improvement over BARF in joint camera optimization. I would definitely encourage you to try it out if they have the code released.
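For reference, and assuming the same command-line override syntax as the run above, that would be something like:

python3 train.py --group=samh --model=barf --yaml=barf_iphone --name=bakerst006 --data.scene=bakerst --barf_c2f=[0.1,0.5] --visdom! --batch_size=1

(or, equivalently, setting batch_size: 1 in the yaml).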
No code yet; batch size of one it is.
Batch size of one didn't seem to work for me either.
Hi,
while I was working with this codebase I faced a similar issue (training stuck in an endless loop). It turned out that during sampling along the ray there was (roughly) exponential growth in depth for the last few samples, with the last ones as big as a few thousand (or even 10000 on one occasion). This caused gradients to explode during backpropagation, and some of the parameters became NaNs, hence the calculated rays got NaN values in them. I wasn't able to pinpoint a specific error in the implementation. Bear in mind that I was experimenting on a heavily modified architecture, so I encourage you to check for abnormal values (details in the doc).
There are a number of strategies to deal with this problem (assuming that gradient explosion is what is causing it); the simplest is to clip abnormal samples, which is a very fast workaround. This can affect the results, but erroneous samples make up a very small proportion of the total training data, so it shouldn't be too bad.
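A minimal sketch of that kind of clipping, assuming depth samples shaped [B,HW,N,1] as in this repo's composite() (the 1e3 ceiling is an arbitrary choice, not a value taken from the original code):

import torch

def sanitize_depth_samples(depth_samples, max_depth=1e3):
    # Flag non-finite values early so NaNs are noticed before they reach the loss.
    if not torch.isfinite(depth_samples).all():
        print("warning: non-finite depth samples detected")
    # Clamp runaway depths so dist_samples (and therefore sigma_delta) stays bounded
    # and the backward pass cannot blow up through the last few samples.
    return depth_samples.clamp(min=0.0, max=max_depth)

Calling it on depth_samples at the top of composite() would probably be the least invasive place to hook it in.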
Thanks a million. Maybe tomorrow I can eke out a little time to see if I can make this into a PR.
The information is very generous, but I am not sure my skills are ready right now to debug and patch the issue. Still, why die wondering, right? I will see what I can do.
@SwirtaB thanks for the feedback! I hadn't been able to deterministically reproduce this issue, and did not realize it had to do with the sampled coordinates. In this case, this line is likely the culprit, where the depth of the last sample is set to a very large number (1e10). @samhodge if you find that tweaking the code to lower it to e.g. 1e3 would help, please let me know and I'm happy to make a hotfix.
Yeah I can certainly write a smoothstep function to roll it off to a limit.
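For what it's worth, a smoothstep-based roll-off toward a ceiling could look like the sketch below (my own construction, not code from this repo; the limit and knee values are arbitrary). It leaves values below the knee untouched and saturates smoothly at the limit instead of hard-clipping, so the mapping stays differentiable everywhere:

import torch

def smoothstep(edge0, edge1, x):
    # Classic smoothstep: 0 below edge0, 1 above edge1, C1-smooth in between.
    t = ((x - edge0) / (edge1 - edge0)).clamp(0.0, 1.0)
    return t * t * (3.0 - 2.0 * t)

def roll_off(x, limit=1e3, knee=0.5):
    # Identity below knee*limit, then blend monotonically toward `limit`.
    s = smoothstep(knee * limit, limit, x)
    return x * (1.0 - s) + limit * s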
Trying the simpler change first (lowering the 1e10 fill value to 1e3):
diff --git a/model/nerf.py b/model/nerf.py
index b0dcb2c..eefef60 100644
--- a/model/nerf.py
+++ b/model/nerf.py
@@ -393,7 +393,7 @@ class NeRF(torch.nn.Module):
ray_length = ray.norm(dim=-1,keepdim=True) # [B,HW,1]
# volume rendering: compute probability (using quadrature)
depth_intv_samples = depth_samples[...,1:,0]-depth_samples[...,:-1,0] # [B,HW,N-1]
- depth_intv_samples = torch.cat([depth_intv_samples,torch.empty_like(depth_intv_samples[...,:1]).fill_(1e10)],dim=2) # [B,HW,N]
+ depth_intv_samples = torch.cat([depth_intv_samples,torch.empty_like(depth_intv_samples[...,:1]).fill_(1e3)],dim=2) # [B,HW,N]
dist_samples = depth_intv_samples*ray_length # [B,HW,N]
sigma_delta = density_samples*dist_samples # [B,HW,N]
alpha = 1-(-sigma_delta).exp_() # [B,HW,N]
That one didn't work, so I have another idea:
https://numpy.org/doc/stable/reference/generated/numpy.heaviside.html
Other things that did not work:
diff --git a/data/iphone.py b/data/iphone.py
index 05cf1d5..e34bcc8 100644
--- a/data/iphone.py
+++ b/data/iphone.py
@@ -17,7 +17,7 @@ from util import log,debug
class Dataset(base.Dataset):
def __init__(self,opt,split="train",subset=None):
- self.raw_H,self.raw_W = 1080,1920
+ self.raw_H,self.raw_W = 3024,4032
super().__init__(opt,split)
self.root = opt.data.root or "data/iphone"
self.path = "{}/{}".format(self.root,opt.data.scene)
@@ -62,7 +62,7 @@ class Dataset(base.Dataset):
return image
def get_camera(self,opt,idx):
- self.focal = self.raw_W*4.2/(12.8/2.55)
+ self.focal = self.raw_W*1.6*35.0
intr = torch.tensor([[self.focal,0,self.raw_W/2],
[0,self.focal,self.raw_H/2],
[0,0,1]]).float()
diff --git a/model/nerf.py b/model/nerf.py
index b0dcb2c..9a02e77 100644
--- a/model/nerf.py
+++ b/model/nerf.py
@@ -391,9 +391,11 @@ class NeRF(torch.nn.Module):
def composite(self,opt,ray,rgb_samples,density_samples,depth_samples):
ray_length = ray.norm(dim=-1,keepdim=True) # [B,HW,1]
+ ray_length = numpy.clip(ray_length, 0, 1e3)
+
# volume rendering: compute probability (using quadrature)
depth_intv_samples = depth_samples[...,1:,0]-depth_samples[...,:-1,0] # [B,HW,N-1]
- depth_intv_samples = torch.cat([depth_intv_samples,torch.empty_like(depth_intv_samples[...,:1]).fill_(1e10)],dim=2) # [B,HW,N]
+ depth_intv_samples = torch.cat([depth_intv_samples,torch.empty_like(depth_intv_samples[...,:1]).fill_(1e3)],dim=2) # [B,HW,N]
dist_samples = depth_intv_samples*ray_length # [B,HW,N]
sigma_delta = density_samples*dist_samples # [B,HW,N]
alpha = 1-(-sigma_delta).exp_() # [B,HW,N]
diff --git a/options/barf_iphone.yaml b/options/barf_iphone.yaml
index f344c7b..d58794b 100644
--- a/options/barf_iphone.yaml
+++ b/options/barf_iphone.yaml
@@ -2,5 +2,7 @@ _parent_: options/barf_llff.yaml
data: # data options
dataset: iphone # dataset name
- scene: IMG_0239 # scene name
+ scene: bakerst # scene name
image_size: [480,640] # input image sizes [height,width]
+max_iter: 10
+batch_size: 1
diff --git a/requirements.yaml b/requirements.yaml
index 0baf8b0..2865db4 100644
--- a/requirements.yaml
+++ b/requirements.yaml
@@ -2,6 +2,7 @@ name: barf-env
channels:
- conda-forge
- pytorch
+ - nvidia
dependencies:
- numpy
- scipy
@@ -10,7 +11,8 @@ dependencies:
- easydict
- imageio
- ipdb
- - pytorch>=1.9.0
+ - pytorch
+ - pytorch-cuda=11.8
- torchvision
- tensorboard
- visdom
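(Side note on the ray_length hunk in the model/nerf.py diff above: numpy.clip cannot operate on a CUDA tensor, so that line will raise rather than clip. If limiting the ray length is the goal, a torch-native clamp of the same idea would be, roughly:

ray_length = ray.norm(dim=-1, keepdim=True)  # [B,HW,1]
ray_length = torch.clamp(ray_length, max=1e3)  # stays on the GPU, differentiable

This is only a sketch of the same clipping idea, not a tested fix.)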
@chenhsuanlin no problem. I gave your suggestion a try and it only delayed the problem for me; training hung much later. Then I cross-checked your implementation of composite with the NeRF article and their official implementation. By my understanding, the whole equation (3) from the article reduces to alpha compositing. In their implementation they calculate it slightly differently (original impl), so I gave it a try. I commented out the T calculation and calculated prob as:
prob = (alpha * torch.cumprod(1.0 - alpha + 1e-10, dim=2))[..., None]
Unfortunately that didn't solve the problem, only delayed it again. That being said, any workaround that ensures proper sample values (either by clipping or something else) works quite well. Maybe that is the proper solution, since NeRFs are still neural networks and improper inputs can lead to all sorts of problems.
EDIT: the T calculation and the one from the original implementation are identical (in the mathematical sense) and differ only in the numerical approach; I hadn't noticed that at the beginning.
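For reference, a PyTorch transcription of the original implementation's exclusive cumprod (prepending a column of ones so each sample is weighted by the transmittance of the samples in front of it) would be roughly:

T = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=2), dim=2)[..., :-1]
prob = (alpha * T)[..., None]

This assumes alpha has shape [B,HW,N] with the sample dimension at dim=2, matching the snippet above.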
I have a series of photos: https://drive.google.com/drive/folders/1ZZgZUrFrnP47rx8bN5K6yvYnSC50a-9G?usp=drive_link
They were taken with an iPhone 13 Pro Max.
I have used this dataset with Instant NGP from NVIDIA and with Gaussian Splatting to produce a good radiance field.
Do you think this dataset will work with the code in this repository?
My changes are recorded here, and I removed "IMG_" from the file names.
I am training the model now.
Do you have an estimate of how long this might take on an RTX 3090?
What viewer can I use to make renders from the radiance field produced from this training run?
Example image below; the EXIF information should be intact:
Sam