samhodge-aiml opened this issue 1 year ago
I made a change to use 14 rather than 13, and changed `if customized_focal:` to `if customized_focal or True:`.
Looks like this is a red hot tip: https://github.com/t-bence/exif-stats/blob/master/focal_stats.py#L44
Maybe all I needed was patience.
Does this seem correct?
@bianwenjing
I am worried that my modification of the width and height below
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..846273d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -82,11 +82,16 @@ class DataField(object):
_, _, h, w = imgs.shape
if customized_focal:
- focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ FX_ = 13/35.0
+ CX_ = 4032
+ CY_ = 3024
+ FY_= FX_*(CY_/CX_)
+ focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
if resize_factor is None:
resize_factor = 1
- fx = focal_gt[0, 0] / resize_factor
- fy = focal_gt[1, 1] / resize_factor
+ fx = focal_gt[0][0] / resize_factor
+ fy = focal_gt[1][1] / resize_factor
else:
if load_colmap_poses:
fx, fy = focal, focal
is in error. I am also wondering whether CX_ should be 4032 // 2 and CY_ should be 3024 // 2.
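For comparison, here is a rough sketch of how I think the intrinsics could be expressed in pixel units. It assumes the EXIF focal length is a 35 mm-equivalent value, uses a 36 mm reference sensor width, and puts the principal point at the image centre; all of the names and values here are my own assumptions, not something from the repo:

```python
import numpy as np

FOCAL_MM = 14.0          # focal length from EXIF (assumed to be 35 mm-equivalent)
SENSOR_WIDTH_MM = 36.0   # reference sensor width; depends on the actual camera
W, H = 4032, 3024        # image resolution in pixels

fx = FOCAL_MM / SENSOR_WIDTH_MM * W   # focal length converted to pixels
fy = fx                               # square pixels assumed
cx, cy = W / 2.0, H / 2.0             # principal point at the image centre

focal_gt = np.array([[fx, 0.0, cx],
                     [0.0, fy, cy],
                     [0.0, 0.0, 1.0]], dtype=np.float32)
print(focal_gt)
```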
I am also wondering whether the following change
diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..4435fb2 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
auto_scheduler: True
eval_pose_every: -1
extract_images:
- resolution: [540, 960]
\ No newline at end of file
+ resolution: [3024, 4032]
messes up the internals of the convolutions. Can I go for a larger resolution?
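My understanding (which could be wrong) is that if the extracted resolution changes, the pixel-unit intrinsics have to be scaled by the same factor. A rough sketch, with the helper name and example values being my own placeholders:

```python
# Sketch: scale pixel-unit intrinsics when the extracted images are resized.
def scale_intrinsics(fx, fy, cx, cy, orig_w, orig_h, new_w, new_h):
    sx, sy = new_w / orig_w, new_h / orig_h
    return fx * sx, fy * sy, cx * sx, cy * sy

# e.g. going from 4032 x 3024 down to 1008 x 765 scales everything by ~0.25
print(scale_intrinsics(1568.0, 1568.0, 2016.0, 1512.0, 4032, 3024, 1008, 765))
```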
I am also wondering if I could speed up training; the batch size of 1 seems to be under-utilising resources. My 24 GB RTX 3090 is only using a fraction of its VRAM and a fraction of its compute.
grep -rn batch configs/
configs/default.yaml:14: batchsize: 1
configs/default.yaml:78: batch_size: 1
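As a sanity check on the under-utilisation, this is just a probe I can drop into the training loop (not part of the repo) to confirm how much VRAM the process is actually touching:

```python
import torch

# Report current and peak VRAM usage for this process on the RTX 3090.
if torch.cuda.is_available():
    allocated_gib = torch.cuda.memory_allocated() / 1024 ** 3
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"allocated: {allocated_gib:.2f} GiB, peak: {peak_gib:.2f} GiB of 24 GiB")
```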
Yeah nah, that didn't work. Trying again with different intrinsics values:
diff --git a/configs/Test/images.yaml b/configs/Test/images.yaml
index 81a5824..264c4cf 100644
--- a/configs/Test/images.yaml
+++ b/configs/Test/images.yaml
@@ -12,4 +12,5 @@ training:
auto_scheduler: True
eval_pose_every: -1
extract_images:
- resolution: [540, 960]
\ No newline at end of file
+ resolution: [765, 1008]
+with_depth: False
diff --git a/configs/default.yaml b/configs/default.yaml
index adb9cb0..92aae7b 100644
--- a/configs/default.yaml
+++ b/configs/default.yaml
@@ -75,7 +75,7 @@ training:
load_distortion_dir: model_distortion.pt
n_training_points: 1024
scheduling_epoch: 10000
- batch_size: 1
+ batch_size: 8
learning_rate: 0.001
focal_lr: 0.001
pose_lr: 0.0005
diff --git a/configs/preprocess.yaml b/configs/preprocess.yaml
index c56b1fd..d3ec72c 100644
--- a/configs/preprocess.yaml
+++ b/configs/preprocess.yaml
@@ -1,9 +1,9 @@
depth:
type: DPT
dataloading:
- path: data/nerf_llff_data
- scene: ['fern']
+ path: data/Test
+ scene: ['images']
resize_factor:
load_colmap_poses: False
training:
- mode: 'all'
\ No newline at end of file
+ mode: 'all'
diff --git a/dataloading/dataset.py b/dataloading/dataset.py
index d40af73..717ce8d 100644
--- a/dataloading/dataset.py
+++ b/dataloading/dataset.py
@@ -81,12 +81,17 @@ class DataField(object):
imgs = np.transpose(imgs, (0, 3, 1, 2))
_, _, h, w = imgs.shape
- if customized_focal:
- focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ if customized_focal or True:
+ #focal_gt = np.load(os.path.join(load_dir, 'intrinsics.npz'))['K'].astype(np.float32)
+ FX_ = 14.0/35.0
+ CX_ = 35.0/2.0
+ CY_ = (2032/3024) * CX_
+ FY_= FX_*(CY_/CX_)
+ focal_gt = [[FX_, 0, CX_], [0, FY_, CY_], [0, 0, 1]]
if resize_factor is None:
resize_factor = 1
- fx = focal_gt[0, 0] / resize_factor
- fy = focal_gt[1, 1] / resize_factor
+ fx = focal_gt[0][0] / resize_factor
+ fy = focal_gt[1][1] / resize_factor
else:
if load_colmap_poses:
fx, fy = focal, focal
diff --git a/environment.yaml b/environment.yaml
index dfde749..a81a313 100644
--- a/environment.yaml
+++ b/environment.yaml
@@ -4,12 +4,13 @@ channels:
- conda-forge
- anaconda
- defaults
+ - nvidia
dependencies:
- - python=3.9
- - pytorch=1.7
- - torchvision=0.8.2
- - torchaudio
- - cudatoolkit=10.1
+ - python
+ - pytorch=2.0.0
+ - torchvision=0.15.0
+ - torchaudio=2.0.0
+ - pytorch-cuda=11.8
- cffi
- cython
- imageio
@@ -39,4 +40,4 @@ dependencies:
- lpips
- setuptools
- kornia==0.5.0
- - imageio-ffmpeg
\ No newline at end of file
+ - imageio-ffmpeg
This is from training with COLMAP and hallucinated depth maps:
https://github.com/ActiveVisionLab/nope-nerf/assets/102564797/2b64c14b-40b8-4109-af7e-f1d6c770d4c3
What are the limitations on the input dataset?
Hi, sorry for my late reply. The input images I used are consecutive and closely sampled from a video. This is essential because the point cloud loss requires a dense matching between two views. I noticed that the images you provided are rather sparse, which might make the point cloud loss less effective.
Yeah, they are from photos rather than a video. I can try shooting the same location with a different photographic approach and much denser sampling. Thanks for your response.
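Since imageio-ffmpeg is already in environment.yaml, something like this sketch is what I have in mind for pulling densely sampled frames out of a video (the file name and the frame stride are placeholders of mine):

```python
import os
import imageio

# Write every 2nd frame of a walkthrough video into the dataset folder.
os.makedirs("data/Test/images/images", exist_ok=True)
reader = imageio.get_reader("walkthrough.mp4")
for i, frame in enumerate(reader):
    if i % 2 == 0:
        imageio.imwrite(f"data/Test/images/images/frame_{i:05d}.png", frame)
reader.close()
```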
Here are my modifications to the source code, and here is my dataset:
https://drive.google.com/drive/folders/1ZZgZUrFrnP47rx8bN5K6yvYnSC50a-9G?usp=sharing
What I have done to start training is put the images in data/Test/images/images and then run the preprocess and train commands.
The TensorBoard log is attached here:
log.zip
Is this OK, or did I muck up the intrinsics?
I have attached a JPG so you can look at the EXIF information.
I think it may be 14 rather than 13; I will try again.
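For the record, this minimal sketch (assuming Pillow is installed; the helper name and example path are mine) is how I am reading the focal length back out of the EXIF data to double-check:

```python
from PIL import Image
from PIL.ExifTags import TAGS

# Hypothetical helper (not part of the repo): pull the focal length in mm
# out of a JPEG's EXIF block using Pillow.
def read_focal_length_mm(path):
    exif = Image.open(path)._getexif() or {}
    named = {TAGS.get(tag, tag): value for tag, value in exif.items()}
    focal = named.get("FocalLength")
    if focal is None:
        return None
    if isinstance(focal, tuple):  # older Pillow returns (numerator, denominator)
        return focal[0] / focal[1]
    return float(focal)

print(read_focal_length_mm("data/Test/images/images/IMG_0001.jpg"))
```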