LownyCGI opened this issue 1 year ago
Looks like they built a new data parser that isn't compatible with ns-process-data: https://github.com/autonomousvision/sdfstudio/blob/master/docs/sdfstudio-data.md#Customize-your-own-dataset
You'd probably need to write a conversion script, similar to what they provide, to use something like the already implemented colmap/polycam methods with ns-process-data.
Speaking of which, I would love to see something like https://github.com/autonomousvision/sdfstudio/blob/master/scripts/datasets/process_scannet_to_sdfstudio.py for the great tools nerfstudio already provides via the colmap/record3d/polycam interfaces. It looks like the data just needs to be converted to the meta_data.json file, which is very similar to the transforms.json file produced by nerfstudio.
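For reference, here is a rough sketch in Python of the meta_data.json structure such a converter would need to emit. The field names follow the sdfstudio-data docs and the conversion scripts later in this thread; treat the concrete values (resolution, intrinsics, box size) as placeholders rather than a spec:

import json

# Hypothetical minimal meta_data.json skeleton; all values are placeholders.
meta = {
    "camera_model": "OPENCV",
    "height": 384,
    "width": 384,
    "has_mono_prior": False,  # True only if *_depth.npy / *_normal.npy priors exist
    "pairs": None,
    "worldtogt": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
    "scene_box": {
        "aabb": [[-1, -1, -1], [1, 1, 1]],
        "near": 0.05,
        "far": 2.5,
        "radius": 1.0,
        "collider_type": "box",
    },
    "frames": [
        {
            "rgb_path": "000000_rgb.png",
            # 4x4 camera-to-world matrix in the OpenCV convention
            "camtoworld": [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]],
            # 3x3 intrinsics, mirroring the conversion scripts below
            "intrinsics": [[500.0, 0.0, 192.0], [0.0, 500.0, 192.0], [0.0, 0.0, 1.0]],
        }
    ],
}

with open("meta_data.json", "w", encoding="utf-8") as f:
    json.dump(meta, f, indent=4)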
@LownyCGI @pablovela5620 Currently the foreground mask is only used in the heritage dataset. We plan to add mask support (including masks for pixel sampling used in training and foreground masks used in the loss computation) to the sdfstudio data parser soon.
All other data formats supported in nerfstudio can be used in sdfstudio naturally, but this hasn't been tested yet. For example, to use the blender dataset, you could run something like:
ns-train neus-facto --pipeline.model.near-plane 2.0 --pipeline.model.far-plane 6.0 --pipeline.model.overwrite-near-far-plane True blender-data --data data/blender/lego/
Note that you may also need to change some default configs to make it work.
@niujinshuchong So I've been trying to get things working with the output from the COLMAP tooling that nerfstudio provides (ns-process-data images, etc.)
and I've been facing the following issues.
First, I couldn't get things working by directly using the outputs from ns-process-data.
It seems like there's a difference between the coordinate systems used by DTU (and all the other sdfstudio datasets) and the original nerfstudio datasets (which use the Blender/OpenGL convention).
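For context, the convention difference amounts to flipping two of the camera axes. A minimal sketch of the usual OpenGL/Blender -> OpenCV/COLMAP conversion (this is just the standard axis flip, not code taken from either repo):

import numpy as np

def opengl_to_opencv(c2w: np.ndarray) -> np.ndarray:
    """Convert a 4x4 camera-to-world matrix from the OpenGL/Blender convention
    (camera looks down -Z, +Y up) to the OpenCV/COLMAP convention
    (camera looks down +Z, +Y down) by negating the Y and Z camera axes."""
    out = c2w.copy()
    out[0:3, 1:3] *= -1  # flip the 2nd and 3rd columns of the rotation block
    return out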
Here's what I mean: when I train on scan65 of the DTU dataset with the command shown in the readme and view things in the web viewer, it looks like this.
Here, on the other hand, is what I get when I use the output from ns-process-data images in the original nerfstudio repo (it's got a bunch of floaters, but the output cameras seem to be oriented correctly).
Because I want to use geometric priors for supervision, I wrote this script (based on the process_scannet_to_sdfstudio.py script), but I got results that don't really make sense. Here's the script:
import argparse
import glob
import json
import os
from pathlib import Path

import cv2
import numpy as np
import PIL
from PIL import Image
from torchvision import transforms


def main():
    parser = argparse.ArgumentParser(description="preprocess scannet dataset to sdfstudio dataset")
    parser.add_argument("--data", dest="input_path", help="path to scannet scene")
    parser.set_defaults(im_name="NONE")
    parser.add_argument("--output-dir", dest="output_path", help="path to output")
    parser.set_defaults(store_name="NONE")
    args = parser.parse_args()

    output_path = Path(args.output_path)  # "data/custom/scannet_scene0050_00"
    input_path = Path(args.input_path)  # "/home/yuzh/Projects/datasets/scannet/scene0050_00"
    output_path.mkdir(parents=True, exist_ok=True)

    # load transformation json with images/intrinsics/extrinsics
    camera_parameters_path = input_path / "transforms.json"
    camera_parameters = json.load(open(camera_parameters_path))

    # extract intrinsic parameters
    cx = camera_parameters["cx"]
    cy = camera_parameters["cy"]
    fl_x = camera_parameters["fl_x"]
    fl_y = camera_parameters["fl_y"]
    camera_intrinsic = np.array([[fl_x, 0, cx], [0, fl_y, cy], [0, 0, 1]])

    # load poses
    poses = []
    image_paths = []
    # only load images with corresponding pose info
    # currently in random order??, probably need to sort
    for camera in camera_parameters["frames"]:
        # OpenGL/Blender convention, needs to change to COLMAP/OpenCV convention
        # https://docs.nerf.studio/en/latest/quickstart/data_conventions.html
        ## IGNORED for now
        c2w = np.array(camera["transform_matrix"]).reshape(4, 4)
        c2w[0:3, 1:3] *= -1
        img_path = input_path / camera["file_path"]
        assert img_path.exists()
        image_paths.append(img_path)
        poses.append(c2w)
    poses = np.array(poses)

    # deal with invalid poses
    valid_poses = np.isfinite(poses).all(axis=2).all(axis=1)
    min_vertices = poses[:, :3, 3][valid_poses].min(axis=0)
    max_vertices = poses[:, :3, 3][valid_poses].max(axis=0)
    center = (min_vertices + max_vertices) / 2.0
    scale = 2.0 / (np.max(max_vertices - min_vertices) + 3.0)

    # we should normalize pose to unit cube
    poses[:, :3, 3] -= center
    poses[:, :3, 3] *= scale

    # inverse normalization
    scale_mat = np.eye(4).astype(np.float32)
    scale_mat[:3, 3] -= center
    scale_mat[:3] *= scale
    scale_mat = np.linalg.inv(scale_mat)

    # copy image
    sample_img = cv2.imread(str(image_paths[0]))
    H, W, _ = sample_img.shape  # 1080 x 1920

    # get smallest side to generate square crop
    target_crop = min(H, W)
    target_size = 384
    trans_totensor = transforms.Compose(
        [
            transforms.CenterCrop(target_crop),
            transforms.Resize(target_size, interpolation=PIL.Image.BILINEAR),
        ]
    )

    # center crop by min_dim
    offset_x = (W - target_crop) * 0.5
    offset_y = (H - target_crop) * 0.5
    camera_intrinsic[0, 2] -= offset_x
    camera_intrinsic[1, 2] -= offset_y
    # resize from min_dim x min_dim -> to 384 x 384
    resize_factor = target_size / target_crop
    camera_intrinsic[:2, :] *= resize_factor
    K = camera_intrinsic

    frames = []
    out_index = 0
    for idx, (valid, pose, image_path) in enumerate(zip(valid_poses, poses, image_paths)):
        if not valid:
            continue
        target_image = output_path / f"{out_index:06d}_rgb.png"
        img = Image.open(image_path)
        img_tensor = trans_totensor(img)
        img_tensor.save(target_image)
        rgb_path = str(target_image.relative_to(output_path))
        frame = {
            "rgb_path": rgb_path,
            "camtoworld": pose.tolist(),
            "intrinsics": K.tolist(),
            "mono_depth_path": rgb_path.replace("_rgb.png", "_depth.npy"),
            "mono_normal_path": rgb_path.replace("_rgb.png", "_normal.npy"),
        }
        frames.append(frame)
        out_index += 1

    # scene bbox for the scannet scene
    scene_box = {
        "aabb": [[-1, -1, -1], [1, 1, 1]],
        "near": 0.05,
        "far": 2.5,
        "radius": 1.0,
        "collider_type": "box",
    }

    # meta data
    output_data = {
        "camera_model": "OPENCV",
        "height": target_size,
        "width": target_size,
        "has_mono_prior": True,
        "pairs": None,
        "worldtogt": scale_mat.tolist(),
        "scene_box": scene_box,
    }
    output_data["frames"] = frames

    # save as json
    with open(output_path / "meta_data.json", "w", encoding="utf-8") as f:
        json.dump(output_data, f, indent=4)


if __name__ == "__main__":
    main()
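As a quick sanity check on the output of a script like this (a sketch, assuming a meta_data.json in the format written above), it can help to verify that the normalized camera centers actually sit inside the declared scene_box aabb:

import json
import numpy as np

# Hypothetical check: load the generated meta_data.json and confirm the
# normalized camera centers fall inside the declared aabb.
meta = json.load(open("meta_data.json"))
centers = np.array([np.array(f["camtoworld"])[:3, 3] for f in meta["frames"]])
aabb_min, aabb_max = np.array(meta["scene_box"]["aabb"])
inside = np.all((centers >= aabb_min) & (centers <= aabb_max), axis=1)
print(f"{inside.sum()}/{len(inside)} camera centers inside the aabb")
print("camera center bounds:", centers.min(axis=0), centers.max(axis=0))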
Lastly, here it is after processing the colmap output from nerfstudio through the above script. I can provide the dataset/outputs if that would be helpful; let me know. This repo is extremely helpful, but one of the things that made nerfstudio so great is being able to use custom videos/images, and I hope to contribute this conversion script so the same can be done with SDFStudio.
Here's the render showing the colmap -> sdf output from the above script.
@pablovela5620 Thanks for sharing your results.
For your first video, you can add --auto-orient True
at the end of your training command to make the up vector the same as the viewer.
How do you get your second video? Which method and config do you use for the second video?
Yes, sharing your output from colmap and the nerfstudio dataset converted from colmap would be very helpful. I think the normalization may not be suited here, because the values you used are what we heuristically chose for the ScanNet dataset, where all cameras are normalized to lie inside the room.
For the second video, I used the output from ns-process-data images --data data/sdfstudio-demo-data/dtu-scan65/image/
with the original nerfstudio repo's ns-train nerfacto. I made no modifications at all to the outputs from colmap. (I tried running it on sdfstudio but got the following error.)
File "/home/pablo/miniconda3/envs/sdfstudio/bin/ns-train", line 8, in <module>
sys.exit(entrypoint())
File "/home/pablo/0Dev/repos/sdfstudio/scripts/train.py", line 248, in entrypoint
main(
File "/home/pablo/0Dev/repos/sdfstudio/scripts/train.py", line 234, in main
launch(
File "/home/pablo/0Dev/repos/sdfstudio/scripts/train.py", line 173, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "/home/pablo/0Dev/repos/sdfstudio/scripts/train.py", line 87, in train_loop
trainer.setup()
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/engine/trainer.py", line 115, in setup
self.pipeline = self.config.pipeline.setup(
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/configs/base_config.py", line 66, in setup
return self._target(self, **kwargs)
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 224, in __init__
self.datamanager: VanillaDataManager = config.datamanager.setup(
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/configs/base_config.py", line 66, in setup
return self._target(self, **kwargs)
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line 321, in __init__
self.train_dataset = self.create_train_dataset()
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/data/datamanagers/base_datamanager.py", line 328, in create_train_dataset
dataparser_outputs=self.dataparser.get_dataparser_outputs(split="train"),
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/data/dataparsers/base_dataparser.py", line 127, in get_dataparser_outputs
dataparser_outputs = self._generate_dataparser_outputs(split)
File "/home/pablo/0Dev/repos/sdfstudio/nerfstudio/data/dataparsers/nerfstudio_dataparser.py", line 193, in _generate_dataparser_outputs
scale_factor /= torch.max(torch.abs(poses[:, :3, 3]))
TypeError: tuple indices must be integers or slices, not tuple
Heres a link to the dataset I used. https://drive.google.com/drive/folders/1tFvZQXohO4oXj1y8uJGdjqbIVZkF4zeK?usp=sharing
I just used the included colmap install to generate the poses for the provided demo DTU dataset.
running
ns-process-data images --data data/sdfstudio-demo-data/dtu-scan65/image/ --output-dir $CHOSEN-DIR
scan65 is the output from this and scan65-processed is the output from parsing this data using the python script I shared above. Thanks for the help on this!
One last thing that I noticed is that https://github.com/autonomousvision/sdfstudio/blob/e52c37fe0840f1f3d339c79132dfd2adbd11b6f8/nerfstudio/data/dataparsers/sdfstudio_dataparser.py#L189
assumes the input data has not already been converted to the nerfstudio (Blender/OpenGL) format, whereas when using ns-process-data this is already done, which could cause some issues (I tried changing it back, but that didn't seem to fix the issue).
-- EDIT --
So I managed to figure out what the above error is: for some reason this line is missing the transformation output, and it's being merged together into a list. You can see how it's there in the original nerfstudio repo. I'm guessing it has to do with the modified DataManager? Anyway, with that fixed, running ns-train nerfacto nerfstudio-data --data $COLMAP-DATA
works just fine, but switching to ns-train neus-facto --pipeline.model.sdf-field.inside-outside False sdfstudio-data --data $COLMAP-DATA
produces garbage results. This feels like there's just something wrong with the format of the poses or how they're read. More info on, or consistency between, the different datasets/managers/parsers would be appreciated.
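For what it's worth, that TypeError usually means auto_orient_and_center_poses is returning a tuple that gets assigned to a single variable. A hedged sketch of what the fix inside nerfstudio_dataparser.py probably looks like (assuming the function returns (poses, transform) as in upstream nerfstudio; the exact keyword arguments may differ):

# Fragment from the dataparser context, not a standalone script.
# Before (poses ends up being a tuple, so poses[:, :3, 3] raises the TypeError):
#     poses = camera_utils.auto_orient_and_center_poses(poses, method=orientation_method)
# After (unpack both return values):
poses, transform = camera_utils.auto_orient_and_center_poses(poses, method=orientation_method)
scale_factor = 1.0
scale_factor /= torch.max(torch.abs(poses[:, :3, 3]))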
@pablovela5620 Thanks for sharing the data.
After normalization with
center = (min_vertices + max_vertices) / 2.0
scale = 2.0 / (np.max(max_vertices - min_vertices) + 3.0)
The object is not centered at the origin, and in this case only a very small part of the unit cube is occupied by the object. I think the multi-res grids are sensitive to initialization in this case, because we initialize the SDF as a sphere centered at the origin and the feature grids can overfit to color very quickly even when the geometry is bad.
I tried with a pure MLP
--pipeline.model.sdf-field.inside-outside False --pipeline.model.sdf-field.use-grid-feature False
and it can get reasonable results.
This was super helpful! I'll make a PR for this processing script then. I do have a question: shouldn't the normalization center the object? Or is this an issue with COLMAP, or with the way DTU was collected? I'll do some more testing with some of the datasets nerfstudio provides to see if I can get them working with the hashgrid. Thanks again for the help and for clearing this up. I tested with the pure MLP and it seemed to work!
I'm noticing the normalization puts the cameras around the origin, and because it's only a front-facing viewpoint (and the object is far away) it leads to the state you're describing. I'm guessing a scene where we capture the object from the front, sides, and back would not have this problem.
I did have one last question: I was under the assumption that the hashgrid encoding was the fastest version, but now trying with the pure MLP it's around 10ms. Is there something I'm missing here, or was that a faulty assumption?
@pablovela5620 Hi, ideally we want to center the object, but the camera center is not aligned with the object center in the DTU dataset. Maybe a better way is to compute the scene center with the sparse point cloud from colmap.
Yes, the hashgrid converges faster, but it still needs an MLP to predict SDF and color. In the pure MLP case here, it is faster per iteration because we skip the grid features, but it converges much more slowly and we need many more iterations.
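A sketch of the sparse-point-cloud centering idea mentioned above, assuming a COLMAP points3D.txt is available from ns-process-data (the path, percentile choice, and padding here are assumptions, not values from either repo):

import numpy as np

def center_and_scale_from_points3d(points3d_txt, padding=1.0):
    """Estimate a scene center/scale from COLMAP's points3D.txt instead of the
    camera bounding box, which can help when all cameras are front-facing and
    far from the object (as in DTU)."""
    pts = []
    with open(points3d_txt, "r") as f:
        for line in f:
            if line.startswith("#"):
                continue
            elems = line.split()
            # points3D.txt rows: POINT3D_ID, X, Y, Z, R, G, B, ERROR, TRACK[...]
            pts.append([float(elems[1]), float(elems[2]), float(elems[3])])
    pts = np.array(pts)
    # percentiles make the estimate robust to stray outlier points
    lo = np.percentile(pts, 2.0, axis=0)
    hi = np.percentile(pts, 98.0, axis=0)
    center = (lo + hi) / 2.0
    scale = 2.0 / (np.max(hi - lo) + padding)
    return center, scale

# Usage: subtract `center` from the camera translations and multiply by `scale`,
# exactly as the conversion script above does with its camera-derived values.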
Okay, so I'm still having trouble getting my own data to work. Here's what I'm trying and the results I'm getting. I've included the processed data for nerfstudio (not the one processed by the above script, as it's large and taking too long to upload) to see if I could get some help.
https://drive.google.com/drive/folders/1HPm0U2wL9PtWovlV_pFe-P9_WJWnbLTT?usp=sharing
The weird bit is that the processed data works just fine in the original MonoSDF repo.
I created a dataset using the Polycam app and converted it to the nerfstudio-data format (very similar to the output from colmap) with the following command:
ns-process-data polycam --data $PATH_TO_ZIP --output-dir $PATH_TO_OUTPUT
This generates the data which I use to train
Here's what the script to generate the data needed for the original MonoSDF repo looks like (it's almost exactly the same as the one I posted in https://github.com/autonomousvision/sdfstudio/issues/2#issuecomment-1358541961, except it writes a cameras.npz file instead of a .json):
import numpy as np
import cv2
import os
import json
import argparse
import PIL
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt
from pathlib import Path

POLYCAM = True

target_crop = 720
target_size = 384
trans_totensor = transforms.Compose([
    transforms.CenterCrop(target_crop),
    transforms.Resize(target_size, interpolation=PIL.Image.BILINEAR),
])


def argument_parser():
    parser = argparse.ArgumentParser(description='Visualize output for depth or surface normals')
    parser.add_argument('--data', help="path to processed colmap data from nerfstudio")
    parser.set_defaults(im_name='NONE')
    parser.add_argument('--output-dir', help="path to where output image should be stored")
    parser.set_defaults(store_name='NONE')
    args = parser.parse_args()
    return args


def convert():
    args = argument_parser()
    data_root = Path(args.data)
    out_path_prefix = Path(args.output_dir)
    assert data_root.exists()

    # load intrinsic/extrinsics
    camera_parameters_path = data_root / "transforms.json"
    camera_parameters = json.load(open(camera_parameters_path))

    if not POLYCAM:
        cx = camera_parameters['cx']
        cy = camera_parameters['cy']
        fl_x = camera_parameters['fl_x']
        fl_y = camera_parameters['fl_y']
    else:
        cx = 0
        cy = 0
        fl_x = 0
        fl_y = 0

    camera_parameters = camera_parameters['frames']
    num_frames = len(camera_parameters)

    poses = []
    image_paths = []
    for camera in camera_parameters:
        if POLYCAM:
            # average frames into single intrinsic
            cx += camera['cx']
            cy += camera['cy']
            fl_x += camera['fl_x']
            fl_y += camera['fl_y']
        # OpenGL/Blender convention, needs to change to COLMAP/OpenCV convention
        # https://docs.nerf.studio/en/latest/quickstart/data_conventions.html
        ## IGNORED for now
        c2w = np.array(camera["transform_matrix"]).reshape(4, 4)
        c2w[0:3, 1:3] *= -1
        img_path = data_root / camera["file_path"]
        assert img_path.exists()
        image_paths.append(img_path)
        poses.append(c2w)

    if POLYCAM:
        # intrinsics
        cx /= num_frames
        cy /= num_frames
        fl_x /= num_frames
        fl_y /= num_frames

    camera_intrinsic = np.array(
        [[fl_x, 0, cx],
         [0, fl_y, cy],
         [0, 0, 1]]
    )

    poses = np.stack(poses)

    # deal with invalid poses
    valid_poses = np.isfinite(poses).all(axis=2).all(axis=1)
    min_vertices = poses[:, :3, 3][valid_poses].min(axis=0)
    max_vertices = poses[:, :3, 3][valid_poses].max(axis=0)
    center = (min_vertices + max_vertices) / 2.
    scale = 2. / (np.max(max_vertices - min_vertices) + 3.)

    # we should normalized to unit cube
    scale_mat = np.eye(4).astype(np.float32)
    scale_mat[:3, 3] = -center
    scale_mat[:3] *= scale
    scale_mat = np.linalg.inv(scale_mat)

    # copy image
    out_index = 0
    cameras = {}
    pcds = []
    # H, W = 738, 994
    H, W, _ = cv2.imread(str(image_paths[0])).shape

    # center crop by 738
    offset_x = (W - target_crop) * 0.5
    offset_y = (H - target_crop) * 0.5
    camera_intrinsic[0, 2] -= offset_x
    camera_intrinsic[1, 2] -= offset_y
    # resize, from 738x738 -> 384x384
    resize_factor = target_size / target_crop
    camera_intrinsic[:2, :] *= resize_factor
    K = np.eye(4)
    K[:3, :3] = camera_intrinsic

    for idx, (valid, pose, image_path) in enumerate(zip(valid_poses, poses, image_paths)):
        print(idx, valid)
        if not valid:
            continue

        target_image = out_path_prefix / "image" / f"{out_index:06d}.png"
        target_image.parent.mkdir(parents=True, exist_ok=True)
        img = Image.open(image_path)
        img_tensor = trans_totensor(img)
        img_tensor.save(target_image)

        # masks aren't used
        mask = (np.ones((target_size, target_size, 3)) * 255.).astype(np.uint8)
        target_image = str(out_path_prefix / "mask" / f"{out_index:03d}.png")
        cv2.imwrite(target_image, mask)

        # save pose
        pcds.append(pose[:3, 3])
        pose = K @ np.linalg.inv(pose)
        cameras["scale_mat_%d" % (out_index)] = scale_mat
        cameras["world_mat_%d" % (out_index)] = pose
        out_index += 1

    np.savez(str(out_path_prefix / "cameras.npz"), **cameras)

    os.system(f'python preprocess/extract_monocular_cues.py --img_path {out_path_prefix / "image"} --output_path {out_path_prefix} --task normal')
    os.system(f'python preprocess/extract_monocular_cues.py --img_path {out_path_prefix / "image"} --output_path {out_path_prefix} --task depth')


if __name__ == "__main__":
    convert()
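As a quick check on the cameras.npz written by a script like this (a sketch; it assumes the world_mat_i = K @ inv(c2w) construction used above), one can decompose a stored world matrix and compare the recovered intrinsics against what was written:

import cv2
import numpy as np

# Hypothetical sanity check for the cameras.npz produced above.
cams = np.load("cameras.npz")
P = cams["world_mat_0"][:3, :4]  # 3x4 projection built as K @ inv(c2w)
K, R, t = cv2.decomposeProjectionMatrix(P)[:3]
K = K / K[2, 2]  # normalize so K[2, 2] == 1
print("recovered intrinsics:\n", K)
print("scale_mat_0 (normalized -> original world):\n", cams["scale_mat_0"])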
Given this, I get fairly good reconstruction results as you can see here
Here is the output from nerfacto after training on the polycam data (and you can see the cameras all fit within the bounding box).
Here is the output from using the example command for monosdf with the processed data
ns-train monosdf --pipeline.model.sdf-field.use-grid-feature True --pipeline.model.sdf-field.hidden-dim 256 --pipeline.model.sdf-field.num-layers 2 --pipeline.model.sdf-field.num-layers-color 2 --pipeline.model.sdf-field.use-appearance-embedding True --pipeline.model.sdf-field.geometric-init True --pipeline.model.sdf-field.inside-outside True --pipeline.model.sdf-field.bias 0.8 --pipeline.model.sdf-field.beta-init 0.1 --pipeline.datamanager.train-num-images-to-sample-from 1 --pipeline.datamanager.train-num-times-to-repeat-images 0 --trainer.steps-per-eval-image 5000 --pipeline.model.background-model none --vis wandb --experiment-name polycam-conference-room --pipeline.model.mono-depth-loss-mult 0.001 --pipeline.model.mono-normal-loss-mult 0.01 --pipeline.datamanager.train-num-rays-per-batch 2048 --machine.num-gpus 1 sdfstudio-data --data $PATH_TO_DATA --include_mono_prior True --skip_every_for_val_split 30
Here's what it looks like in the viewer, and here are the outputs from wandb. As you can see, there's something wrong.
You can also see in the viewer that the scene still fits in the bounding box (but it seems to have shrunk relative to what ns-process-data provides; I'm not sure if that's because of the transform that the auto_orient_and_center_poses function applies).
I don't know if there's some bug, or something I'm missing.
@pablovela5620 Hi, thanks for sharing your data and results.
I ran your data with the same command you provided and got something like this (only 5K iterations). It doesn't look completely failed like in your screenshot (200K iterations in your case). Or does it look good at the beginning and fail in the end?
The training loss is:
The script I modified to generate the data is
import argparse
import glob
import json
import os
from pathlib import Path

import cv2
import numpy as np
import PIL
from PIL import Image
from torchvision import transforms

POLYCAM = True


def main():
    parser = argparse.ArgumentParser(description="preprocess scannet dataset to sdfstudio dataset")
    parser.add_argument("--data", dest="input_path", help="path to scannet scene")
    parser.set_defaults(im_name="NONE")
    parser.add_argument("--output-dir", dest="output_path", help="path to output")
    parser.set_defaults(store_name="NONE")
    args = parser.parse_args()

    output_path = Path(args.output_path)  # "data/custom/scannet_scene0050_00"
    input_path = Path(args.input_path)  # "/home/yuzh/Projects/datasets/scannet/scene0050_00"
    output_path.mkdir(parents=True, exist_ok=True)

    # load transformation json with images/intrinsics/extrinsics
    camera_parameters_path = input_path / "transforms.json"
    camera_parameters = json.load(open(camera_parameters_path))

    # extract intrinsic parameters
    if not POLYCAM:
        cx = camera_parameters["cx"]
        cy = camera_parameters["cy"]
        fl_x = camera_parameters["fl_x"]
        fl_y = camera_parameters["fl_y"]
    else:
        cx = 0
        cy = 0
        fl_x = 0
        fl_y = 0

    camera_parameters = camera_parameters["frames"]
    num_frames = len(camera_parameters)
    camera_intrinsic = np.array([[fl_x, 0, cx], [0, fl_y, cy], [0, 0, 1]])

    # load poses
    poses = []
    image_paths = []
    # only load images with corresponding pose info
    # currently in random order??, probably need to sort
    for camera in camera_parameters:
        if POLYCAM:
            # average frames into single intrinsic
            cx += camera["cx"]
            cy += camera["cy"]
            fl_x += camera["fl_x"]
            fl_y += camera["fl_y"]
        # OpenGL/Blender convention, needs to change to COLMAP/OpenCV convention
        # https://docs.nerf.studio/en/latest/quickstart/data_conventions.html
        ## IGNORED for now
        c2w = np.array(camera["transform_matrix"]).reshape(4, 4)
        c2w[0:3, 1:3] *= -1
        img_path = input_path / camera["file_path"]
        assert img_path.exists()
        image_paths.append(img_path)
        poses.append(c2w)
    poses = np.array(poses)

    if POLYCAM:
        # intrinsics
        cx /= num_frames
        cy /= num_frames
        fl_x /= num_frames
        fl_y /= num_frames
        camera_intrinsic = np.array([[fl_x, 0, cx], [0, fl_y, cy], [0, 0, 1]])

    # deal with invalid poses
    valid_poses = np.isfinite(poses).all(axis=2).all(axis=1)
    min_vertices = poses[:, :3, 3][valid_poses].min(axis=0)
    max_vertices = poses[:, :3, 3][valid_poses].max(axis=0)
    center = (min_vertices + max_vertices) / 2.0
    scale = 2.0 / (np.max(max_vertices - min_vertices) + 3.0)

    # we should normalize pose to unit cube
    poses[:, :3, 3] -= center
    poses[:, :3, 3] *= scale

    # inverse normalization
    scale_mat = np.eye(4).astype(np.float32)
    scale_mat[:3, 3] -= center
    scale_mat[:3] *= scale
    scale_mat = np.linalg.inv(scale_mat)

    # copy image
    sample_img = cv2.imread(str(image_paths[0]))
    H, W, _ = sample_img.shape  # 1080 x 1920

    # get smallest side to generate square crop
    target_crop = min(H, W)
    target_size = 384
    trans_totensor = transforms.Compose(
        [
            transforms.CenterCrop(target_crop),
            transforms.Resize(target_size, interpolation=PIL.Image.BILINEAR),
        ]
    )

    # center crop by min_dim
    offset_x = (W - target_crop) * 0.5
    offset_y = (H - target_crop) * 0.5
    camera_intrinsic[0, 2] -= offset_x
    camera_intrinsic[1, 2] -= offset_y
    # resize from min_dim x min_dim -> to 384 x 384
    resize_factor = target_size / target_crop
    camera_intrinsic[:2, :] *= resize_factor
    K = camera_intrinsic

    frames = []
    out_index = 0
    for idx, (valid, pose, image_path) in enumerate(zip(valid_poses, poses, image_paths)):
        if not valid:
            continue
        target_image = output_path / f"{out_index:06d}_rgb.png"
        img = Image.open(image_path)
        img_tensor = trans_totensor(img)
        img_tensor.save(target_image)
        rgb_path = str(target_image.relative_to(output_path))
        frame = {
            "rgb_path": rgb_path,
            "camtoworld": pose.tolist(),
            "intrinsics": K.tolist(),
            "mono_depth_path": rgb_path.replace("_rgb.png", "_depth.npy"),
            "mono_normal_path": rgb_path.replace("_rgb.png", "_normal.npy"),
        }
        frames.append(frame)
        out_index += 1

    # scene bbox for the scannet scene
    scene_box = {
        "aabb": [[-1, -1, -1], [1, 1, 1]],
        "near": 0.05,
        "far": 2.5,
        "radius": 1.0,
        "collider_type": "box",
    }

    # meta data
    output_data = {
        "camera_model": "OPENCV",
        "height": target_size,
        "width": target_size,
        "has_mono_prior": True,
        "pairs": None,
        "worldtogt": scale_mat.tolist(),
        "scene_box": scene_box,
    }
    output_data["frames"] = frames

    # save as json
    with open(output_path / "meta_data.json", "w", encoding="utf-8") as f:
        json.dump(output_data, f, indent=4)


if __name__ == "__main__":
    main()
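For reference, the main visible difference from the earlier script in this thread is the POLYCAM branch: Polycam's transforms.json stores per-frame fl_x/fl_y/cx/cy, so this version averages them into a single intrinsic matrix instead of reading global intrinsics from the top level of the file.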
Hmmm this is what it looks like for me at 5k iterations
I'll take a look at the modifications you made to see if it's something there. I'll try running it again.
@pablovela5620 This is what I got with large weights for the monocular priors:
--pipeline.model.mono-depth-loss-mult 0.1 --pipeline.model.mono-normal-loss-mult 0.05
I have no idea what changed; I must have set a parameter wrong at some point, but now it seems to be working for me with this custom data!
I will add that your results look much cleaner than mine do at iteration 30k and I'm not fully sure why. I'm guessing it's the larger prior losses?
Would you mind posting your config.yml file from the output? Here's mine:
!!python/object:nerfstudio.configs.base_config.Config
data: null
experiment_name: polycam-conference-room
logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig
enable_profiler: true
local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig
_target: !!python/name:nerfstudio.utils.writer.LocalWriter ''
enable: true
max_log_size: 10
stats_to_track: !!python/tuple
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Iter (time)
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test PSNR
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Vis Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test Rays / Sec
max_buffer_size: 20
relative_log_dir: !!python/object/apply:pathlib.PosixPath []
steps_per_log: 10
machine: !!python/object:nerfstudio.configs.base_config.MachineConfig
dist_url: auto
machine_rank: 0
num_gpus: 1
num_machines: 1
seed: 42
method_name: monosdf
optimizers:
field_background:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: &id001 !!python/name:torch.optim.adam.Adam ''
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialSchedulerConfig
_target: &id002 !!python/name:torch.optim.lr_scheduler.ExponentialLR ''
decay_rate: 0.1
max_steps: 200000
fields:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialSchedulerConfig
_target: *id002
decay_rate: 0.1
max_steps: 200000
output_dir: !!python/object/apply:pathlib.PosixPath
- outputs
pipeline: !!python/object:nerfstudio.pipelines.base_pipeline.VanillaPipelineConfig
_target: !!python/name:nerfstudio.pipelines.base_pipeline.VanillaPipeline ''
datamanager: !!python/object:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManagerConfig
_target: !!python/name:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager ''
camera_optimizer: !!python/object:nerfstudio.cameras.camera_optimizers.CameraOptimizerConfig
_target: !!python/name:nerfstudio.cameras.camera_optimizers.CameraOptimizer ''
mode: 'off'
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-08
lr: 0.0006
weight_decay: 0.01
orientation_noise_std: 0.0
param_group: camera_opt
position_noise_std: 0.0
scheduler: !!python/object:nerfstudio.engine.schedulers.SchedulerConfig
_target: !!python/name:nerfstudio.engine.schedulers.ExponentialDecaySchedule ''
lr_final: 5.0e-06
max_steps: 10000
camera_res_scale_factor: 1.0
dataparser: !!python/object:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudioDataParserConfig
_target: !!python/name:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudio ''
auto_orient: false
data: !!python/object/apply:pathlib.PosixPath
- data
- polycam
- conference_room3-processed-new
downscale_factor: 1
include_mono_prior: true
load_pairs: false
neighbors_num: null
neighbors_shuffle: false
pairs_sorted_ascending: true
scene_scale: 2.0
skip_every_for_val_split: 30
eval_image_indices: !!python/tuple
- 0
eval_num_images_to_sample_from: -1
eval_num_rays_per_batch: 1024
eval_num_times_to_repeat_images: -1
train_num_images_to_sample_from: 1
train_num_rays_per_batch: 2048
train_num_times_to_repeat_images: 0
model: !!python/object:nerfstudio.models.volsdf.VolSDFModelConfig
_target: !!python/name:nerfstudio.models.volsdf.VolSDFModel ''
background_color: black
background_model: none
collider_params:
far_plane: 6.0
near_plane: 2.0
eikonal_loss_mult: 0.1
enable_collider: true
eval_num_rays_per_chunk: 1024
far_plane: 4.0
far_plane_bg: 1000.0
fg_mask_loss_mult: 0.01
loss_coefficients:
rgb_loss_coarse: 1.0
rgb_loss_fine: 1.0
min_patch_variance: 0.01
mono_depth_loss_mult: 0.001
mono_normal_loss_mult: 0.01
near_plane: 0.05
num_samples: 64
num_samples_eval: 128
num_samples_extra: 32
num_samples_outside: 32
overwrite_near_far_plane: false
patch_size: 11
patch_warp_angle_thres: 0.3
patch_warp_loss_mult: 0.0
periodic_tvl_mult: 0.0
sdf_field: !!python/object:nerfstudio.fields.sdf_field.SDFFieldConfig
_target: !!python/name:nerfstudio.fields.sdf_field.SDFField ''
appearance_embedding_dim: 32
beta_init: 0.1
bias: 0.8
divide_factor: 2.0
encoding_type: hash
geo_feat_dim: 256
geometric_init: true
hidden_dim: 256
hidden_dim_color: 256
inside_outside: true
num_layers: 2
num_layers_color: 2
use_appearance_embedding: true
use_grid_feature: true
weight_norm: true
topk: 4
use_average_appearance_embedding: false
timestamp: 2022-12-22_130757
trainer: !!python/object:nerfstudio.configs.base_config.TrainerConfig
load_config: null
load_dir: null
load_scheduler: true
load_step: null
max_num_iterations: 200000
mixed_precision: false
relative_model_dir: !!python/object/apply:pathlib.PosixPath
- sdfstudio_models
save_only_latest_checkpoint: true
steps_per_eval_all_images: 1000000
steps_per_eval_batch: 5000
steps_per_eval_image: 5000
steps_per_save: 20000
viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig
ip_address: 127.0.0.1
launch_bridge_server: true
max_num_display_images: 512
num_rays_per_chunk: 32768
quit_on_train_completion: false
relative_log_filename: viewer_log_filename.txt
skip_openrelay: false
start_train: true
websocket_port: 7007
zmq_port: null
vis: wandb
@pablovela5620 I think it's because of the larger weights on the monocular cues. Here is the config:
!!python/object:nerfstudio.configs.base_config.Config
data: null
experiment_name: polycam-conference-room
logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig
enable_profiler: true
local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig
_target: !!python/name:nerfstudio.utils.writer.LocalWriter ''
enable: true
max_log_size: 10
stats_to_track: !!python/tuple
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Iter (time)
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test PSNR
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Vis Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test Rays / Sec
max_buffer_size: 20
relative_log_dir: !!python/object/apply:pathlib.PosixPath []
steps_per_log: 10
machine: !!python/object:nerfstudio.configs.base_config.MachineConfig
dist_url: auto
machine_rank: 0
num_gpus: 1
num_machines: 1
seed: 42
method_name: monosdf
optimizers:
field_background:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: &id001 !!python/name:torch.optim.adam.Adam ''
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialSchedulerConfig
_target: &id002 !!python/name:torch.optim.lr_scheduler.ExponentialLR ''
decay_rate: 0.1
max_steps: 200000
fields:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.ExponentialSchedulerConfig
_target: *id002
decay_rate: 0.1
max_steps: 200000
output_dir: !!python/object/apply:pathlib.PosixPath
- outputs
pipeline: !!python/object:nerfstudio.pipelines.base_pipeline.VanillaPipelineConfig
_target: !!python/name:nerfstudio.pipelines.base_pipeline.VanillaPipeline ''
datamanager: !!python/object:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManagerConfig
_target: !!python/name:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager ''
camera_optimizer: !!python/object:nerfstudio.cameras.camera_optimizers.CameraOptimizerConfig
_target: !!python/name:nerfstudio.cameras.camera_optimizers.CameraOptimizer ''
mode: 'off'
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-08
lr: 0.0006
weight_decay: 0.01
orientation_noise_std: 0.0
param_group: camera_opt
position_noise_std: 0.0
scheduler: !!python/object:nerfstudio.engine.schedulers.SchedulerConfig
_target: !!python/name:nerfstudio.engine.schedulers.ExponentialDecaySchedule ''
lr_final: 5.0e-06
max_steps: 10000
camera_res_scale_factor: 1.0
dataparser: !!python/object:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudioDataParserConfig
_target: !!python/name:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudio ''
auto_orient: false
data: !!python/object/apply:pathlib.PosixPath
- data
- test-colmap
downscale_factor: 1
include_mono_prior: true
load_pairs: false
neighbors_num: null
neighbors_shuffle: false
pairs_sorted_ascending: true
scene_scale: 2.0
skip_every_for_val_split: 30
eval_image_indices: !!python/tuple
- 0
eval_num_images_to_sample_from: -1
eval_num_rays_per_batch: 1024
eval_num_times_to_repeat_images: -1
train_num_images_to_sample_from: 1
train_num_rays_per_batch: 2048
train_num_times_to_repeat_images: 0
model: !!python/object:nerfstudio.models.volsdf.VolSDFModelConfig
_target: !!python/name:nerfstudio.models.volsdf.VolSDFModel ''
background_color: black
background_model: none
collider_params:
far_plane: 6.0
near_plane: 2.0
eikonal_loss_mult: 0.1
enable_collider: true
eval_num_rays_per_chunk: 1024
far_plane: 4.0
far_plane_bg: 1000.0
fg_mask_loss_mult: 0.01
loss_coefficients:
rgb_loss_coarse: 1.0
rgb_loss_fine: 1.0
min_patch_variance: 0.01
mono_depth_loss_mult: 0.1
mono_normal_loss_mult: 0.05
near_plane: 0.05
num_samples: 64
num_samples_eval: 128
num_samples_extra: 32
num_samples_outside: 32
overwrite_near_far_plane: false
patch_size: 11
patch_warp_angle_thres: 0.3
patch_warp_loss_mult: 0.0
periodic_tvl_mult: 0.0
sdf_field: !!python/object:nerfstudio.fields.sdf_field.SDFFieldConfig
_target: !!python/name:nerfstudio.fields.sdf_field.SDFField ''
appearance_embedding_dim: 32
beta_init: 0.1
bias: 0.8
divide_factor: 2.0
encoding_type: hash
geo_feat_dim: 256
geometric_init: true
hidden_dim: 256
hidden_dim_color: 256
inside_outside: true
num_layers: 2
num_layers_color: 2
use_appearance_embedding: true
use_grid_feature: true
weight_norm: true
topk: 4
use_average_appearance_embedding: false
timestamp: 2022-12-22_185647
trainer: !!python/object:nerfstudio.configs.base_config.TrainerConfig
load_config: null
load_dir: null
load_scheduler: true
load_step: null
max_num_iterations: 200000
mixed_precision: false
relative_model_dir: !!python/object/apply:pathlib.PosixPath
- sdfstudio_models
save_only_latest_checkpoint: true
steps_per_eval_all_images: 1000000
steps_per_eval_batch: 5000
steps_per_eval_image: 5000
steps_per_save: 20000
viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig
ip_address: 127.0.0.1
launch_bridge_server: true
max_num_display_images: 512
num_rays_per_chunk: 32768
quit_on_train_completion: false
relative_log_filename: viewer_log_filename.txt
skip_openrelay: false
start_train: true
websocket_port: 7007
zmq_port: null
vis: wandb
@niujinshuchong Hello, I used process_nerfstudio_to_sdfstudio.py to convert nerfstudio data to sdfstudio data and trained it with the mono-neus model, but I have some problems.
2. The viewer (viewer.websocket.port) shows a black screen.
@pyramidpoint Do you think the camera poses are correct in the viewer? The normalization in process_nerfstudio_to_sdfstudio.py is used for indoor scenes by default, maybe you need to adapt it to your dataset.
Thanks for the great work. I have been trying to get scripts/datasets/process_nerfstudio_to_sdfstudio.py to work with data processed with nerfstudio and COLMAP. The script keeps trying to find the depth_path, which in this case does not exist. Any idea what I am missing here?
python scripts/datasets/process_nerfstudio_to_sdfstudio.py --data ~/sdfdata/hand/hand-processed/colmap --output-dir ~/sdfdata/hand/hand-processed/colmap-sdf --type colmap --geo-type mono_prior --omnidata_path /home/kasm-user/omnidata/omnidata_tools/torch --pretrained_models /home/kasm-user/omnidata/omnidata_tools/pretrained_models
Traceback (most recent call last):
File "scripts/datasets/process_nerfstudio_to_sdfstudio.py", line 265, in <module>
main()
File "scripts/datasets/process_nerfstudio_to_sdfstudio.py", line 190, in main
depth_path = depth_paths[idx]
IndexError: list index out of range
@raw5 Thanks for reporting this. Currently the script assumes depth maps always exist, which is not always the case. @pablovela5620 Is there any chance you could fix it? Thanks.
Yep can do that, sorry about that! I only tested with Polycam and forgot to check Colmap
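A minimal sketch of the kind of guard that could make the depth priors optional (variable names follow the conversion snippets in this thread and are assumptions about the actual script, not a patch against it):

# Fragment for the per-frame loop of the conversion script: only attach mono
# prior paths when the corresponding files actually exist.
depth_paths = sorted(glob.glob(str(input_path / "*_depth.npy")))
has_mono_prior = len(depth_paths) > 0

frame = {
    "rgb_path": rgb_path,
    "camtoworld": pose.tolist(),
    "intrinsics": K.tolist(),
}
if has_mono_prior:
    frame["mono_depth_path"] = rgb_path.replace("_rgb.png", "_depth.npy")
    frame["mono_normal_path"] = rgb_path.replace("_rgb.png", "_normal.npy")

# ... and later, when assembling the metadata:
output_data["has_mono_prior"] = has_mono_prior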
Hi,
I tried to run it with nerfstudio's demo data, poster.
First, I downloaded the data and converted it to the sdfstudio format using scripts/datasets/process_nerfstudio_to_sdfstudio.py --type colmap. Then I trained a NeuS-facto model using the ns-train neus-facto --pipeline.model.sdf-field.inside-outside False --experiment-name neus-facto-poster sdfstudio-data --data data/sdf_poster command. After training was done, I tried to extract a mesh, but got the following error:
Traceback (most recent call last):
  File "/home/hpc/iwfa/iwfa008h/miniconda3/envs/sdfstudio/bin/ns-extract-mesh", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/extract_mesh.py", line 79, in entrypoint
    tyro.cli(tyro.conf.FlagConversionOff[ExtractMesh]).main()
  File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/extract_mesh.py", line 65, in main
    get_surface_sliding(
  File "/home/hpc/iwfa/iwfa008h/miniconda3/envs/sdfstudio/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/marching_cubes.py", line 159, in get_surface_sliding
    combined.export(filename)
AttributeError: 'list' object has no attribute 'export'
Here is the config:
!!python/object:nerfstudio.configs.base_config.Config
data: null
experiment_name: neus-facto-poster
logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig
enable_profiler: true
local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig
_target: !!python/name:nerfstudio.utils.writer.LocalWriter ''
enable: true
max_log_size: 10
stats_to_track: !!python/tuple
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Iter (time)
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test PSNR
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Vis Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test Rays / Sec
max_buffer_size: 20
relative_log_dir: !!python/object/apply:pathlib.PosixPath []
steps_per_log: 10
machine: !!python/object:nerfstudio.configs.base_config.MachineConfig
dist_url: auto
machine_rank: 0
num_gpus: 1
num_machines: 1
seed: 42
method_name: neus-facto
optimizers:
field_background:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: &id001 !!python/name:torch.optim.adam.Adam ''
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.NeuSSchedulerConfig
_target: &id002 !!python/name:nerfstudio.engine.schedulers.NeuSScheduler ''
learning_rate_alpha: 0.05
max_steps: 20000
warm_up_end: 500
fields:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-15
lr: 0.0005
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.NeuSSchedulerConfig
_target: *id002
learning_rate_alpha: 0.05
max_steps: 20000
warm_up_end: 500
proposal_networks:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-15
lr: 0.01
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.MultiStepSchedulerConfig
_target: !!python/name:torch.optim.lr_scheduler.MultiStepLR ''
max_steps: 20000
output_dir: !!python/object/apply:pathlib.PosixPath
- outputs
pipeline: !!python/object:nerfstudio.pipelines.base_pipeline.VanillaPipelineConfig
_target: !!python/name:nerfstudio.pipelines.base_pipeline.VanillaPipeline ''
datamanager: !!python/object:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManagerConfig
_target: !!python/name:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager ''
camera_optimizer: !!python/object:nerfstudio.cameras.camera_optimizers.CameraOptimizerConfig
_target: !!python/name:nerfstudio.cameras.camera_optimizers.CameraOptimizer ''
mode: 'off'
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-08
lr: 0.0006
weight_decay: 0.01
orientation_noise_std: 0.0
param_group: camera_opt
position_noise_std: 0.0
scheduler: !!python/object:nerfstudio.engine.schedulers.SchedulerConfig
_target: !!python/name:nerfstudio.engine.schedulers.ExponentialDecaySchedule ''
lr_final: 5.0e-06
max_steps: 10000
camera_res_scale_factor: 1.0
dataparser: !!python/object:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudioDataParserConfig
_target: !!python/name:nerfstudio.data.dataparsers.sdfstudio_dataparser.SDFStudio ''
auto_orient: false
data: !!python/object/apply:pathlib.PosixPath
- data
- sdf_poster
downscale_factor: 1
include_foreground_mask: false
include_mono_prior: false
include_sensor_depth: false
include_sfm_points: false
load_pairs: false
neighbors_num: null
neighbors_shuffle: false
pairs_sorted_ascending: true
scene_scale: 2.0
skip_every_for_val_split: 1
eval_image_indices: !!python/tuple
- 0
eval_num_images_to_sample_from: -1
eval_num_rays_per_batch: 1024
eval_num_times_to_repeat_images: -1
train_num_images_to_sample_from: -1
train_num_rays_per_batch: 2048
train_num_times_to_repeat_images: -1
model: !!python/object:nerfstudio.models.neus_facto.NeuSFactoModelConfig
_target: !!python/name:nerfstudio.models.neus_facto.NeuSFactoModel ''
background_color: black
background_model: none
base_variance: 64
collider_params:
far_plane: 6.0
near_plane: 2.0
eikonal_loss_mult: 0.1
enable_collider: true
eval_num_rays_per_chunk: 1024
far_plane: 4.0
far_plane_bg: 1000.0
fg_mask_loss_mult: 0.01
interlevel_loss_mult: 1.0
loss_coefficients:
rgb_loss_coarse: 1.0
rgb_loss_fine: 1.0
min_patch_variance: 0.01
mono_depth_loss_mult: 0.0
mono_normal_loss_mult: 0.0
near_plane: 0.05
num_neus_samples_per_ray: 48
num_proposal_iterations: 2
num_proposal_samples_per_ray: !!python/tuple
- 256
- 96
num_samples: 64
num_samples_importance: 64
num_samples_outside: 32
num_up_sample_steps: 4
overwrite_near_far_plane: false
patch_size: 11
patch_warp_angle_thres: 0.3
patch_warp_loss_mult: 0.0
periodic_tvl_mult: 0.0
perturb: true
proposal_net_args_list:
- hidden_dim: 16
log2_hashmap_size: 17
max_res: 64
num_levels: 5
- hidden_dim: 16
log2_hashmap_size: 17
max_res: 256
num_levels: 5
proposal_update_every: 5
proposal_warmup: 5000
proposal_weights_anneal_max_num_iters: 1000
proposal_weights_anneal_slope: 10.0
sdf_field: !!python/object:nerfstudio.fields.sdf_field.SDFFieldConfig
_target: !!python/name:nerfstudio.fields.sdf_field.SDFField ''
appearance_embedding_dim: 32
beta_init: 0.3
bias: 0.5
divide_factor: 2.0
encoding_type: hash
geo_feat_dim: 256
geometric_init: true
hidden_dim: 256
hidden_dim_color: 256
inside_outside: false
num_layers: 2
num_layers_color: 2
use_appearance_embedding: false
use_grid_feature: true
weight_norm: true
sensor_depth_freespace_loss_mult: 0.0
sensor_depth_l1_loss_mult: 0.0
sensor_depth_sdf_loss_mult: 0.0
sensor_depth_truncation: 0.015
sparse_points_sdf_loss_mult: 0.0
topk: 4
use_average_appearance_embedding: false
use_proposal_weight_anneal: true
use_same_proposal_network: false
use_single_jitter: true
timestamp: 2023-01-14_161555
trainer: !!python/object:nerfstudio.configs.base_config.TrainerConfig
load_config: null
load_dir: null
load_scheduler: true
load_step: null
max_num_iterations: 20001
mixed_precision: false
relative_model_dir: !!python/object/apply:pathlib.PosixPath
- sdfstudio_models
save_only_latest_checkpoint: true
steps_per_eval_all_images: 1000000
steps_per_eval_batch: 5000
steps_per_eval_image: 5000
steps_per_save: 20000
viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig
ip_address: 127.0.0.1
launch_bridge_server: true
max_num_display_images: 512
num_rays_per_chunk: 32768
quit_on_train_completion: false
relative_log_filename: viewer_log_filename.txt
skip_openrelay: false
start_train: true
websocket_port: 7007
zmq_port: null
vis: viewer
I did the same steps for other custom datasets, but got the same error. Do you have any idea about what goes wrong?
@refiksoyak Did you visualize the training process in the viewer or wandb to first check whether it produces something reasonable? Or maybe there is a problem with the trimesh version.
@niujinshuchong I use trimesh==3.18.0. Is that correct? I train on a server, so I cannot use the viewer, but I enabled wandb and got the following error during evaluation:
Step (% Done) Train Iter (time) Train Rays / Sec
--------------------------------------------------------------
4900 (24.50%) 71.126 ms 28.87 K
4910 (24.55%) 71.409 ms 28.73 K
4920 (24.60%) 71.917 ms 28.52 K
4930 (24.65%) 71.231 ms 28.78 K
4940 (24.70%) 70.869 ms 28.95 K
4950 (24.75%) 70.156 ms 29.24 K
4960 (24.80%) 70.723 ms 29.01 K
4970 (24.85%) 71.441 ms 28.73 K
4980 (24.90%) 71.626 ms 28.66 K
4990 (24.95%) 71.599 ms 28.66 K
Printing profiling stats, from longest to shortest duration in seconds
Trainer.train_iteration: 0.0697
VanillaPipeline.get_train_loss_dict: 0.0340
VanillaPipeline.get_eval_loss_dict: 0.0251
Trainer.eval_iteration: 0.0000
Traceback (most recent call last):
File "/home/hpc/iwfa/iwfa008h/miniconda3/envs/sdfstudio/bin/ns-train", line 8, in <module>
sys.exit(entrypoint())
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/train.py", line 248, in entrypoint
main(
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/train.py", line 234, in main
launch(
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/train.py", line 173, in launch
main_func(local_rank=0, world_size=world_size, config=config)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/scripts/train.py", line 88, in train_loop
trainer.train()
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/engine/trainer.py", line 174, in train
self.eval_iteration(step)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/decorators.py", line 70, in wrapper
ret = func(self, *args, **kwargs)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
ret = func(*args, **kwargs)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/engine/trainer.py", line 347, in eval_iteration
metrics_dict, images_dict = self.pipeline.get_eval_image_metrics_and_images(step=step)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/profiler.py", line 43, in wrapper
ret = func(*args, **kwargs)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/pipelines/base_pipeline.py", line 311, in get_eval_image_metrics_and_images
metrics_dict, images_dict = self.model.get_image_metrics_and_images(outputs, batch)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/models/neus_facto.py", line 202, in get_image_metrics_and_images
prop_depth_i = colormaps.apply_depth_colormap(
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/colormaps.py", line 74, in apply_depth_colormap
colored_image = apply_colormap(depth, cmap=cmap)
File "/home/woody/iwfa/iwfa008h/ref-sdf/sdfstudio/nerfstudio/utils/colormaps.py", line 42, in apply_colormap
assert image_long_min >= 0, f"the min value is {image_long_min}"
AssertionError: the min value is -9223372036854775808
@refiksoyak The training failed. I haven't tested this scene. Could you try --pipeline.model.sdf-field.inside-outside True
or try to use an MLP architecture?
@refiksoyak I quickly tested and found that using the monocular prior can get reasonable results. However, after pose normalization in the preprocessing script, a large part of the scene lies outside the unit bbox. In this case, a background model should be used.
First, make sure you extract the mono prior with the script
python scripts/datasets/process_nerfstudio_to_sdfstudio.py --type colmap --data data/nerfstudio/posters_v3/ --output-dir data/sdfstudio/poster --indoor
And then train NeuS-facto with mono prior and background model
ns-train neus-facto --pipeline.model.sdf-field.inside-outside True --pipeline.model.sdf-field.bias 0.8 --pipeline.model.mono-depth-loss-mult 0.1 --pipeline.model.mono-normal-loss-mult 0.05 --pipeline.model.background-model mlp sdfstudio-data --data data/sdfstudio/poster --auto-orient True --include-mono-prior True
Then you will get something like this in the viewer
I think you could adapt the processing script for this scene to get better results. The viewer can also be used remotely; you just need to forward the port with
ssh -L 7007:localhost:7007 <username>@<remote-machine-ip>
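After forwarding, the viewer should be reachable at http://localhost:7007 in a local browser (assuming the default websocket_port of 7007 shown in the configs above).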
Hello, first of all, thank you for the great work. I was trying to get it to work with my own data, but it seems I'm missing something, because the results I get are weird and not as good as with the DTU dataset.
My dataset and the meta json file are located here: https://drive.google.com/drive/folders/11HbD5ZxMkp9MrL-B49FswCUW3TIkTV1_?usp=sharing
I'm able to execute the whole pipeline:
python scripts/datasets/process_nerfstudio_to_sdfstudio.py --data sdf-datasets/coke-processed/ --data-type colmap --scene-type indoor --output-dir datasets/sdfstudio-dataset/coke-sdf-indoor
(I've tried different configurations of this script; the results are more or less the same)
ns-train neus-facto --pipeline.model.sdf-field.inside-outside False --experiment-name neus-facto-coke-object-inside-outside-false sdfstudio-data --data datasets/sdf-datasets/coke-sdf-object/
Could it be something related to the processing of the data before running the training process?
Do you use masks for the optimization process?
@Serge3006 Your images have a complex background. Maybe you could enable the background model and use a foreground mask with --pipeline.model.background-model mlp --pipeline.model.fg-mask-loss-mult 1.0. The other (and maybe better) way is to manually create training images with a clean background:
clean_images = images * mask + (1. - mask) # white background
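A small sketch of that masking idea, assuming per-image binary masks at the same resolution as the RGB images (the images/, masks/, and images_clean/ layout here is hypothetical):

import glob
import os

import cv2
import numpy as np

os.makedirs("images_clean", exist_ok=True)

# Composite each image onto a white background: clean = rgb * mask + (1 - mask).
for img_path in sorted(glob.glob("images/*.png")):
    mask_path = os.path.join("masks", os.path.basename(img_path))
    rgb = cv2.imread(img_path).astype(np.float32) / 255.0
    mask = cv2.imread(mask_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    mask = mask[..., None]  # HxWx1 so it broadcasts over the color channels
    clean = rgb * mask + (1.0 - mask)  # white background where mask == 0
    out_path = os.path.join("images_clean", os.path.basename(img_path))
    cv2.imwrite(out_path, (clean * 255.0).astype(np.uint8))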
Hello, first of all, thank you for the great work. I was trying to get it to work with my own data, but it seems I'm missing something, because the results I get are weird and not as good as with the DTU dataset.
My dataset and the meta json file are located here: https://drive.google.com/drive/folders/11HbD5ZxMkp9MrL-B49FswCUW3TIkTV1_?usp=sharing
I'm able to execute the whole pipeline:
- Generate the nerfstudio format using the ns-process-data ... script.
- Transform the nerfstudio format to the sdfstudio one using the process_nerfstudio_to_sdfstudio.py script. My data is composed only of images; I don't compute normals or depths:
python scripts/datasets/process_nerfstudio_to_sdfstudio.py --data sdf-datasets/coke-processed/ --data-type colmap --scene-type indoor --output-dir datasets/sdfstudio-dataset/coke-sdf-indoor
(I've tried different configurations of this script; the results are more or less the same) 3. Train neus-facto with the following config:
ns-train neus-facto --pipeline.model.sdf-field.inside-outside False --experiment-name neus-facto-coke-object-inside-outside-false sdfstudio-data --data datasets/sdf-datasets/coke-sdf-object/
4. Generate the mesh: As you can see, the can is there, but it is surrounded by a lot of background and the quality of the can object is not good. Could it be something related to the processing of the data before running the training process?
Do you use masks for the optimization process?
Hi, I used your data and directly ran the training code, but the rendered mesh has no coke can yet. Do you have any other revisions to your uploaded data?
Hello, I trained my BakedSDF model on a custom dataset with the following command:
ns-train bakedsdf --vis wandb --experiment-name phoenix-bakedsdf --data data/nerfstudio-data-mipnerf360/phoenix --pipeline.datamanager.camera-res-scale-factor 0.25 --pipeline.datamanager.train-num-rays-per-batch 5000 --pipeline.model.sdf-field.inside-outside True --pipeline.model.scene-contraction-norm l2 mipnerf360-data
and some problems occur when I try to extract the mesh:
ns-extract-mesh --load-config outputs/phoenix-bakedsdf/bakedsdf/2023-03-15_214055/config.yml --output-path meshes/phoenix_mesh_1024.ply --output-path phoenix_1204.ply --bounding-box-min -2.0 -2.0 -2.0 --bounding-box-max 2.0 2.0 2.0 --resolution 1024 --marching_cube_threshold 0.001 --create_visibility_mask True
1 1 0 tensor(0.0007, device='cuda:0') torch.Size([134217728]) torch.Size([133750, 3]) torch.Size([134217728, 3])
1 1 1 tensor(0.0011, device='cuda:0') torch.Size([134217728]) torch.Size([208082, 3]) torch.Size([134217728, 3])
Traceback (most recent call last):
  File "/home/undergrad/anaconda3/envs/sdfstudio/bin/ns-extract-mesh", line 8, in <module>
    sys.exit(entrypoint())
  File "/home/undergrad/sdfstudio/scripts/extract_mesh.py", line 137, in entrypoint
    tyro.cli(tyro.conf.FlagConversionOff[ExtractMesh]).main()
  File "/home/undergrad/sdfstudio/scripts/extract_mesh.py", line 93, in main
    get_surface_sliding_with_contraction(
  File "/home/undergrad/sdfstudio/nerfstudio/utils/marching_cubes.py", line 324, in get_surface_sliding_with_contraction
    combined.vertices = inv_contraction(torch.from_numpy(combined.vertices)).numpy()
AttributeError: 'list' object has no attribute 'vertices'
I have little idea what causes the error; here's the config file:
!!python/object:nerfstudio.configs.base_config.Config
data: &id002 !!python/object/apply:pathlib.PosixPath
- data
- nerfstudio-data-mipnerf360
- phoenix
experiment_name: phoenix-bakedsdf
logging: !!python/object:nerfstudio.configs.base_config.LoggingConfig
enable_profiler: true
local_writer: !!python/object:nerfstudio.configs.base_config.LocalWriterConfig
_target: !!python/name:nerfstudio.utils.writer.LocalWriter ''
enable: true
max_log_size: 10
stats_to_track: !!python/tuple
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Iter (time)
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Train Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test PSNR
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Vis Rays / Sec
- !!python/object/apply:nerfstudio.utils.writer.EventName
- Test Rays / Sec
max_buffer_size: 20
relative_log_dir: !!python/object/apply:pathlib.PosixPath []
steps_per_log: 10
machine: !!python/object:nerfstudio.configs.base_config.MachineConfig
dist_url: auto
machine_rank: 0
num_gpus: 1
num_machines: 1
seed: 42
method_name: bakedsdf
optimizers:
fields:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: &id001 !!python/name:torch.optim.adam.Adam ''
eps: 1.0e-15
lr: 0.01
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.NeuSSchedulerConfig
_target: !!python/name:nerfstudio.engine.schedulers.NeuSScheduler ''
learning_rate_alpha: 0.05
max_steps: 250000
warm_up_end: 500
proposal_networks:
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-15
lr: 0.01
weight_decay: 0
scheduler: !!python/object:nerfstudio.engine.schedulers.MultiStepSchedulerConfig
_target: !!python/name:torch.optim.lr_scheduler.MultiStepLR ''
max_steps: 250000
output_dir: !!python/object/apply:pathlib.PosixPath
- outputs
pipeline: !!python/object:nerfstudio.pipelines.base_pipeline.VanillaPipelineConfig
_target: !!python/name:nerfstudio.pipelines.base_pipeline.VanillaPipeline ''
datamanager: !!python/object:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManagerConfig
_target: !!python/name:nerfstudio.data.datamanagers.base_datamanager.VanillaDataManager ''
camera_optimizer: !!python/object:nerfstudio.cameras.camera_optimizers.CameraOptimizerConfig
_target: !!python/name:nerfstudio.cameras.camera_optimizers.CameraOptimizer ''
mode: 'off'
optimizer: !!python/object:nerfstudio.engine.optimizers.AdamOptimizerConfig
_target: *id001
eps: 1.0e-08
lr: 0.0006
weight_decay: 0.01
orientation_noise_std: 0.0
param_group: camera_opt
position_noise_std: 0.0
scheduler: !!python/object:nerfstudio.engine.schedulers.SchedulerConfig
_target: !!python/name:nerfstudio.engine.schedulers.ExponentialDecaySchedule ''
lr_final: 5.0e-06
max_steps: 10000
camera_res_scale_factor: 0.25
dataparser: !!python/object:nerfstudio.data.dataparsers.mipnerf360_dataparser.Mipnerf360DataParserConfig
_target: !!python/name:nerfstudio.data.dataparsers.mipnerf360_dataparser.Mipnerf360 ''
auto_scale_poses: true
center_poses: true
data: *id002
downscale_factor: null
eval_interval: 8
orientation_method: up
scale_factor: 1.0
scene_scale: 1.0
eval_image_indices: !!python/tuple
- 0
eval_num_images_to_sample_from: -1
eval_num_rays_per_batch: 1024
eval_num_times_to_repeat_images: -1
train_num_images_to_sample_from: -1
train_num_rays_per_batch: 5000
train_num_times_to_repeat_images: -1
model: !!python/object:nerfstudio.models.bakedsdf.BakedSDFModelConfig
_target: !!python/name:nerfstudio.models.bakedsdf.BakedSDFFactoModel ''
background_color: black
background_model: none
beta_anneal_max_num_iters: 250000
collider_params:
far_plane: 6.0
near_plane: 2.0
eikonal_anneal_max_num_iters: 250000
eikonal_loss_mult: 0.01
enable_collider: true
eval_num_rays_per_chunk: 1024
far_plane: 1000.0
far_plane_bg: 1000.0
fg_mask_loss_mult: 0.01
interlevel_loss_mult: 1.0
loss_coefficients:
rgb_loss_coarse: 1.0
rgb_loss_fine: 1.0
min_patch_variance: 0.01
mono_depth_loss_mult: 0.0
mono_normal_loss_mult: 0.0
near_plane: 0.2
num_neus_samples_per_ray: 48
num_proposal_iterations: 2
num_proposal_samples_per_ray: !!python/tuple
- 256
- 96
num_samples: 64
num_samples_eval: 128
num_samples_extra: 32
num_samples_outside: 32
overwrite_near_far_plane: true
patch_size: 11
patch_warp_angle_thres: 0.3
patch_warp_loss_mult: 0.0
periodic_tvl_mult: 0.0
proposal_net_args_list:
- hidden_dim: 16
log2_hashmap_size: 17
max_res: 64
num_levels: 5
- hidden_dim: 16
log2_hashmap_size: 17
max_res: 256
num_levels: 5
proposal_update_every: 5
proposal_warmup: 5000
proposal_weights_anneal_max_num_iters: 1000
proposal_weights_anneal_slope: 10.0
scene_contraction_norm: l2
sdf_field: !!python/object:nerfstudio.fields.sdf_field.SDFFieldConfig
_target: !!python/name:nerfstudio.fields.sdf_field.SDFField ''
appearance_embedding_dim: 32
beta_init: 0.1
bias: 0.05
divide_factor: 2.0
encoding_type: hash
geo_feat_dim: 256
geometric_init: true
hidden_dim: 256
hidden_dim_color: 256
inside_outside: true
num_layers: 2
num_layers_color: 2
off_axis: true
position_encoding_max_degree: 8
rgb_padding: 0.001
use_appearance_embedding: false
use_diffuse_color: true
use_grid_feature: true
use_n_dot_v: true
use_reflections: true
use_specular_tint: true
weight_norm: true
sensor_depth_freespace_loss_mult: 0.0
sensor_depth_l1_loss_mult: 0.0
sensor_depth_sdf_loss_mult: 0.0
sensor_depth_truncation: 0.015
sparse_points_sdf_loss_mult: 0.0
topk: 4
use_anneal_beta: true
use_anneal_eikonal_weight: false
use_average_appearance_embedding: false
use_proposal_weight_anneal: true
use_same_proposal_network: false
use_single_jitter: true
timestamp: 2023-03-15_214055
trainer: !!python/object:nerfstudio.configs.base_config.TrainerConfig
load_config: null
load_dir: null
load_scheduler: true
load_step: null
max_num_iterations: 250001
mixed_precision: false
relative_model_dir: !!python/object/apply:pathlib.PosixPath
- sdfstudio_models
save_only_latest_checkpoint: true
steps_per_eval_all_images: 1000000
steps_per_eval_batch: 5000
steps_per_eval_image: 5000
steps_per_save: 20000
viewer: !!python/object:nerfstudio.configs.base_config.ViewerConfig
ip_address: 127.0.0.1
launch_bridge_server: true
max_num_display_images: 512
num_rays_per_chunk: 32768
quit_on_train_completion: false
relative_log_filename: viewer_log_filename.txt
skip_openrelay: false
start_train: true
websocket_port: 7007
zmq_port: null
vis: wandb
@ChicoChen Not sure if it's related to trimesh version. I am using 3.15.8. Did you try to visualise the training progress?
@niujinshuchong Thanks for the reply, I didn't notice that I'm using trimesh 3.20.1. And here are my eval images; well, this is obviously not a good result.
@ChicoChen Your reconstruction seems to have failed completely. You should check if your camera poses are correct.
Hello, thanks for the advice last time. After some adjustments, the new BakedSDF model seems better,
but I have some questions about rendering a video:
ns-render-mesh --meshfile MESHFILE.ply --traj spiral --output_path [MyPath] mipnerf360-data --data [MyData]
and here's the result. But the camera position was fixed above the phoenix statue. What arguments should I set so that the camera spins around the statue horizontally?
@ChicoChen ns-render-mesh supports different types of trajectories. For example, interpolate interpolates the training camera poses and ellipse creates an ellipse trajectory around the scene origin. You could use ns-render to render RGB images.
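For example, swapping --traj spiral for --traj ellipse in the ns-render-mesh command above should orbit around the scene origin instead of staying fixed above the statue (assuming the rest of the arguments stay the same).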
I have the following problem.
I have been trying to run neus-facto with my custom data here
I processed the data like so:
IMAGE_DIR=<path_to_the_unzipped_file_from_download_link>
PROCESSED_COLMAP_OUTPUT_DIR=processed-colmap
PROCESSED_SDFSTUDIO_OUTPUT_DIR=processed-sdf
SCENE_TYPE=unbound
# process data
ns-process-data $IMAGE_DIR --data church --output-dir $PROCESSED_COLMAP_OUTPUT_DIR
python3.8 scripts/datasets/process_nerfstudio_to_sdfstudio.py --data $PROCESSED_COLMAP_OUTPUT_DIR --output-dir $PROCESSED_SDFSTUDIO_OUTPUT_DIR --data-type colmap --scene-type $SCENE_TYPE
# training
ns-train neus-facto --pipeline.model.sdf-field.inside-outside False --vis viewer --experiment-name church-building sdfstudio-data --data $PROCESSED_SDFSTUDIO_OUTPUT_DIR
The issue is the training gets stuck after ~100 iterations:
I have tried to visualize the training process in the viewer:
I was able to open the matched points generated from the images in CloudCompare:
It seems like the orientation is a bit different; maybe that's the issue?
Guys, I don't see in your documentation how I can provide my own dataset.
Is it the same as nerfstudio's ns-process-data? I see you have mask data in your dataset. How do I apply that mask for training?