Open timatchley opened 1 year ago
Hi there, I am very impressed with your results and I will say you've done quite a good job keeping your code clean and compact.
I noticed you said you were working on creating some code to help with the creation of custom datasets, but would it be possible in the meantime to provide us with instructions to manually create one using whatever method the test dataset was created with?
I'd love to test this using some of my own test data and see how the results come out.
I'm looking forward to hearing back from you, and again, thanks for all the work!
Also, to keep some of the work off you, assume I can dump frames from each synced video I'd have and also create the seg masks folder similar to the datasets you provide. I'm mainly curious what the steps would be for creating the rest of the input data.
Thanks!
I asked this previously here. The data preparation is very specific to the CMU Panoptic dataset.
https://github.com/JonathonLuiten/Dynamic3DGaussians/issues/13
Thanks for the reply.
I see you provide a rough outline there, but I was hoping for even more specific directions. For instance, if I let COLMAP calculate the camera positions and so on, is there a step-by-step guide or script that would translate that output into the needed files?
Is there a link to the CMU Panoptic dataset preparation that covers this? And how about creating the .npz file?
I appreciate all the feedback, Thanks!
I am working on this currently, but I'm unaffiliated with this project, so I'm reverse engineering.
Here is the CMU download script: https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox
I'm working on this C# script to prepare the data. You can see from the JSON and the npz array what data is required:
using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }

    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),
            new Argument<string>(
                "CameraPositions",
                "These camera positions are generated by Colmap")
        };
        rootCommand.Description = "Initialize the training data for the dynamic gaussian splatting";

        // Note that the parameters of the handler method are matched according to the names of the options
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);
        rootCommand.Invoke(args);
        Environment.Exit(0);
    }

    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }

    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;
        public List<List<float[][]>> w2c;
        public List<List<string>> fn;
        public List<List<int>> cam_id;
    }

    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;

        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();

        train_meta trainMeta = new()
        {
            w = 640,
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };

        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for (int j = 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);

                Frame cameraFrame = cameraTransforms.frames[j];

                // Intrinsics matrix K built from the focal lengths and the principal point
                List<float[]> kToInsertInner = new()
                {
                    new[] { cameraFrame.fl_x, 0f, cameraFrame.cx },
                    new[] { 0f, cameraFrame.fl_y, cameraFrame.cy },
                    new[] { 0f, 0f, 1f }
                };
                kToInsert.Add(kToInsertInner);

                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }

            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }

        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));

        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;
            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;
            // seg
            data[i, 6] = 1;
        }

        // key "data" matches what the training code expects when it loads init_pt_cld.npz
        npz.Add("data", data);
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}
Thanks for the speedy reply,
It looks like to fill in the rest of your TODOs you could just use the sparse reconstruction from COLMAP. Or does it need the dense reconstruction point cloud? I'm not sure what seg would be, though. If I had to guess, maybe an incrementing number for which point it is? (This is why I am always in favor of very explicitly named variables :) )
Also, do you know if the calibration needs to be performed on every set of frames, or does COLMAP just need to be run once on an initial set of frames? I feel like that is important to state for those who may be wondering, like myself.
I'll try to put together a Python version of your script that the author of the project can use once I have everything clarified and working.
My guess is that the initial point cloud is needed to seed the training. It would defeat the purpose otherwise to do it for every frame. But input from @JonathonLuiten would be helpful here
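In the meantime, a quick way to check what seg actually looks like is to open one of the provided init_pt_cld.npz files directly. A minimal sketch; I'm assuming the "data" key and the 7-column x, y, z, r, g, b, seg layout that the scripts in this thread use:
import numpy as np

# Inspect a provided init_pt_cld.npz to confirm the layout (path is just an example)
pt_cld = np.load("data/basketball/init_pt_cld.npz")["data"]
print(pt_cld.shape)             # expected (N, 7): x, y, z, r, g, b, seg
print(pt_cld[:3, :3])           # point positions
print(pt_cld[:3, 3:6])          # colors
print(np.unique(pt_cld[:, 6]))  # seg values actually used by the authors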
Hey everyone. Stoked to see all your interest and excited to help you all figure out how to set this up on your own datasets.
However from now until around Nov 17 I’m going to be super swamped and busy and won’t have much time to really dedicate to this.
I think the thing that would be the most helpful for you all, is if I wrote a script to convert classic static nerf/Gaussian datasets to my format. This could be used to train Gaussian Splatting with my code and would show how to set this up on your own data. Feel free to keep annoying me every couple of days until I do this, but realistically won’t be this week.
Hey @atonalfreerider
I noticed you aren't passing in any of the colmap outputs directly. Instead you seem to have a middle step that is building some sort of json file that is then being read in. Can you provide whatever you are using to get colmap's output into that JSON format?
I've attempted to write a parser myself to take in the images.txt/cameras.txt directly, but this doesn't quite account for all the variables in your Frame object, such as camera_angle_x, camera_angle_y, transform_matrix, and sharpness. I'd rather not write all the math to compute that myself if I don't have to.
Thanks
Looks like you may just be using the resulting transforms.json from instant-ngp. Maybe I'll give that a go =) https://github.com/NVlabs/instant-ngp/blob/master/scripts/colmap2nerf.py
Yes, you will notice that only the camera transform is being used, plus the focal lengths (fl_x, fl_y) and the camera's principal point (cx, cy).
OK, using colmap2nerf.py seems to have done the trick. My script expects to be run with your CWD at the root of your dataset. Also (for now) it expects your extracted images to be in rootCaptureFolder/ims, starting with 000.png, in directories 0 through numOfCameras.
I'm not done testing the results, but they seem to be parsed fine by this project's train.py.
The one thing I will note, and would love a comment on if anyone has more information: what should the "seg" value in the point cloud be? I have it hardcoded to 1 at the moment.
Here is the python script to run after colmap2nerf.py
import argparse
import json
import os
import sys
import numpy as np
from typing import List, Dict, Any


class CameraTransform:
    def __init__(self) -> None:
        self.aabb_scale: int = 0
        self.frames: List[Frame] = []


class Frame:
    def __init__(self) -> None:
        self.file_path: str = ""
        self.sharpness: float = 0.0
        self.transform_matrix: List[List[float]] = []
        self.camera_angle_x: float = 0.0
        self.camera_angle_y: float = 0.0
        self.fl_x: float = 0.0
        self.fl_y: float = 0.0
        self.k1: float = 0.0
        self.k2: float = 0.0
        self.k3: float = 0.0
        self.k4: float = 0.0
        self.p1: float = 0.0
        self.p2: float = 0.0
        self.is_fisheye: bool = False
        self.cx: float = 0.0
        self.cy: float = 0.0
        self.w: float = 0.0
        self.h: float = 0.0


class TrainMeta:
    def __init__(self) -> None:
        self.w: float = 0.0
        self.h: float = 0.0
        self.k: List[List[List[List[float]]]] = []
        self.w2c: List[List[List[float]]] = []
        self.fn: List[List[str]] = []
        self.cam_id: List[List[int]] = []


def count_files_in_first_directory(path):
    # List all files and directories in the given path
    items = os.listdir(path)
    # Iterate over the items to find the first directory
    for item in items:
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path):
            # If a directory is found, list its contents and count the files
            return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
    return 0  # Return 0 if no directory is found


def parse(input_path: str, camera_positions: str) -> None:
    transforms_directory = camera_positions
    if str(camera_positions).endswith("transforms.json"):
        transforms_directory = camera_positions[:-len("transforms.json")]
    else:
        camera_positions = os.path.join(camera_positions, "transforms.json")

    with open(camera_positions, 'r') as file:
        camera_transforms = json.load(file)

    ims_path = os.path.join(input_path, "ims")
    cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
    file_count = count_files_in_first_directory(ims_path)

    train_meta = TrainMeta()
    train_meta.w = 640
    train_meta.h = 360
    # ... initialization of other fields ...

    for i in range(file_count):
        to_insert = []
        cam_to_insert = []
        k_to_insert = []
        w_to_insert = []
        for j in range(cam_count):
            to_insert.append(f"{j}/{str(i).zfill(3)}.png")
            cam_to_insert.append(j)

            camera_frame = camera_transforms["frames"][j]

            k_to_insert_inner = [
                [camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
                [0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
                [0.0, 0.0, 1.0]
            ]
            k_to_insert.append(k_to_insert_inner)

            w = camera_frame["transform_matrix"]
            w_to_insert.append(w)

        train_meta.fn.append(to_insert)
        train_meta.cam_id.append(cam_to_insert)
        train_meta.k.append(k_to_insert)
        train_meta.w2c.append(w_to_insert)

    with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
        json.dump(train_meta.__dict__, file, indent=4)

    file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
    npz: Dict[str, Any] = {}
    data = parse_colmap_points3D(file_path)
    npz["data"] = data
    np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)


def parse_colmap_points3D(file_path: str) -> np.ndarray:
    with open(file_path, 'r') as f:
        lines = f.readlines()
    # Filter out the lines containing 3D point data
    points_lines = [line.strip() for line in lines if not line.startswith("#")]

    data = np.zeros((len(points_lines), 7))
    for i, line in enumerate(points_lines):
        parts = line.split()
        # point position
        data[i, 0] = float(parts[1])
        data[i, 1] = float(parts[2])
        data[i, 2] = float(parts[3])
        # color
        data[i, 3] = int(parts[4])
        data[i, 4] = int(parts[5])
        data[i, 5] = int(parts[6])
        # seg - I have no idea what the value should be here! Leaving it as '1' for now
        data[i, 6] = 1

    return data


def main():
    parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
    parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
    parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")
    args = parser.parse_args()

    parse(args.InputPath, args.CameraPositions)
    sys.exit(0)


if __name__ == "__main__":
    main()
I imagine some changes will need to be made to this script, but it's a starting point for now.
I'll try to update this thread with how it goes, but my dataset is still missing the "seg" files that I need to generate. Right now train.py breaks when it tries to load those files, since I have not created them yet.
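If anyone else hits this before having real masks, here is a rough sketch for generating all-white placeholder masks. It assumes the ims/<cam>/NNN.png layout described above and that treating everything as dynamic (seg = 1 everywhere) is acceptable:
import os
from PIL import Image

def make_white_seg_masks(root):
    """Write an all-white mask into seg/<cam>/ for every image found in ims/<cam>/."""
    ims_path = os.path.join(root, "ims")
    for cam in sorted(os.listdir(ims_path)):
        cam_dir = os.path.join(ims_path, cam)
        if not os.path.isdir(cam_dir):
            continue
        seg_dir = os.path.join(root, "seg", cam)
        os.makedirs(seg_dir, exist_ok=True)
        for fname in sorted(os.listdir(cam_dir)):
            with Image.open(os.path.join(cam_dir, fname)) as im:
                mask = Image.new("L", im.size, 255)  # single-channel, all white
            mask.save(os.path.join(seg_dir, os.path.splitext(fname)[0] + ".png"))

# make_white_seg_masks("path/to/captureFolder")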
so is this solved?
Hard coding at 1 is fine! That is what I would have done for a demo :)
It only needs to be not 1 for points you are 100% certain are static. If you don't know, then all 1s should be good :)
I am training a 150-frame scene as we speak. I'll close this tomorrow if my results come out well.
Training completed. The resulting output was a 4KB file which when visualized was blank. So there is still something wrong. I'll try investigating further.
Here is what my captureFolder/ims/0 starts with
And my captureFolder/seg/0 starts with
The only real difference I can see between my dataset and the sample is that I am using .png for my RGB images and .jpg for my background-removed bitmasks, but I accounted for this by changing the load_dataset function to swap .png for .jpg.
My dataset uses 5 cameras and does produce a convergent result from COLMAP. I've tried running my process on the original provided test datasets, but COLMAP does NOT converge when I use the first frame from each camera of the "basketball" dataset. I'm assuming this is probably consistent across all the provided datasets, as they are all shot with the same capture setup.
This is what my params.npz result looked like when viewed from a text editor.
Debugging this is quite challenging for pretty obvious reasons so any insight would be appreciated @JonathonLuiten
The params file is a dict of numpy arrays. Open it the way the visualizer does to see the values inside.
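E.g. something like this (a minimal sketch; the path is just an example, use wherever your run saved params.npz):
import numpy as np

# params.npz is a dict of numpy arrays; print the shape of each entry
params = dict(np.load("output/exp1/my_dataset/params.npz"))
for name, arr in params.items():
    print(name, arr.shape, arr.dtype)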
If the colmap didn't work and the camera poses are wrong it DEFINITELY will not work...
E.g. if you can't get original gaussian splatting (static scene) working on the first frame, then this dynamic stuff won't work.
I would start with getting static working on the first frame first.
I was able to get static gaussian splatting to work with the dataset I used to run COLMAP with; granted it's only 5 cameras and the quality isn't terrific, but it did work.
Looking at the params in visualize.py isn't giving me much readable information, other than that means3D through log_scales all have no values.
Here is what my params look like in train.py immediately after calling initialize_params.
That would explain why the file is so small. So something must be failing during training. Since you know the logic there, any clue what it could be?
No idea, but try setting the number of time steps to 1, and thus fit the static scene in the first timestep with my code.
Debug by looking at the "params" dict and see when it becomes empty?
It becomes empty on step 0, iteration 3000, in external.py line 205.
Here is execution on line 204; values are still there.
On line 204, to_remove must be getting set to all the entries, and then line 205 removes them all, leaving means3D and the others with tensor shapes of (0, 3) instead of (1758, 3) after I step over line 205.
It looks like the logic on this line:
big_points_ws = torch.exp(params['log_scales']).max(dim=1).values > 0.1 * variables['scene_radius']
results in True for every single point, and therefore it removes every point.
Maybe you can explain what is going on here? My scene_radius is 0.7211402587951079, so I guess the threshold being tested here is too small, resulting in it purging all the data. Can you provide some insight into what is going on? Maybe it will help me understand the limitations on input data.
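For anyone following along, here is roughly what that check does, and a quick way to compare your scales against the threshold before the prune happens (a sketch based on the line quoted above; call it from a breakpoint where params and variables are in scope):
import torch

def check_big_point_pruning(log_scales: torch.Tensor, scene_radius: float):
    """Reproduce the world-space 'big points' test quoted above: a Gaussian is
    flagged for removal when its largest scale exceeds 10% of the scene radius."""
    max_scale = torch.exp(log_scales).max(dim=1).values
    threshold = 0.1 * scene_radius
    flagged = (max_scale > threshold).float().mean().item()
    print(f"threshold = {threshold:.4f}")
    print(f"max scale range = [{max_scale.min().item():.4f}, {max_scale.max().item():.4f}]")
    print(f"fraction flagged for removal = {flagged:.2%}")

# e.g. at a breakpoint in external.py:
# check_big_point_pruning(params['log_scales'], variables['scene_radius'])
If the COLMAP reconstruction comes out at a much smaller metric scale than the CMU scenes (a scene_radius of ~0.72 is tiny), this threshold and the other densify/prune heuristics may behave very differently than on the provided data.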
colmap2nerf doesn't keep the images in order, so the transform matrices weren't mapping to the correct images in the dataset. It also uses a single set of camera intrinsics instead of one per frame like @atonalfreerider's script expected. I've modified my Python code to sort the frames by the file number in the file_path entries from transforms.json. I've also modified the script so that it reuses the camera w and h from transforms.json.
Here is the updated python script:
import argparse
import json
import os
import sys
import numpy as np
import re
from typing import List, Dict, Any


class CameraTransform:
    def __init__(self) -> None:
        self.aabb_scale: int = 0
        self.frames: List[Frame] = []


class Frame:
    def __init__(self) -> None:
        self.file_path: str = ""
        self.sharpness: float = 0.0
        self.transform_matrix: List[List[float]] = []
        self.camera_angle_x: float = 0.0
        self.camera_angle_y: float = 0.0
        self.fl_x: float = 0.0
        self.fl_y: float = 0.0
        self.k1: float = 0.0
        self.k2: float = 0.0
        self.k3: float = 0.0
        self.k4: float = 0.0
        self.p1: float = 0.0
        self.p2: float = 0.0
        self.is_fisheye: bool = False
        self.cx: float = 0.0
        self.cy: float = 0.0
        self.w: float = 0.0
        self.h: float = 0.0


class TrainMeta:
    def __init__(self) -> None:
        self.w: float = 0.0
        self.h: float = 0.0
        self.k: List[List[List[List[float]]]] = []
        self.w2c: List[List[List[float]]] = []
        self.fn: List[List[str]] = []
        self.cam_id: List[List[int]] = []


def get_number(frame):
    return int(re.search(r'(\d+).png$', frame["file_path"]).group(1))


def count_files_in_first_directory(path):
    # List all files and directories in the given path
    items = os.listdir(path)
    # Iterate over the items to find the first directory
    for item in items:
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path):
            # If a directory is found, list its contents and count the files
            return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
    return 0  # Return 0 if no directory is found


def parse(input_path: str, camera_positions: str) -> None:
    transforms_directory = camera_positions
    if str(camera_positions).endswith("transforms.json"):
        transforms_directory = camera_positions[:-len("transforms.json")]
    else:
        camera_positions = os.path.join(camera_positions, "transforms.json")

    with open(camera_positions, 'r') as file:
        camera_transforms = json.load(file)

    ims_path = os.path.join(input_path, "ims")
    cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
    file_count = count_files_in_first_directory(ims_path)

    train_meta = TrainMeta()
    train_meta.w = int(camera_transforms['w'])
    train_meta.h = int(camera_transforms['h'])
    # ... initialization of other fields ...

    # Need to sort the frames by the number at the end of file_path, in numerical order
    sorted_frames = sorted(camera_transforms["frames"], key=get_number)

    for i in range(file_count):
        to_insert = []
        cam_to_insert = []
        k_to_insert = []
        w_to_insert = []
        for j in range(cam_count):
            to_insert.append(f"{j}/{str(i).zfill(3)}.png")
            cam_to_insert.append(j)

            camera_frame = sorted_frames[j]

            k_to_insert_inner = [
                [camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
                [0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
                [0.0, 0.0, 1.0]
            ]
            k_to_insert.append(k_to_insert_inner)

            w = camera_frame["transform_matrix"]
            w_to_insert.append(w)

        train_meta.fn.append(to_insert)
        train_meta.cam_id.append(cam_to_insert)
        train_meta.k.append(k_to_insert)
        train_meta.w2c.append(w_to_insert)

    with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
        json.dump(train_meta.__dict__, file, indent=4)

    file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
    npz: Dict[str, Any] = {}
    data = parse_colmap_points3D(file_path)
    npz["data"] = data
    np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)


def parse_colmap_points3D(file_path: str) -> np.ndarray:
    with open(file_path, 'r') as f:
        lines = f.readlines()
    # Filter out the lines containing 3D point data
    points_lines = [line.strip() for line in lines if not line.startswith("#")]

    data = np.zeros((len(points_lines), 7))
    for i, line in enumerate(points_lines):
        parts = line.split()
        # point position
        data[i, 0] = float(parts[1])
        data[i, 1] = float(parts[2])
        data[i, 2] = float(parts[3])
        # color
        data[i, 3] = int(parts[4])
        data[i, 4] = int(parts[5])
        data[i, 5] = int(parts[6])
        # seg - I have no idea what the value should be here! Leaving it as '1' for now
        data[i, 6] = 1

    return data


def main():
    parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
    parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
    parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")
    args = parser.parse_args()

    parse(args.InputPath, args.CameraPositions)
    sys.exit(0)


if __name__ == "__main__":
    main()
Additionally, I've created a second dataset that has 32 cameras and a very clean COLMAP sparse reconstruction. So I believe there is still something off in the math here that I'm hoping someone can figure out.
@JonathonLuiten, is there any good way to verify that my training metadata is valid, or to visualize the camera positions or something of that nature, so I can see whether something is off with the calculations? Or is there something specific to the CMU dataset that you had to account for when creating your Camera objects in setup_camera from helpers.py?
The PSNR starts negative and works its way to around 11 by 10k steps on my new dataset, which is still quite far behind the 20s the provided datasets reach.
Just to be clear, it is still not working for me. I believe something small is off with the metadata/poses, because when I train, it ends up removing all the points after enough iterations, to the point that the output data is blank. So my hunch is that what it sees in the images and the positional numbers it's being fed don't agree. I imagine it's something small, which is why I'm asking if there is a good way to visualize anything here, such as the camera positions, to confirm whether they are right or wrong =). Looking at raw numbers doesn't tell me much.
There may be a difference between how colmap2nerf.py calculates some of the values and what the expected values are. I'd appreciate it if someone could figure out what is wrong, or give me another script that produces the data from an OPENCV COLMAP sparse reconstruction.
Any help is welcome!
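For what it's worth, here is the kind of sanity check I have in mind: plotting the camera centers from train_meta.json. A rough sketch; it assumes the matrices under w2c really are world-to-camera, in which case the centers are -R^T t. If colmap2nerf's transform_matrix is actually camera-to-world, the centers would instead just be the last column, which is exactly the sort of mismatch I suspect:
import json
import numpy as np
import matplotlib.pyplot as plt

with open("path/to/captureFolder/train_meta.json") as f:
    meta = json.load(f)

w2c = np.array(meta["w2c"][0])            # matrices for timestep 0: (num_cams, 4, 4)
R, t = w2c[:, :3, :3], w2c[:, :3, 3]
centers = -np.einsum("nji,nj->ni", R, t)  # C = -R^T t, assuming true w2c matrices

fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2])
for i, c in enumerate(centers):
    ax.text(c[0], c[1], c[2], str(i))     # label each camera index
plt.show()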
Maybe you should adjust the COLMAP parameters in colmap2nerf.py? I tried to use colmap2nerf.py to generate train_meta.json and all the K matrices came out the same, which I think is wrong. I am using the "juggle" data from the project.
I'm aware they all come out the same, but isn't K the camera intrinsics? As long as the cameras are identical hardware/settings, shouldn't they be the same? The colmap2nerf script uses the single-camera setting, hence why there is only one set of camera intrinsics in its result.
Edit: I took a closer look at colmap2nerf.py, and it looks like the values cx, cy, fl_x, fl_y that we use to build the K matrices come directly from COLMAP. So would this imply COLMAP is not calculating these values correctly? I think it's more likely that colmap2nerf.py is doing something with the camera extrinsics (the transform matrix) that isn't compatible with what Dynamic3DGaussians is expecting.
You are right. If the same camera is used, the K matrix is indeed the same. I modified colmap2nerf.py so that colmap re-predicts camera intrinsics for each camera, but Dynamic3DGaussians is still not compatible.
Hi everyone, I'll join the conversation as I'm having very similar issues to the ones you mentioned. I reverse-engineered the input data and was able to extract and format data from a) COLMAP and b) ground truth camera poses and points (Blender). I am only working with frame 0, as I am currently interested in setting up a pipeline for the static version.
Regarding the data extracted from COLMAP, I got a poor reconstruction, but this was expected as the object is quite challenging to reconstruct. However, I got white images for the data where I know the ground truth, which I did not expect. I investigated a bit and noticed an issue in this line from train.py: im, radius, _, = Renderer(....). All the radii are set to 0 for some reason. Given that the w2c and intrinsics should be correct, as I extracted them directly from Blender, do you have any intuition on why this is the case? Does anybody face a similar issue?
Thanks
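A quick check worth doing here is to project the initial point cloud through w2c and K and see whether the points land in front of the camera and inside the image. A minimal sketch, assuming w2c maps world to camera coordinates with +z in front of the camera:
import numpy as np

def projection_report(pts_xyz, w2c, K, w, h):
    """Project Nx3 world points with a 4x4 w2c and 3x3 K, and report visibility."""
    pts_h = np.hstack([pts_xyz, np.ones((len(pts_xyz), 1))])
    cam = (w2c @ pts_h.T).T[:, :3]                 # world -> camera coordinates
    in_front = cam[:, 2] > 0
    uv = (K @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]                    # perspective divide
    in_image = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    print(f"{in_front.sum()}/{len(pts_xyz)} points in front of the camera, "
          f"{(in_front & in_image).sum()} inside the image")

# pts = np.load("init_pt_cld.npz")["data"][:, :3]
# projection_report(pts, np.array(meta["w2c"][0][0]), np.array(meta["k"][0][0]), meta["w"], meta["h"])
If almost nothing projects into the image, radii of 0 are expected and the problem is the pose convention rather than the renderer.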
Follow-up on my previous comment - I have noticed a few things that might be helpful:
1) First and foremost, setting the segmentation loss = 0 improved the training procedure significantly. This alone solved many of the issues I faced during training. You can find this in train.py -> get_loss() -> losses['seg'] = ... (I am doing static reconstruction currently; I don't know to what extent dynamic reconstruction could be affected by this.) See the sketch after this list.
2) I found the training procedure to be very sensitive to the initial hyperparameters.
3) densify() removes, clones, and splits gaussians according to multiple heuristics. By tuning those heuristics, I could achieve better performance.
Hope this helps!
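For point 1, this is roughly what I mean - illustrative only, the names and weight values below are examples rather than the exact code in train.py:
# How the weighted sum in get_loss() combines terms, and what zeroing 'seg' does.
# Dummy numbers; the point is only that the segmentation term stops contributing.
losses = {'im': 0.12, 'seg': 0.40, 'rigid': 0.05, 'rot': 0.03, 'iso': 0.02}
loss_weights = {'im': 1.0, 'seg': 0.0, 'rigid': 4.0, 'rot': 4.0, 'iso': 2.0}  # seg disabled

total = sum(loss_weights[k] * v for k, v in losses.items())
print(total)  # same total as if the seg term were never added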
@maurock very cool to hear you are getting somewhere with your own datasets.
Could you go into detail a little as to what you did to reverse-engineer the input data to the correct format? Can you provide any scripts to help with this task?
Hi @maurock, are the camera parameters of your own datasets obtained through COLMAP? What does b) ground truth camera poses and points (Blender) include - the extrinsics matrix and initial point cloud? So did you only use the intrinsics part of the data obtained by COLMAP?
The code I am using to go from Blender to 3DGS is very much integrated with my pipeline and scattered around currently, but I will clean it and share a few scripts to extract data from Blender whenever I can. In the meantime, what I did was:
- Extracted data from Blender. This means I have a few scripts to generate cameras in Blender, get their intrinsics/extrinsics matrices and points on the surface of the objects using the bpy package. I then have additional scripts to format this data as required by this repository (I basically create the init_pt_cld.npz, ims, seg folders, and train_meta.json, plus I make sure to have consistent camera configurations). With careful tuning of the learning rates + heuristics, I was able to correctly reconstruct the Blender objects.
- I have an additional pipeline to extract camera poses and a pointcloud using COLMAP. I borrowed parts of the code from the original 3D Gaussian Splatting repo to do that. The data is in the correct format, but results are not good yet - I suspect it has to do with hypertuning.
Thank you very much for your reply. Based on your suggestions, I'm getting good results on my own dataset, but one thing is weird: as the time step increases, the PSNR value keeps decreasing. I'm still looking for the problem.
@ch1998 Great to hear you are seeing positive results! Did you follow my suggestions here? In that case, it looks like the segmentation loss shouldn't be set to 0 for dynamic reconstruction. You could try to set the segmentation loss equal to 0 for the static reconstruction (the first 10000 iterations in your case, which works well) and revert it to the value the authors chose for the following iterations. I think proper segmentation images are needed for good dynamic results. If you don't currently have those images, a workaround could be to set the seg values to 1 (=dynamic) for every point in your initial point cloud - according to this answer. The segmentation images should probably reflect this - just a plain white image should do. In case you try, please let us know how it goes!
@maurock I followed your suggestions, set segmentation loss = 0, and obtained good reconstruction results at t = 0, 1, and 2, but as t increases the results become bad. Segmentation loss = 0 maybe only applies to static reconstruction. Also, the initial point cloud matters a lot; I use the dense point cloud reconstructed by COLMAP.
If either @maurock or @ch1998 has the time to share their data preparation scripts, it would be much appreciated!
@henrypearce4D Sure, I am working on it, I'll share it later today!
I am using the code shared by @timatchley, using COLMAP for registration. You need to use COLMAP's dense reconstruction to get the initial point cloud. Then it is adjusted according to maurock's parameters. In fact, currently only the first few frames of my data have good quality; I'm still trying to figure it out.
I think the problem is not with the camera parameters obtained by COLMAP - those are accurate. Maybe the hyperparameters need to be adjusted.
@henrypearce4D I have added the scripts and instructions to the touch3DGS branch in my fork: https://github.com/maurock/Dynamic3DGaussians/tree/touch3DGS
- Here's the link to the relevant section of the README.md file.
- You'll find the scripts you need in data_making/blender_script.py and data_making/blender_to_data.py.
I hope this helps!
I'm trying it out with the full dense reconstruction and 'seg' loss set to 0.0. The first frame is off to a good start, it appears - PSNR 32.
Similar results to @ch1998: it started off great but went downhill quickly. This was with the dense COLMAP reconstruction as the initial point cloud and 0 for the seg loss.
Any tips on how to tune the loss parameters or whatever needs tuning? @JonathonLuiten, any insight into what could be going on? The first frame comes out pretty clean with 'seg' loss set to 0.0, but the process seems to degrade in quality quickly after that.
We're very close, hope to hear what the solution is soon :)
@maurock Wow, thank you! Hopefully I can try this ASAP!
remove_threshold = 0.25 if i == 5000 else 0.005
This condition by the author at line 200 in the external.py file is quite strange, since it completely prunes the low opacities, which is also what happens in my case. I'm trying to train on my custom dataset created in Unity. If I'm not mistaken, the original Gaussian splatting paper keeps the threshold at 0.005. If I set the threshold to 0.005, the model learns to reconstruct the first time step.
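Concretely, the workaround I tried is just this change in external.py, around the line quoted above (not an official fix, just what let the first timestep train on my data):
# external.py, densification/pruning logic:
# original: remove_threshold = 0.25 if i == 5000 else 0.005
# keep the standard 3DGS opacity-prune threshold throughout, skipping the hard
# prune at iteration 5000 that wiped out my scene
remove_threshold = 0.005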
@JonathonLuiten, any progress here? I still haven't been able to get my own data to train for more than a frame before it progressively falls apart on each subsequent frame in time.
I would love to be able to make use of this project still.
@henrypearce4D I have added the scripts and instructions to the touch3DGS branch in my fork: https://github.com/maurock/Dynamic3DGaussians/tree/touch3DGS
Does this work? If you faced any challenges, could you please describe the whole dataset preparation process in detail?
@Tejasdavande07 Hi, as far as I'm aware the released script was only for synthetically rendered scenes and not real footage, so I didn't experiment further.
OK, thanks for your reply. I am looking into the process for real footage.
@JonathonLuiten, any progress here? I still haven't been able to get my own data to train for more than a frame before it progressively falls apart on each subsequent frame in time.
I would love to be able to make use of this project still.
Try disabling all losses except image, rigid, rot, and iso. In my case it works nicely if I only enable these losses.
I only used the im, rigid, rot, and iso losses. Although I could achieve a PSNR of 30+, the tracking was not very good, as shown in the figure. Do you have a good solution?
Best!!!
Hello, may I ask how many cameras you used to capture your data?
I was able to get static gaussian splatting to work with the dataset I used to run COLMAP with; granted it's only 5 cameras and the quality isn't terrific, but it did work.
Hello, may I ask how you calibrated the five cameras?