Instructions for creating our own dataset to test with

timatchley commented 1 year ago

Hi there, I am very impressed with your results and I will say you've done quite a good job keeping your code clean and compact.

I noticed you said you were working on creating some code to help aid the creation of custom datasets, but would it be possible in the meantime to provide us with instructions to manually create one using whatever method the test dataset was created with?

I'd love to test this using some of my own test data and see how the results come out.

I'm looking forward to hearing back from you, and again, thanks for all the work!

timatchley commented 1 year ago

Also to keep some of the work off you, assume I can dump frames from each synced video i'd have and also create the seq masks folder similar to the datasets you provide. I'm mainly curious what the steps would be for creating the rest of the input data.

Thanks!

atonalfreerider commented 1 year ago

I asked this previously here. The data preparation is very specific to the CMU Panoptic dataset.

https://github.com/JonathonLuiten/Dynamic3DGaussians/issues/13

timatchley commented 1 year ago

CMU Panoptic

Thanks for the reply.

I see you provide a rough outline. I was hoping for directions even more specific. As in if I were to allow colmap to calculate the camera positions and what not, is there a step by step guide or script that would translate that into the needed files?

Is there a link to the CMU Panoptic dataset preparation that covers this? And how about creating the .npz file?

I appreciate all the feedback, Thanks!

atonalfreerider commented 1 year ago

I am working on this currently, but I'm unaffiliated with this project, so I'm reverse engineering.

Here is the CMU download script: https://github.com/CMU-Perceptual-Computing-Lab/panoptic-toolbox

I'm working on this C# script to prepare the data. You can see from the JSON and the npz array, what data is required:


using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }

    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),

            new Argument<string>(
                "CameraPositions", 
                "These camera positions are generated in the Colmap")
        };

        rootCommand.Description = "Initialize the training data for the dynamic gaussian splatting";

        // Note that the parameters of the handler method are matched according to the names of the options 
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);

        rootCommand.Invoke(args);

        Environment.Exit(0);
    }

    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }

    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;
        public List<List<float[][]>> w2c;
        public List<List<string>> fn;
        public List<List<int>> cam_id;
    }

    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;

        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();

        train_meta trainMeta = new()
        {
            w = 640,
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };

        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for(int j= 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);
                Frame cameraFrame = cameraTransforms.frames[j];
                List<float[]> kToInsertInner = new()
                {
                    new[]{cameraFrame.fl_x, 0f, cameraFrame.cx},
                    new[]{0f, cameraFrame.fl_y, cameraFrame.cy},
                    new[]{0f, 0f, 1f}
                };
                kToInsert.Add(kToInsertInner);

                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }
            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }

        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));

        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;

            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;

            //seg
            data[i, 6] = 1;
        }
        npz.Add("data.npz", data);
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}

timatchley commented 1 year ago

Thanks for the speedy reply,

Looks like to fill in the rest of your TODO's you could just use the sparse reconstruction from Colmap. Or does it need the dense reconstruction point cloud? I'm not sure what seg would be though. If i had to guess just an incrementing # possibly for which point number it is? (This is why i am always in favor of variables being named very explicitly :) )

Also, do you know if the calibration needs to be preformed on every set of frames or does it just need colmap run once for an initial set of frames? I feel like that is important to state for those who may be wondering like myself.

I'll try to include a python version of your script that the author of the project can use once I have everything clarified and I have it working.

atonalfreerider commented 1 year ago

My guess is that the initial point cloud is needed to seed the training. It would defeat the purpose otherwise to do it for every frame. But input from @JonathonLuiten would be helpful here

JonathonLuiten commented 1 year ago

Hey everyone. Stoked to see all your interest and excited to help you all figure out how to set this up on your own datasets.

However from now until around Nov 17 I’m going to be super swamped and busy and won’t have much time to really dedicate to this.

I think the thing that would be the most helpful for you all, is if I wrote a script to convert classic static nerf/Gaussian datasets to my format. This could be used to train Gaussian Splatting with my code and would show how to set this up on your own data. Feel free to keep annoying me every couple of days until I do this, but realistically won’t be this week.

timatchley commented 1 year ago

Hey @atonalfreerider

I noticed you aren't passing in any of the colmap outputs directly. Instead you seem to have a middle step that is building some sort of json file that is then being read in. Can you provide whatever you are using to get colmap's output into that JSON format?

I've attempted to write a parser my self to just take in the images.txt/cameras.txt directly but this doesn't quite account for all the variables your Frame object such as camera_angle_x, camera_angle_y, transform_matrix, and sharpness. I rather not try to write all the math to compute all that myself if I don't have to.

Thanks

timatchley commented 1 year ago

Looks like you may just be using the resulting transforms.json from instant-ngp. Maybe i'll give that a go =) https://github.com/NVlabs/instant-ngp/blob/master/scripts/colmap2nerf.py

atonalfreerider commented 1 year ago

Yes you will notice that only the camera transform is being used, and the focal length x,y and camera digital center x,y

timatchley commented 1 year ago

Ok using the colmap2nerf.py seems to have done this trick. My script is expecting it to be run while your CWD is the root of your dataset. Also (for now) it expects your extracted images are in rootCaptureFolder/ims starting with 000.png and directories 0-numOfCameras.

I'm not done testing with the results but they seem to be parsed fine from the train.py of this project.

The one thing i will note and would love a comment on if anyone has more information, what should the "seg" var in the point cloud be? I have it hardcoded to 1 at the moment.

Here is the python script to run after colmap2nerf.py

import argparse
import json
import os
import sys
import numpy as np
from typing import List, Dict, Any

class CameraTransform:
    def __init__(self) -> None:
        self.aabb_scale: int = 0
        self.frames: List[Frame] = []

class Frame:
    def __init__(self) -> None:
        self.file_path: str = ""
        self.sharpness: float = 0.0
        self.transform_matrix: List[List[float]] = []
        self.camera_angle_x: float = 0.0
        self.camera_angle_y: float = 0.0
        self.fl_x: float = 0.0
        self.fl_y: float = 0.0
        self.k1: float = 0.0
        self.k2: float = 0.0
        self.k3: float = 0.0
        self.k4: float = 0.0
        self.p1: float = 0.0
        self.p2: float = 0.0
        self.is_fisheye: bool = False
        self.cx: float = 0.0
        self.cy: float = 0.0
        self.w: float = 0.0
        self.h: float = 0.0

class TrainMeta:
    def __init__(self) -> None:
        self.w: float = 0.0
        self.h: float = 0.0
        self.k: List[List[List[List[float]]]] = []
        self.w2c: List[List[List[float]]] = []
        self.fn: List[List[str]] = []
        self.cam_id: List[List[int]] = []

def count_files_in_first_directory(path):
    # List all files and directories in the given path
    items = os.listdir(path)

    # Iterate over the items to find the first directory
    for item in items:
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path):
            # If a directory is found, list its contents and count the files
            return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
    return 0  # Return 0 if no directory is found

def parse(input_path: str, camera_positions: str) -> None:
    transforms_directory = camera_positions
    if str(camera_positions).endswith("transforms.json"):
        transforms_directory = camera_positions[:-len("transforms.json")]
    else:
        camera_positions = os.path.join(camera_positions, "transforms.json")
    with open(camera_positions, 'r') as file:
        camera_transforms = json.load(file)

    ims_path = os.path.join(input_path, "ims")
    cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
    file_count = count_files_in_first_directory(ims_path)

    train_meta = TrainMeta()
    train_meta.w = 640
    train_meta.h = 360
    # ... initialization of other fields ...

    for i in range(file_count):
        to_insert = []
        cam_to_insert = []
        k_to_insert = []
        w_to_insert = []
        for j in range(cam_count):
            to_insert.append(f"{j}/{str(i).zfill(3)}.png")
            cam_to_insert.append(j)
            camera_frame = camera_transforms["frames"][j]
            k_to_insert_inner = [
                [camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
                [0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
                [0.0, 0.0, 1.0]
            ]
            k_to_insert.append(k_to_insert_inner)
            w = camera_frame["transform_matrix"]
            w_to_insert.append(w)

        train_meta.fn.append(to_insert)
        train_meta.cam_id.append(cam_to_insert)
        train_meta.k.append(k_to_insert)
        train_meta.w2c.append(w_to_insert)

    with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
        json.dump(train_meta.__dict__, file, indent=4)

    file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
    npz: Dict[str, Any] = {}
    data = parse_colmap_points3D(file_path)
    npz["data"] = data
    np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)

def parse_colmap_points3D(file_path: str) -> np.ndarray:
    with open(file_path, 'r') as f:
        lines = f.readlines()

    # Filter out the lines containing 3D point data
    points_lines = [line.strip() for line in lines if not line.startswith("#")]

    data = np.zeros((len(points_lines), 7))

    for i, line in enumerate(points_lines):
        parts = line.split()

        # point position
        data[i, 0] = float(parts[1])
        data[i, 1] = float(parts[2])
        data[i, 2] = float(parts[3])

        # color
        data[i, 3] = int(parts[4])
        data[i, 4] = int(parts[5])
        data[i, 5] = int(parts[6])

        # seg  - I have no idea what the value should be here! Leaving it as '1' for now
        data[i, 6] = 1

    return data

def main():
    parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
    parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
    parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")

    args = parser.parse_args()
    parse(args.InputPath, args.CameraPositions)

    sys.exit(0)

if __name__ == "__main__":
    main()

I imagine some changes will need to made to this script but it a starting point for now.

I'll try to update this thread with how it goes but my dataset is still missing the "seg" files that I need to generate with my dataset. Right now train.py is breaking when it tries to load those files that I have not created yet.

JonathonLuiten commented 1 year ago

so is this solved?

Hard coding at 1 is fine! That is what I would have done for a demo :)

It only needs to be not 1, for points your are 100% certain are static. If you don't know, then all 1 should be good :)

timatchley commented 1 year ago

I am training 150 frame scene as we speak. I'll close it tomorrow if my results come out well.

timatchley commented 1 year ago

Training completed. The resulting output was a 4KB file which when visualized was blank. So there is still something wrong. I'll try investigating further.

timatchley commented 1 year ago

Here is what my captureFolder/ims/0 starts with

And my captureFolder/seg/0 starts with

The only real difference I can notice between my dataset and the sample is I am using .png for my RGB and .jpg for my bg removed bitmasks. But i accounted for this by changing the load_dataset function to swap .png for .jpg.

My dataset is using 5 cameras that does have a convergent result from colmap. I've tried running my process using the original provided test datasets but colmap does NOT converge when I try using the first frame from each camera on the "basketball" dataset. I'm assuming this is probably consistent on all the provided datasets as they are all shot in the same capture setup.

This is what my params.npz result looked like when viewed from a text editor.

Debugging this is quite challenging for pretty obvious reasons so any insight would be appreciated @JonathonLuiten

JonathonLuiten commented 1 year ago

The params is a dict of numpy arrays. Open it like done in the vizualizer to see the values inside.

If the colmap didn't work and the camera poses are wrong it DEFINITELY will not work...

E.g. if you can't get original gaussian splatting (static scene) working on the first frame, then this dynamic stuff won't work.

I would start with getting static working on the first frame first.

timatchley commented 1 year ago

I was able to get static gaussian splatting to work with the dataset I used to run colmap with, granted it's only 5 cameras and the quality isn't terrific, it did work.

Looking at the params in visualize.py isn't giving me too much readable information other than it looks like means3D through log_scales all have no values

Here is what my params look like in train.py immediately after calling initialize_params

Which would explain why the file is so small. So something must be failing to happen in the training. Since you know the logic there, any clue what it could be?

JonathonLuiten commented 1 year ago

No idea, but try set number of time steps to 1, and thus fit the static scene in the first timestep with my code.

Debug through looking at the “params” dict and see when it becomes empty?

timatchley commented 1 year ago

It becomes empty on step 0 iteration 3000 in external.py line 205

Here is execution on line 204, values are still there

line 204 to_remove must be getting set to all the entries and then line 205 it removes them all and I'm left with means3D and others having Tensor shapes of 0,3 instead of 1758,3.

After i step over 205

Looks like the logic on this line

big_points_ws = torch.exp(params['log_scales']).max(dim=1).values > 0.1 * variables['scene_radius']

results in True for every single point. And there for it removes every point.
Maybe you can explain what is going on here? My scene_radius is 0.7211402587951079 so I guess whatever threshold that is being tested here is too small resulting in it purging all the data. Can you provide some insight to what is going on here? Maybe it will help me understand what the limitations are on input data?

timatchley commented 1 year ago

colmap2nerf doesn't keep the images in order so the transform matrices aren't mapping to the correct images in the dataset. It also uses a single set of camera intrinsics instead of a set on every frame like @atonalfreerider 's script expected. I've modified my python code to sort the frames by the number of the file in file_name from transforms.json. I've also modified the python script for computing the training metadata to reuse the camera w and h from transforms.json.

Here is the updated python script:

import argparse
import json
import os
import sys
import numpy as np
import re
from typing import List, Dict, Any

class CameraTransform:
    def __init__(self) -> None:
        self.aabb_scale: int = 0
        self.frames: List[Frame] = []

class Frame:
    def __init__(self) -> None:
        self.file_path: str = ""
        self.sharpness: float = 0.0
        self.transform_matrix: List[List[float]] = []
        self.camera_angle_x: float = 0.0
        self.camera_angle_y: float = 0.0
        self.fl_x: float = 0.0
        self.fl_y: float = 0.0
        self.k1: float = 0.0
        self.k2: float = 0.0
        self.k3: float = 0.0
        self.k4: float = 0.0
        self.p1: float = 0.0
        self.p2: float = 0.0
        self.is_fisheye: bool = False
        self.cx: float = 0.0
        self.cy: float = 0.0
        self.w: float = 0.0
        self.h: float = 0.0

class TrainMeta:
    def __init__(self) -> None:
        self.w: float = 0.0
        self.h: float = 0.0
        self.k: List[List[List[List[float]]]] = []
        self.w2c: List[List[List[float]]] = []
        self.fn: List[List[str]] = []
        self.cam_id: List[List[int]] = []

def get_number(frame):
    return int(re.search(r'(\d+).png$', frame["file_path"]).group(1))
def count_files_in_first_directory(path):
    # List all files and directories in the given path
    items = os.listdir(path)

    # Iterate over the items to find the first directory
    for item in items:
        item_path = os.path.join(path, item)
        if os.path.isdir(item_path):
            # If a directory is found, list its contents and count the files
            return len([f for f in os.listdir(item_path) if os.path.isfile(os.path.join(item_path, f))])
    return 0  # Return 0 if no directory is found

def parse(input_path: str, camera_positions: str) -> None:
    transforms_directory = camera_positions
    if str(camera_positions).endswith("transforms.json"):
        transforms_directory = camera_positions[:-len("transforms.json")]
    else:
        camera_positions = os.path.join(camera_positions, "transforms.json")
    with open(camera_positions, 'r') as file:
        camera_transforms = json.load(file)

    ims_path = os.path.join(input_path, "ims")
    cam_count = len([name for name in os.listdir(ims_path) if os.path.isdir(os.path.join(ims_path, name))])
    file_count = count_files_in_first_directory(ims_path)

    train_meta = TrainMeta()
    train_meta.w = int(camera_transforms['w'])
    train_meta.h = int(camera_transforms['h'])
    # ... initialization of other fields ...

    #Need to sort the frames by file_path ending # in numerical order
    sorted_frames = sorted(camera_transforms["frames"], key=get_number)

    for i in range(file_count):
        to_insert = []
        cam_to_insert = []
        k_to_insert = []
        w_to_insert = []
        for j in range(cam_count):
            to_insert.append(f"{j}/{str(i).zfill(3)}.png")
            cam_to_insert.append(j)
            camera_frame = sorted_frames[j]
            k_to_insert_inner = [
                [camera_transforms["fl_x"], 0.0, camera_transforms["cx"]],
                [0.0, camera_transforms["fl_y"], camera_transforms["cy"]],
                [0.0, 0.0, 1.0]
            ]
            k_to_insert.append(k_to_insert_inner)
            w = camera_frame["transform_matrix"]
            w_to_insert.append(w)

        train_meta.fn.append(to_insert)
        train_meta.cam_id.append(cam_to_insert)
        train_meta.k.append(k_to_insert)
        train_meta.w2c.append(w_to_insert)

    with open(os.path.join(transforms_directory, "train_meta.json"), 'w') as file:
        json.dump(train_meta.__dict__, file, indent=4)

    file_path = os.path.join(transforms_directory, "colmap_text", "points3D.txt")
    npz: Dict[str, Any] = {}
    data = parse_colmap_points3D(file_path)
    npz["data"] = data
    np.savez_compressed(os.path.join(input_path, "init_pt_cld.npz"), **npz)

def parse_colmap_points3D(file_path: str) -> np.ndarray:
    with open(file_path, 'r') as f:
        lines = f.readlines()

    # Filter out the lines containing 3D point data
    points_lines = [line.strip() for line in lines if not line.startswith("#")]

    data = np.zeros((len(points_lines), 7))

    for i, line in enumerate(points_lines):
        parts = line.split()

        # point position
        data[i, 0] = float(parts[1])
        data[i, 1] = float(parts[2])
        data[i, 2] = float(parts[3])

        # color
        data[i, 3] = int(parts[4])
        data[i, 4] = int(parts[5])
        data[i, 5] = int(parts[6])

        # seg  - I have no idea what the value should be here! Leaving it as '1' for now
        data[i, 6] = 1

    return data

def main():
    parser = argparse.ArgumentParser(description="Initialize the training data for the dynamic gaussian splatting")
    parser.add_argument("InputPath", help="This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera")
    parser.add_argument("CameraPositions", help="These camera positions are generated in the Colmap")

    args = parser.parse_args()
    parse(args.InputPath, args.CameraPositions)

    sys.exit(0)

if __name__ == "__main__":
    main()

Additionally I've created a second dataset that has 32 cameras and a very clean colmap sparse reconstruction. So there is still something I believe in the math off here that I'm hoping someone can figure out.

@JonathonLuiten is there any good way to verify my training metadata is valid, or visualize the positions or something of that nature that may allow to see if there is something off with the calculations? Or is there something with the CMU dataset you are using that is specific to that dataset which you had to account for when creating your Camera objects in setup_camera from helpers.py ?

The PSNR starts negative and works its way to around 11 by the time it gets to 10k steps on my new dataset which is still quite far behind the 20's the provided datasets get.

timatchley commented 1 year ago

Just to be clear, it is still not working for me. I believe there is something small off with the metadata/poses. Because when i train it ends up removing all the points after enough iterations. To the point the output data is blank. So my hunch is the what it is seeing from the images vs the numbers it's getting fed in for the positional information are off. I imagine it's something small which is why I'm asking if there is a good way we can visualize anything here from the camera positions to confirm if its right or wrong. =). As looking at numbers doesn't tell me much.

There maybe is a difference in how colmap2nerf.py calculates some of the values vs what the expected values are. That is what i'd like someone to let me know what is wrong if they can figure it out. Or give me another script that calculates the data given a OPENCV colmap sparse configuration.

Any help is welcome!

ch1998 commented 1 year ago

Maybe you should adjust the parameters of colmap in colmap2nerf.py? I tried to use colmap2nerf.py to generate train_meta.json, all the K matrices are the same, I think this is wrong, I am using the data from juggle in the project.

timatchley commented 1 year ago

I'm aware they all come out the same but isn't K the camera intrinsics? As long as the cameras are identical hardware/settings shouldn't they be the same? The colmap2nerf script uses the single camera setting hence why there is only one set of camera intrinsics in their result.

Edit: I took a closer look at colmap2nerf.py and it looks like the values cx, cy, fl_x, fl_y that we are using to build the K matrices come directly from COLMAP. So this would imply Colmap is not calculating these values correctly? I think it's more likely that it's the camera extrinsics (transformational matrix) that colmap2nerf.py is doing something that isn't compatible with what Dynamic3DGaussians is expecting.

ch1998 commented 1 year ago

I'm aware they all come out the same but isn't K the camera intrinsics? As long as the cameras are identical hardware/settings shouldn't they be the same? The colmap2nerf script uses the single camera setting hence why there is only one set of camera intrinsics in their result.

Edit: I took a closer look at colmap2nerf.py and it looks like the values cx, cy, fl_x, fl_y that we are using to build the K matrices come directly from COLMAP. So this would imply Colmap is not calculating these values correctly? I think it's more likely that it's the camera extrinsics (transformational matrix) that colmap2nerf.py is doing something that isn't compatible with what Dynamic3DGaussians is expecting.

I'm aware they all come out the same but isn't K the camera intrinsics? As long as the cameras are identical hardware/settings shouldn't they be the same? The colmap2nerf script uses the single camera setting hence why there is only one set of camera intrinsics in their result.

Edit: I took a closer look at colmap2nerf.py and it looks like the values cx, cy, fl_x, fl_y that we are using to build the K matrices come directly from COLMAP. So this would imply Colmap is not calculating these values correctly? I think it's more likely that it's the camera extrinsics (transformational matrix) that colmap2nerf.py is doing something that isn't compatible with what Dynamic3DGaussians is expecting.

You are right. If the same camera is used, the K matrix is indeed the same. I modified colmap2nerf.py so that colmap re-predicts camera intrinsics for each camera, but Dynamic3DGaussians is still not compatible.

maurock commented 1 year ago

Hi everyone, I'll join the conversation as I'm having very similar issues to the ones you mentioned. I reverse-engineered the input data and was able to extract and format data from a) COLMAP and b) ground truth camera poses and points (Blender). I am only working with frame 0, as I am currently interested in setting a pipeline for the static version.

Regarding data extracted from COLMAP, I got a poor reconstruction, but this was expected as the object is quite challenging to reconstruct. However, I got white images for data where I know the ground truth, which I did not expect. I investigated a bit and noticed an issue in this line from train.py: im, radius, _, = Renderer(....). All the radii are set to 0 for some reason. Given that the w2c and intrinsics should be correct, as I extracted them directly from Blender, do you have any intuition on why this is the case? Does anybody face a similar issue?

Thanks

maurock commented 1 year ago

Follow up on my previous comment - I have noticed a few things that might be helpful: 1) First and foremost, setting the segmentation loss = 0 improved the training procedure significantly. This alone solved many of the issues I faced during training. You can find this in train.py -> get_loss() -> losses['seg'] = ... (I am doing static reconstruction currently, I don't know to what extend dynamic reconstruction could be affected by this). 1) I found the training procedure to be very sensitive to the initial hyperparameters 2) densify() removes, clones, and splits gaussians according to multiple heuristics. By tuning those heuristics, I could achieve better performance.

Hope this helps!

henrypearce4D commented 1 year ago

@maurock very cool to hear you are getting somewhere with your own datasets.

Could you go into detail a little as to what you did to reverse-engineer the input data to the correct format? Can you provide any scripts to help with this task?

ch1998 commented 1 year ago

Hi everyone, I'll join the conversation as I'm having very similar issues to the ones you mentioned. I reverse-engineered the input data and was able to extract and format data from a) COLMAP and b) ground truth camera poses and points (Blender). I am only working with frame 0, as I am currently interested in setting a pipeline for the static version.

Regarding data extracted from COLMAP, I got a poor reconstruction, but this was expected as the object is quite challenging to reconstruct. However, I got white images for data where I know the ground truth, which I did not expect. I investigated a bit and noticed an issue in this line from train.py: im, radius, _, = Renderer(....). All the radii are set to 0 for some reason. Given that the w2c and intrinsics should be correct, as I extracted them directly from Blender, do you have any intuition on why this is the case? Does anybody face a similar issue?

Thanks

Hi，Are the camera parameters of your own datasets obtained through colmap? What does b)ground truth camera poses and points (Blender) include? Extrinsics matrix and initial point cloud? So did you only use the Intrinsics matrix part of the data obtained by colmap?

maurock commented 1 year ago

The code I am using to go from Blender to 3DGS is very much integrated with my pipeline and scattered around currently, but I will clean it and share a few scripts to extract data from Blender whenever I can. In the meantime, what I did was:

Extracted data from Blender. This means I have a few scripts to generate cameras in Blender, get their intrinsics/extrinsics matrices and points on the surface of the objects using the bpy package. I then have additional scripts to format this data as required by this repository (I basically create the init_pt_cld.npz, ims, seg folders, and train_meta.json, plus I make sure to have consistent camera configurations). With careful tuning of the learning rates + heuristics, I was able to correctly reconstruct the Blender objects.
I have an additional pipeline to extract camera poses and a pointcloud using COLMAP. I borrowed parts of the code from the original 3D Gaussian Splatting repo to do that. the data is in the correct format, but results are not good yet - I suspect it has to do with hypertuning.

ch1998 commented 1 year ago

The code I am using to go from Blender to 3DGS is very much integrated with my pipeline and scattered around currently, but I will clean it and share a few scripts to extract data from Blender whenever I can. In the meantime, what I did was:

Extracted data from Blender. This means I have a few scripts to generate cameras in Blender, get their intrinsics/extrinsics matrices and points on the surface of the objects using the bpy package. I then have additional scripts to format this data as required by this repository (I basically create the init_pt_cld.npz, ims, seg folders, and train_meta.json, plus I make sure to have consistent camera configurations). With careful tuning of the learning rates + heuristics, I was able to correctly reconstruct the Blender objects.

I have an additional pipeline to extract camera poses and a pointcloud using COLMAP. I borrowed parts of the code from the original 3D Gaussian Splatting repo to do that. the data is in the correct format, but results are not good yet - I suspect it has to do with hypertuning.

Thank you very much for your reply. Based on your reply, I'm getting good results on my own dataset, but one thing is weird, as the time step increases, the PSNR value keeps decreasing and I'm still looking for the problem.

maurock commented 1 year ago

@ch1998 Great to hear you are seeing positive results! Did you follow my suggestions here? In that case, it looks like the segmentation loss shouldn't be set to 0 for dynamic reconstruction. You could try to set the segmentation loss equal to 0 for the static reconstruction (the first 10000 iterations in your case, which works well) and revert it to the value the authors chose for the following iterations. I think proper segmentation images are needed for good dynamic results. If you don't currently have those images, a workaround could be to set the seg values to 1 (=dynamic) for every point in your initial point cloud - according to this answer. The segmentation images should probably reflect this - just a plain white image should do. In case you try, please let us know how it goes!

ch1998 commented 1 year ago

@maurock I follow on your suggestions, set segmentation loss=0 and obtained good reconstruction results when t=0, 1, and 2, but as t increases, the results will become bad. Segmentation loss=0 maybe only apply to static reconstruction. Importantly, the initial point cloud is important. I use the dense point cloud reconstructed by colmap.

henrypearce4D commented 1 year ago

If either @maurock or @ch1998 have the time to share their data preparation scripts it would be much appreciated!

maurock commented 1 year ago

If either @maurock or @ch1998 have the time to share their data preparation scripts it would be much appreciated!

@henrypearce4D Sure, I am working on it, I'll share it later today!

ch1998 commented 1 year ago

If either @maurock or @ch1998 have the time to share their data preparation scripts it would be much appreciated!

I am using the code shared by @timatchley , using colmap for registration. You need to use colmap's dense reconstruction to get the init point cloud. Then, it is adjusted according to maurock's parameters. In fact currently on my data only the first few frames have good quality. I'm still trying to figure it out.

I think the problem is not with the camera parameters obtained by colmap, that is accurate. Maybe the hyperparameters need to be adjusted.

maurock commented 1 year ago

@henrypearce4D I have added the scripts and instructions to the touch3DGS branch in my fork: https://github.com/maurock/Dynamic3DGaussians/tree/touch3DGS

Here's the link relevant section of the README.md file.
You'll find the scripts you need in data_making/blender_script.py and data_making/blender_to_data.py.

I hope this helps!

timatchley commented 1 year ago

I'm trying it out with full dense reconstruction and 'seg' loss set to 0.0. The first frame is off to a good start it appears. PSNR 32

timatchley commented 1 year ago

JonathonLuiten / Dynamic3DGaussians

Instructions for creating our own dataset to test with #18