JonathonLuiten / Dynamic3DGaussians


How to get initial point cloud estimate (init_pt_cld.npz) #17

Open jayaramreddy10 opened 11 months ago

jayaramreddy10 commented 11 months ago

Hi, congrats on the great work. I have a query regarding the initial point cloud estimate that the code expects: it is read from init_pt_cld.npz and has shape (N, 7).

@JonathonLuiten, I have two questions regarding this. 1) Could you provide any insights/suggestions on how you construct this from the posed images? Would COLMAP suffice? 2) Regarding the last column, which holds a binary 'seg' label: does this indicate foreground/background? Looking forward to your response.

atonalfreerider commented 11 months ago

I'm looking at this right now as well and raised a similar question here; the response indicated that the point cloud is very specific to the CMU Panoptic data: https://github.com/JonathonLuiten/Dynamic3DGaussians/issues/13

However, I'm looking at the same methods and the N x 7 NumPy data array and trying to reverse-engineer them. I have been able to reconstruct the train_meta.json file for my own videos.

To your observation: in init_pt_cld.npz it looks like column 6 is segmentation data (all values are 1, as far as I can tell), columns 0-2 are the 3D point means (x, y, z), and columns 3-5 are RGB colors:

import numpy as np
import open3d as o3d

def initialize_params(seq, md):
    # (N, 7) array: columns 0-2 are xyz, 3-5 are rgb, 6 is seg
    init_pt_cld = np.load(f"./data/{seq}/init_pt_cld.npz")["data"]
    seg = init_pt_cld[:, 6]
    max_cams = 50
    sq_dist, _ = o3d_knn(init_pt_cld[:, :3], 3)
    mean3_sq_dist = sq_dist.mean(-1).clip(min=0.0000001)
    params = {
        'means3D': init_pt_cld[:, :3],
        'rgb_colors': init_pt_cld[:, 3:6],
    ...

# Open3D k-nearest neighbors (index 0 of each query result is the query
# point itself, so it is skipped)
def o3d_knn(pts, num_knn):
    indices = []
    sq_dists = []
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.ascontiguousarray(pts, np.float64))
    pcd_tree = o3d.geometry.KDTreeFlann(pcd)
    for p in pcd.points:
        [_, i, d] = pcd_tree.search_knn_vector_3d(p, num_knn + 1)
        indices.append(i[1:])
        sq_dists.append(d[1:])
    return np.array(sq_dists), np.array(indices)
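Based on that column layout, here is a minimal sketch of how one could assemble an init_pt_cld.npz from a COLMAP text export. Assumptions (not confirmed by the repo): the input is a standard COLMAP points3D.txt, colors are stored normalized to [0, 1], and seg is set to 1 everywhere per the observation above — verify against the released data before training:

```python
import numpy as np

def colmap_points_to_init_pt_cld(points3D_txt, out_npz):
    """Build an (N, 7) init_pt_cld.npz from a COLMAP points3D.txt export.

    Columns: x, y, z, r, g, b (assumed normalized to [0, 1]), seg (all 1).
    """
    rows = []
    with open(points3D_txt) as f:
        for line in f:
            vals = line.split()
            # points3D.txt rows: POINT3D_ID X Y Z R G B ERROR TRACK[...]
            if not vals or line.startswith("#"):
                continue
            x, y, z = map(float, vals[1:4])
            r, g, b = (int(c) / 255.0 for c in vals[4:7])
            rows.append([x, y, z, r, g, b, 1.0])
    data = np.array(rows, dtype=np.float64)
    # store under the "data" key so np.load(...)["data"] works as in the repo
    np.savez(out_npz, data=data)
    return data
```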
PKUVDIG commented 11 months ago

> [quotes @atonalfreerider's reply above]

Hi, I've encountered the same issue as well. Could you please let me know how you were able to reconstruct the train_meta.json file for your own videos? Thanks.

atonalfreerider commented 11 months ago

Usually I would use Colmap, but I am working with only two videos, and Colmap hasn't been able to solve the scene. So instead I placed the cameras manually in the Unity editor and exported the camera transforms. Then I ran this C# script:


using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }

    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),

            new Argument<string>(
                "CameraPositions",
                "Path to the JSON file of camera positions (e.g. generated by Colmap)")
        };

        rootCommand.Description = "Initialize the training data for the dynamic gaussian splatting";

        // Note that the parameters of the handler method are matched according to the names of the options 
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);

        rootCommand.Invoke(args);

        Environment.Exit(0);
    }

    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }

    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;
        public List<List<float[][]>> w2c;
        public List<List<string>> fn;
        public List<List<int>> cam_id;
    }

    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;

        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();

        train_meta trainMeta = new()
        {
            w = 640,
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };

        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for(int j= 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);
                Frame cameraFrame = cameraTransforms.frames[j];
                List<float[]> kToInsertInner = new()
                {
                    new[]{cameraFrame.fl_x, 0f, cameraFrame.cx},
                    new[]{0f, cameraFrame.fl_y, cameraFrame.cy},
                    new[]{0f, 0f, 1f}
                };
                kToInsert.Add(kToInsertInner);

                // NOTE: transform_matrix in NeRF-style transforms.json is typically
                // camera-to-world; train_meta.json stores world-to-camera (w2c), so
                // an inversion may be needed here depending on the export.
                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }
            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }

        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));

        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;

            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;

            //seg
            data[i, 6] = 1;
        }
        npz.Add("data", data); // key must be "data" so Python can read np.load(...)["data"]
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}
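Once the two files are generated, a quick Python sanity check can catch shape mismatches before training. This is a sketch based on the file layout discussed in this thread (per-frame lists of filenames, cam ids, 3x3 intrinsics, and 4x4 w2c matrices), not an official validator:

```python
import json
import numpy as np

def check_train_meta(meta_path, pt_cld_path):
    """Sanity-check generated train_meta.json and init_pt_cld.npz shapes."""
    with open(meta_path) as f:
        md = json.load(f)
    n_frames = len(md["fn"])
    # every per-frame list should have one entry per timestep
    for key in ("fn", "cam_id", "k", "w2c"):
        assert len(md[key]) == n_frames, f"{key} length mismatch"
    n_cams = len(md["fn"][0])
    k0 = np.array(md["k"][0][0])
    w2c0 = np.array(md["w2c"][0][0])
    assert k0.shape == (3, 3), "intrinsics should be 3x3"
    assert w2c0.shape == (4, 4), "w2c should be a 4x4 matrix"
    data = np.load(pt_cld_path)["data"]
    assert data.ndim == 2 and data.shape[1] == 7, "expected an (N, 7) point cloud"
    return n_frames, n_cams
```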
JonathonLuiten commented 11 months ago

A point cloud from COLMAP should be fine... I was getting it from the available depth cameras.

Would recommend setting the seg value on the point cloud to all 1.

Unless you know some points are 100% static; then you can specifically set those to 0 to fix them in place, but this is not necessary.
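Applying that advice is a one-liner per case. A small sketch, where `is_static` is a hypothetical boolean mask (N,) you would build yourself, e.g. from points known to lie on a static background:

```python
import numpy as np

def set_seg_column(pt_cld, is_static=None):
    """Set the seg column (index 6) of an (N, 7) point cloud:
    1 everywhere, optionally 0 for points known to be 100% static.
    is_static is a hypothetical boolean mask of shape (N,)."""
    out = pt_cld.copy()
    out[:, 6] = 1.0  # default: everything may move
    if is_static is not None:
        out[is_static, 6] = 0.0  # fix known-static points
    return out
```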