I'm looking at this right now as well and raised a similar question here: https://github.com/JonathonLuiten/Dynamic3DGaussians/issues/13. The response indicated that the point cloud is very specific to the CMU Panoptic data.
However, I'm looking at the same methods and the N x 7 numpy data array and trying to reverse engineer them. I have been able to reconstruct the train_meta.json file for my own videos.
To your observation: in init_pt_cld.npz, columns 0-2 are the mean 3D points (x, y, z), columns 3-5 are RGB colors, and column 6 is segmentation data (all values are 1, afaik).
```python
import numpy as np
import open3d as o3d


def initialize_params(seq, md):
    # Load the N x 7 initial point cloud: [x, y, z, r, g, b, seg]
    init_pt_cld = np.load(f"./data/{seq}/init_pt_cld.npz")["data"]
    seg = init_pt_cld[:, 6]
    max_cams = 50
    # Squared distances to the 3 nearest neighbours, used to initialize scales
    sq_dist, _ = o3d_knn(init_pt_cld[:, :3], 3)
    mean3_sq_dist = sq_dist.mean(-1).clip(min=0.0000001)
    params = {
        'means3D': init_pt_cld[:, :3],
        'rgb_colors': init_pt_cld[:, 3:6],
        ...


# Open3D K-nearest neighbors
def o3d_knn(pts, num_knn):
    indices = []
    sq_dists = []
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.ascontiguousarray(pts, np.float64))
    pcd_tree = o3d.geometry.KDTreeFlann(pcd)
    for p in pcd.points:
        # Query num_knn + 1 because the point itself comes back first; drop it
        [_, i, d] = pcd_tree.search_knn_vector_3d(p, num_knn + 1)
        indices.append(i[1:])
        sq_dists.append(d[1:])
    return np.array(sq_dists), np.array(indices)
```
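Based on that layout, here is a minimal sketch of how one could write an init_pt_cld.npz for custom data. The `points` and `colors` arrays are placeholders that you would fill from Colmap or a depth sensor, and the path is a hypothetical example:

```python
import numpy as np

# Hypothetical inputs: replace with your own reconstruction output.
points = np.zeros((1000, 3))   # (N, 3) xyz positions, e.g. from Colmap
colors = np.zeros((1000, 3))   # (N, 3) RGB (in [0, 1], as far as I can tell)
seg = np.ones((1000, 1))       # (N, 1) segmentation column, all 1 afaik

data = np.concatenate([points, colors, seg], axis=1).astype(np.float32)
assert data.shape == (1000, 7)

# The training code reads np.load(...)["data"], so save under the "data" key.
np.savez("./data/my_seq/init_pt_cld.npz", data=data)
```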
Hi, I've encountered the same issue as well. Could you please let me know how you were able to reconstruct the train_meta.json file for your own videos? Thanks.
Usually I would use Colmap, but I am working with only two videos, and Colmap hasn't been able to solve this. So instead I placed the cameras manually in the Unity editor and exported the camera transforms. Then I ran this C# script:
```csharp
using System.CommandLine;
using System.CommandLine.NamingConventionBinder;
using System.IO.Compression;
using Newtonsoft.Json;
using NumSharp;

static class Program
{
    class Args
    {
        public string InputPath { get; set; }
        public string CameraPositions { get; set; }
    }

    static void Main(string[] args)
    {
        RootCommand rootCommand = new()
        {
            new Argument<string>(
                "InputPath",
                "This is the path to the folder containing the images, and where train_meta.json and init_pt_cld.npz will be written. In the ims folder, each subfolder is a camera"),
            new Argument<string>(
                "CameraPositions",
                "The camera transforms JSON, e.g. exported from Unity or generated by Colmap")
        };
        rootCommand.Description = "Initialize the training data for the dynamic gaussian splatting";
        // Note that the parameters of the handler method are matched according to the names of the options
        rootCommand.Handler = CommandHandler.Create<Args>(Parse);
        rootCommand.Invoke(args);
        Environment.Exit(0);
    }

    [Serializable]
    public class CameraTransform
    {
        public int aabb_scale;
        public List<Frame> frames;
    }

    [Serializable]
    public class Frame
    {
        public string file_path;
        public float sharpness;
        public float[][] transform_matrix;
        public float camera_angle_x;
        public float camera_angle_y;
        public float fl_x;
        public float fl_y;
        public float k1;
        public float k2;
        public float k3;
        public float k4;
        public float p1;
        public float p2;
        public bool is_fisheye;
        public float cx;
        public float cy;
        public float w;
        public float h;
    }

    [Serializable]
    public class train_meta
    {
        public float w;
        public float h;
        public List<List<List<float[]>>> k;   // per-timestep, per-camera 3x3 intrinsics
        public List<List<float[][]>> w2c;     // per-timestep, per-camera 4x4 world-to-camera
        public List<List<string>> fn;         // per-timestep, per-camera image paths
        public List<List<int>> cam_id;
    }

    static void Parse(Args args)
    {
        CameraTransform cameraTransforms = JsonConvert
            .DeserializeObject<CameraTransform>(File.ReadAllText(args.CameraPositions))!;
        string imsPath = Path.Combine(args.InputPath, "ims");
        int camCount = Directory.EnumerateDirectories(imsPath).Count();
        int fileCount = Directory.EnumerateFiles(Directory.EnumerateDirectories(imsPath).ToList()[0]).Count();
        train_meta trainMeta = new()
        {
            w = 640,
            h = 360,
            fn = new(),
            cam_id = new(),
            k = new(),
            w2c = new()
        };
        for (int i = 0; i < fileCount; i++)
        {
            List<string> toInsert = new();
            List<int> camToInsert = new();
            List<List<float[]>> kToInsert = new();
            List<float[][]> wToInsert = new();
            for (int j = 0; j < camCount; j++)
            {
                toInsert.Add($"{j}/{i:D3}.jpg");
                camToInsert.Add(j);
                Frame cameraFrame = cameraTransforms.frames[j];
                // Build the 3x3 intrinsics matrix from focal lengths and principal point
                List<float[]> kToInsertInner = new()
                {
                    new[] { cameraFrame.fl_x, 0f, cameraFrame.cx },
                    new[] { 0f, cameraFrame.fl_y, cameraFrame.cy },
                    new[] { 0f, 0f, 1f }
                };
                kToInsert.Add(kToInsertInner);
                // Caution: NeRF-style transforms.json stores camera-to-world matrices,
                // while train_meta.json expects world-to-camera; invert here if needed.
                float[][] w = cameraFrame.transform_matrix;
                wToInsert.Add(w);
            }
            trainMeta.fn.Add(toInsert);
            trainMeta.cam_id.Add(camToInsert);
            trainMeta.k.Add(kToInsert);
            trainMeta.w2c.Add(wToInsert);
        }
        File.WriteAllText(Path.Combine(args.InputPath, "train_meta.json"), JsonConvert.SerializeObject(trainMeta, Formatting.Indented));
        // TODO create point cloud
        Dictionary<string, Array> npz = new();
        int pointCount = 0; // TODO number of points from Colmap
        double[,] data = new double[pointCount, 7];
        for (int i = 0; i < pointCount; i++)
        {
            // point position
            data[i, 0] = 0;
            data[i, 1] = 0;
            data[i, 2] = 0;
            // color
            data[i, 3] = 0;
            data[i, 4] = 0;
            data[i, 5] = 0;
            // seg
            data[i, 6] = 1;
        }
        // The Python loader reads np.load(...)["data"], so store under the "data" key
        npz.Add("data", data);
        np.Save_Npz(npz, Path.Combine(args.InputPath, "init_pt_cld.npz"), CompressionLevel.NoCompression);
    }
}
```
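One thing worth double-checking after running a script like this: NeRF-style transforms.json files store camera-to-world matrices, while train_meta.json expects world-to-camera. Below is a hedged Python sketch that inverts the matrices and sanity-checks the output shapes; the file path and in-place rewrite are assumptions for illustration:

```python
import json
import numpy as np

# Hypothetical path; adjust to your own sequence layout.
with open("./data/my_seq/train_meta.json") as f:
    meta = json.load(f)

# If the matrices came straight from a NeRF-style transforms.json, they are
# camera-to-world; invert each 4x4 to get world-to-camera.
meta["w2c"] = [[np.linalg.inv(np.array(mat)).tolist() for mat in per_t]
               for per_t in meta["w2c"]]

# Basic shape checks: one entry per timestep, per camera.
n_t = len(meta["fn"])
assert len(meta["w2c"]) == n_t and len(meta["k"]) == n_t
for per_t in meta["k"]:
    for k in per_t:
        assert np.array(k).shape == (3, 3)

with open("./data/my_seq/train_meta.json", "w") as f:
    json.dump(meta, f, indent=2)
```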
A point cloud from Colmap should be fine... I was getting it from the available depth cameras.
I would recommend setting the seg value on the point cloud to 1 for all points.
If you know some points are 100% static, you can specifically set them to 0 to fix them in place, but this is not necessary; see the sketch below.
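For example, a minimal sketch of that seg assignment, where `static_mask` is a hypothetical boolean array marking points you know are static and the path is an assumption:

```python
import numpy as np

pts = np.load("./data/my_seq/init_pt_cld.npz")["data"]  # hypothetical path

seg = np.ones(len(pts))                        # default: all points may move
static_mask = np.zeros(len(pts), dtype=bool)   # mark known-static points here
seg[static_mask] = 0                           # optional: pin 100%-static points
pts[:, 6] = seg

np.savez("./data/my_seq/init_pt_cld.npz", data=pts)
```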
Hi, congrats on the great work. I have a query regarding the initial point cloud estimate which the code expects; it is read from init_pt_cld.npz and has shape (N, 7).
@JonathonLuiten, I have 2 questions regarding this: 1.) Could you provide any insights/suggestions on how you are constructing this from the posed images? Would Colmap suffice? 2.) In particular, the last column, which has the binary 'seg' label: does this indicate foreground/background? Looking forward to your response.