Closed — rakshith95 closed this issue 2 years ago
Sorry, for some reason I don't get alerts on this repo. Sure, you can refer to
https://github.com/alexklwong/void-dataset/issues/5
to read from the raw dataset. Let me know if you have more questions.
Hello @alexklwong, thanks for getting back. I'm having some trouble building XIVO to generate the sparse points for my captured videos, but I have a few questions regarding the generation of the data used. The D435i outputs images at 30 fps at 640x480 resolution, if I'm not mistaken, so I assume you use only a subset and not every frame. How is this determined? Do you simply sample, say, 1 out of every n images, or do you choose some 'keyframes'?
If it's the latter, what is it based on?

Right, I think the rosbag should have all the frames. The dataset, however, is the result of CORVIS (an alpha version of XIVO) and contains a subset of the frames. The frames were filtered based on sufficient parallax from the previous frame. Suppose we are at time t: we skip any frame after t that has less than 1 cm of translation, until we reach a frame at t + \tau whose translation from the frame at time t is at least 1 cm.
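The parallax rule above can be sketched as a greedy selection over camera positions. This is an illustrative reconstruction of the described logic, not the CORVIS/XIVO code; `select_keyframes` and its arguments are made up for this example.

```python
import numpy as np

def select_keyframes(positions, min_translation=0.01):
    """Greedy keyframe selection: starting from the first frame, skip
    frames until one has moved at least `min_translation` meters from
    the last selected frame (the ~1 cm parallax rule)."""
    positions = np.asarray(positions, dtype=float)
    selected = [0]
    for i in range(1, len(positions)):
        if np.linalg.norm(positions[i] - positions[selected[-1]]) >= min_translation:
            selected.append(i)
    return selected

# A camera moving 4 mm per frame along x: every 3rd frame clears the 1 cm threshold
poses = [[0.004 * i, 0.0, 0.0] for i in range(10)]
print(select_keyframes(poses))  # -> [0, 3, 6, 9]
```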
Yes, that's correct: detecting 1500 key points per frame is sufficient to simulate a similar setup. The 1500 points contain both inliers and outliers (the inlier set is really around 60-150 points; this is the VOID150 dataset). The key points do not need to be tracked across frames for calibrated-backprojection-network to work, since it only takes a synchronized image, sparse depth map, and calibration as input. During training we also don't assume that the points are tracked (in reality, yes, inliers do appear across frames).
XIVO uses a corner-based detector, so if you want to simulate it you can use https://github.com/alexklwong/learning-topology-synthetic-data/blob/master/setup/setup_dataset_scenenet.py#L72 as an example, or even https://github.com/alexklwong/learning-topology-synthetic-data/blob/master/setup/setup_dataset_scenenet.py#L99
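Once you have key-point locations, turning them into the dataset's per-frame inputs (sparse depth map plus validity map) amounts to scattering the ground-truth depth at those pixels. This is a minimal sketch of that idea; `make_sparse_depth` and its signature are illustrative, not the repo's actual API.

```python
import numpy as np

def make_sparse_depth(dense_depth, points):
    """Build a sparse depth map and a binary validity map from a dense
    ground-truth depth map and a list of (row, col) key-point locations,
    in the same spirit as the dataset's `sparse` / `validity_map` entries."""
    sparse = np.zeros_like(dense_depth)
    validity = np.zeros_like(dense_depth, dtype=np.uint8)
    for r, c in points:
        if dense_depth[r, c] > 0:  # keep only points with valid depth
            sparse[r, c] = dense_depth[r, c]
            validity[r, c] = 1
    return sparse, validity

# Toy 640x480 depth map at a constant 2.5 m, sampled at two key points
depth = np.full((480, 640), 2.5, dtype=np.float32)
sparse, validity = make_sparse_depth(depth, [(100, 200), (300, 400)])
print(int(validity.sum()))  # -> 2
```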
Thanks a lot for the information! Closing the issue now
Hello @alexklwong, in this script, should line 35, `N_INIT_CORNER = 15000`, be 1500 instead of 15000?
Oh no, that's correct, it should be 15000. This is the number of initial corner points to be detected. We noticed that Harris tends to detect large clusters of points around a location, so you may get ~100 points near a single corner. This might be because we didn't tune Harris, since it had to be run for a large number of scenes. So our fix was to have it detect more points for k-means, which then optimizes for the 1500 means. https://github.com/alexklwong/learning-topology-synthetic-data/blob/master/setup/setup_dataset_scenenet.py#L85
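The over-detect-then-cluster trick can be sketched as follows: treat the ~15000 Harris responses as candidate locations and run a few Lloyd iterations to reduce them to the target number of means. This is a plain-NumPy illustration of the idea (with small toy numbers), not the repo's implementation, which uses Harris + k-means from standard libraries.

```python
import numpy as np

def kmeans_reduce(candidates, k, iters=20):
    """Reduce an over-complete set of candidate corner locations to k
    cluster centers with Lloyd's algorithm, mirroring the trick of
    detecting ~15000 Harris corners and letting k-means pick 1500 means."""
    pts = np.asarray(candidates, dtype=float)
    # Deterministic init: k candidates evenly spaced through the list
    centers = pts[np.linspace(0, len(pts) - 1, k).astype(int)].copy()
    for _ in range(iters):
        # Assign each candidate to its nearest center
        d = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned candidates
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pts[labels == j].mean(axis=0)
    return centers

# 300 candidates clumped around 3 true corners -> 3 well-spread centers
rng = np.random.default_rng(1)
corners = np.array([[50.0, 50.0], [400.0, 100.0], [200.0, 300.0]])
cands = np.concatenate([c + rng.normal(0, 2, (100, 2)) for c in corners])
centers = kmeans_reduce(cands, k=3)
print(centers.round(0))  # one center near each true corner
```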
Oh I see, thank you
Hello, can you share the code used to convert from the raw bag file (or however the sequences were captured) into the format you currently use for the dataset, i.e. groundtruth, sparse, image, validity_map, K?
I would like to use data I've captured in addition to the ones you've already provided, so it would be super helpful if you could share those.
Thanks!