jimmyfyx closed this issue 1 year ago
Hi,
1. `PoseData.json` contains the pose information of all the objects in the scene (including the camera). It is used nowhere in our code. We released other additional data with our dataset, such as `PoseData.json`, surface normals, optical flow, etc., in case somebody wants to use our dataset for some other purpose. If you want to run our model on a new dataset, you don't need this.
2. The matrices are flattened by row, i.e. the first 4 elements of `TransformationMatrices.csv` correspond to the first row. You can verify this by checking the last 4 elements of `TransformationMatrices.csv` and the last 3 elements of `CameraIntrinsicMatrix.json`: they should be `0, 0, 0, 1` and `0, 0, 1` respectively (a sketch of this check follows at the end of this reply).
3. `InterpolationData.json` records how the object motions were interpolated between keyframes. Again, this is not needed to train our model.
4. Yes, the flow-estimation data-loader needs `.npy` files. You can either additionally save the images as `.npy` files or modify the data-loader to load the `.png` images and crop them.

Hope my answers resolved your queries. Please let me know if you have any follow-up questions.
PS: I appreciate the detailed questions and the work you've done before asking them.
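For concreteness, a minimal sketch of the row-major check from point 2, in Python. The file name comes from the dataset, but the assumption that each CSV row holds exactly the 16 flattened values of one 4x4 matrix is mine; adjust the slicing if the file also stores frame indices or other columns.

```python
import numpy as np

# Assumption: each row of TransformationMatrices.csv holds one 4x4
# transformation matrix flattened by row (16 comma-separated values).
flat = np.atleast_2d(np.loadtxt("TransformationMatrices.csv", delimiter=","))

# Row-major reshape: the first 4 values of a CSV row become the first matrix row.
matrices = flat.reshape(-1, 4, 4)

# Sanity check from point 2 above: the last row of every homogeneous
# transformation matrix should be [0, 0, 0, 1].
assert np.allclose(matrices[:, 3], [0.0, 0.0, 0.0, 1.0])
print(matrices[0])
```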
Thank you! A quick follow-up question on point 4: what, then, are the `.exr` files used for in the VeedDynamic dataset?
By the way, is there a recommendation for how many video sequences should be used for training? And is there an estimate of how long the model takes to train?
Need for `.exr` files: We have three data-loaders: one each to train the flow-estimation and infilling networks, and one for final testing. The first two use `.npy` files, while the last uses `.png` and `.exr` files. You can easily modify any of them to use a consistent format. The reason the tester data-loader loads `.png` and `.exr` files instead of `.npy` files is to allow testing on any given video without having to convert it to `.npy` first (a conversion sketch follows below).
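If generating the `.npy` files is easier than modifying the data-loaders, here is a minimal conversion sketch, assuming hypothetical file names and an OpenCV build with OpenEXR support; the exact directory layout and array conventions the data-loaders expect are not shown here and may differ.

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # recent OpenCV builds require this to read .exr

import cv2
import numpy as np

def convert_frame(png_path: str, exr_path: str, out_dir: str, name: str) -> None:
    """Save an RGB .png frame and its .exr depth map as .npy files."""
    rgb = cv2.cvtColor(cv2.imread(png_path, cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
    depth = cv2.imread(exr_path, cv2.IMREAD_ANYCOLOR | cv2.IMREAD_ANYDEPTH)  # float32
    if depth.ndim == 3:
        depth = depth[..., 0]  # keep a single depth channel
    np.save(os.path.join(out_dir, f"{name}_rgb.npy"), rgb)
    np.save(os.path.join(out_dir, f"{name}_depth.npy"), depth)

# Hypothetical usage; adapt the paths and output names to whatever layout
# the data-loaders actually expect.
convert_frame("0001.png", "0001.exr", "npy_frames", "0001")
```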
How many videos? We use the pre-trained ARflow model and only fine-tune it to estimate flow between MPIs. This is a relatively easy task, so about 500 full-HD frames should be sufficient to learn it well. However, the inpainting network may need much more data. We didn't experiment further with the inpainting network, so it's not clear how much data is sufficient. It learnt well on our dataset, so I think 1500 full-HD frames should be enough.
Training time: Both models took about a day to train on our GPU. I've forgotten the details of the GPU, but it was 2-3 times slower than an NVIDIA RTX 2080.
OK thanks!
Closing this issue. Please reopen it if required.
Hi,
I want to ask a few questions about the training and testing dataset format, since I'm now trying to use my own dataset. Suppose I just want to use the data-loader for the VeedDynamic dataset; what should the dataset format look like? Specifically:

1. For each video there is a `PoseData.json`, and for each sequence of a video there is also a `PoseData.json`. I wonder what the difference between these two is? Also, how can I interpret the json file? For example, what are `Base`, `Base.001`, and so on?
2. In `TransformationMatrices.csv` and `CameraIntrinsicMatrix.json`, how are the matrices represented? For example, is a matrix flattened by row or by column?
3. What is `InterpolationData.json`?
4. From `src/data_loaders/VeedDynamic01.py`, it seems that I only need to provide RGB images (`.png`), depth information (`.exr`), and transformation matrices (`.csv`) to train the model? But in `src/flow_estimation/data_loaders/VeedDyanmic01.py` I see we also need RGB and depth images in `.npy` format?
5. I noticed the `get_frame_resolution` function in the data-loader, but I'm not sure whether that means it can read frames with any resolution?

Thanks so much for the reply!