VCL3D / StructureNet

Markerless volumetric alignment for depth sensors. Contains the code of the work "Deep Soft Procrustes for Markerless Volumetric Sensor Alignment" (IEEE VR 2020).
https://vcl3d.github.io/StructureNet/

Training Information #9

Closed ababilinski closed 2 years ago

ababilinski commented 2 years ago

What do you mean by backgrounds for training data? Can I get more information about training the model? I want to train it on smaller boxes so that the cameras can be closer to the subject when using Volumetric Capture.

alexd314 commented 2 years ago

Hello,

I am afraid this will not be a very easy process: we have not documented any of the details yet, so it would require going through the source code to understand what is going on.

Here, I will only try to give you a rough outline of one or two of the most important aspects. Keep in mind that there will certainly be other parts that require additional attention.

To explain the background augmentation: in order to use the model in real-world cases, during training we synthetically "blend" the 3D replica of the structure with artificial backgrounds taken from public and in-house RGB-D datasets (we only use the depth maps and discard the color). In main.py, which is the training script, the command line arguments --vcl_path, --intnet_path and --corbs_path point to the local paths of 3 such datasets. We have not released the datasets we used, but in principle any depth dataset, either public or captured in-house, would do; a folder containing depth map files suffices. Note that the datasets do not need to be annotated in any way. We just want real-world depth captures of real-world scenes. For a reference implementation of how we load our background datasets check here. An important remark is the scale parameter of the ImageBackgroundSamplerParams instance, which applies additional scaling when loading the depth map (e.g. to account for the 3D model of the structure being in meters while your depth maps are in millimeters).
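Just to make the idea concrete, here is a rough sketch (not the project's actual code, and the function name is made up): a background depth map is rescaled to the model's unit and composited with the rendered depth of the structure by keeping the nearer valid surface at each pixel.

```python
import numpy as np

def blend_structure_with_background(structure_depth, background_depth, scale=0.001):
    """Composite a rendered depth map of the structure over a real background depth map.

    `scale` converts the background to the model's unit, e.g. 0.001 when the
    background depth maps are in millimeters but the 3D replica is authored in
    meters (this is the role the `scale` parameter of ImageBackgroundSamplerParams
    plays when loading backgrounds). At each pixel the nearer valid (non-zero)
    surface wins, so structure and background can occlude each other.
    """
    fg = structure_depth.astype(np.float32)
    bg = background_depth.astype(np.float32) * scale
    fg = np.where(fg > 0, fg, np.inf)   # treat missing depth as infinitely far
    bg = np.where(bg > 0, bg, np.inf)
    blended = np.minimum(fg, bg)
    blended[np.isinf(blended)] = 0.0    # no valid depth from either source
    return blended
```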

Most importantly, apart from the backgrounds needed for augmentation, retraining the model requires a 3D replica of the structure authored in a 3D authoring tool such as Blender, Maya, 3DS Max or any equivalent. How you build the 3D replica is not a concern, as long as it is exported in .obj format, potentially with annotations (obj "o" elements) giving the label of each box's side. If your structure is composed of simple boxes like ours, the main thing to take care of is being precise with the length measurements of the boxes' sides and their relative placement. We believe this is more easily accomplished in a 3D authoring tool as mentioned above, but if you don't want to use one and you have the boxes' lengths, you could even script it with an .obj exporter in any programming language you like, as sketched below.
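For illustration only, a minimal scripted exporter for a single box could look like this. The side labels and the exact "o" naming convention are assumptions; check src/io/box_model_loader.py and data/asymmetric_box.obj for the real conventions, and repeat/offset the boxes to match your structure.

```python
def write_box_obj(path, size=(0.4, 0.3, 0.5), name="box"):
    """Write a single axis-aligned box of the given (width, height, depth)
    dimensions (in meters) to a Wavefront .obj file, emitting one 'o'
    element per side so each side can carry its own label."""
    w, h, d = size
    # the 8 corner vertices of the box, with the origin at one corner
    vertices = [(0, 0, 0), (w, 0, 0), (w, h, 0), (0, h, 0),
                (0, 0, d), (w, 0, d), (w, h, d), (0, h, d)]
    # each side as a quad of 1-based vertex indices
    sides = {
        "front":  (1, 2, 3, 4),
        "back":   (5, 8, 7, 6),
        "bottom": (1, 5, 6, 2),
        "top":    (4, 3, 7, 8),
        "left":   (1, 4, 8, 5),
        "right":  (2, 6, 7, 3),
    }
    with open(path, "w") as f:
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")
        for label, quad in sides.items():
            f.write(f"o {name}_{label}\n")
            f.write("f " + " ".join(str(i) for i in quad) + "\n")

write_box_obj("my_box.obj", size=(0.4, 0.3, 0.5))
```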

In theory, if your structure is comprised of the same number of boxes as ours (4 in this case), just replacing the .obj file in data/asymmetric_box.obj would probably do. If you are customizing beyond our "template" structure, you will also need to adapt src/io/box_model_loader.py and src/dataset/rendering/box_renderer.py to make sure everything is compatible with your model.

I know this is only a fraction of the information required to reach your goal, but complete documentation is not available at the moment. Apart from the code itself, the best documentation we have of the methodology is our publication. While we cannot guarantee support under all circumstances, if you find yourself struggling with a specific issue, feel free to leave another comment or open another issue; we will try to assist in any case. Good luck!

ababilinski commented 2 years ago

Hi @alexd314, thank you so much for that information 🙂 It definitely puts me on the right track.

Just so I better understand:

  • Does the data have to be arranged in any particular way, or is a folder with 1-100 depth frames captured from different perspectives enough?
  • Does the data structure differ between --vcl_path, --intnet_path and --corbs_path, or can they be any RGB-D depth data as long as they are unique datasets?

Thank you for your time,

alexd314 commented 2 years ago

Hi again,

(1) No particular structure is needed for the files. A single folder under which all your *.png, *.pgm (or other file format, see below) depth maps are located is sufficient. Please use only depth maps; no color files should exist inside the folder.

(2) The data structure is exactly the same across datasets, as described in (1). The code treats the datasets with "per-dataset" hyperparameters, giving slightly more importance to some datasets than to others (for a detailed understanding look here, here, here, here and the context around those links). Simplifying things, in theory you could even have a single folder with your depth maps and point all 3 dataset paths to that same folder; this would also work.
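For example (a sketch only: main.py almost certainly takes other arguments as well, and the folder path is a placeholder, so check its argument parser for the full list), pointing all three dataset flags at the same folder would look something like:

```
python main.py --vcl_path /data/depth_backgrounds \
               --intnet_path /data/depth_backgrounds \
               --corbs_path /data/depth_backgrounds
```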

The code for loading the depth data is exactly here. In principle, any depth image format readable by OpenCV with the ANY_DEPTH flag is compatible (e.g. single-channel PNG, PGM, or other).
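For example, a quick compatibility check on one of your files (the file name is just a placeholder) could look like this:

```python
import cv2

# cv2.IMREAD_ANYDEPTH loads the depth map at its native bit depth
# (e.g. 16-bit PNG/PGM) as a single-channel array instead of 8-bit.
depth = cv2.imread("frame_000001.png", cv2.IMREAD_ANYDEPTH)

if depth is None:
    raise RuntimeError("OpenCV could not read the file in any-depth mode")

print(depth.dtype, depth.shape)   # e.g. uint16 (480, 640) for a Kinect-style depth map
print(depth.min(), depth.max())   # sanity-check the value range / unit (mm vs m)
```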

I hope this helps.