ErlerPhilipp / points2surf

Learning Implicit Surfaces from Point Clouds (ECCV 2020)
https://www.cg.tuwien.ac.at/research/publications/2020/erler-2020-p2s/
MIT License

Problems while training on Shapenet #11

Closed kirbiyik closed 3 years ago

kirbiyik commented 3 years ago

Thanks for the detailed code @ErlerPhilipp. I'm trying to train the vanilla network on ShapeNet and have run into several problems. Could you comment on them? Please also let me know if you have further hints or things I should be careful about.

I put meshes into 00_base_meshes and ran make_dataset.py. The following problems occur:

  1. The names of some meshes change, which then throws errors during training. For instance, out of 6778 samples I have 6678 in 04_pts and 6767 in 05_query_pts. Some samples have different filenames, ignoring file extensions. Where did they come from? I could keep only the common files in trainset.txt, at the cost of throwing many samples away, but I want to know why this is happening. Maybe it has something to do with BlenSor?

  2. I see that this work requires meshes to be watertight, but trimesh.fill_holes() can't fix my samples, so I am using https://github.com/hjwdzh/Manifold. Do you think the output of that repo works with this implementation? Can you also comment on why your work assumes watertight meshes?

  3. I waited on make_dataset.py for 10 hours and then realized it was no longer utilizing the CPU. I checked 05_query_pts; it had the same number of samples as 00_base_meshes, so I produced the txt files manually. Any idea why this is happening? For ~6K samples, I get 118255 files in 04_pcd. Is this number too big? Should I change any parameters like grid_resolution or num_scans_per_mesh_max?
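
The filename-intersection workaround described in point 1 can be sketched as follows. This is a hypothetical standalone helper, not part of points2surf; the directory names follow the issue, and the exact layout of trainset.txt should be checked against the repo's dataset format:

```python
# Hypothetical helper to diagnose the filename mismatch: find the sample
# stems (filenames without extension) present in both directories and
# write only that intersection to trainset.txt, one stem per line.
from pathlib import Path


def common_stems(dir_a, dir_b):
    """Return the set of file stems present in both directories."""
    stems_a = {p.stem for p in Path(dir_a).iterdir() if p.is_file()}
    stems_b = {p.stem for p in Path(dir_b).iterdir() if p.is_file()}
    return stems_a & stems_b


def write_trainset(dir_a, dir_b, out_file):
    """Write the sorted intersection to out_file; return how many survive."""
    stems = sorted(common_stems(dir_a, dir_b))
    Path(out_file).write_text("\n".join(stems) + "\n")
    return len(stems)
```

Comparing the set differences (stems_a - stems_b and vice versa) should also reveal whether the mismatch is a systematic rename (e.g. a case change) or genuinely missing samples.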

ErlerPhilipp commented 3 years ago

Hi @kirbiyik

  1. The names of the meshes should not change. Can you give an example? Is this perhaps an uppercase/lowercase issue? You can put some debug prints into one of the BlenSor scripts and run it manually on a single mesh with [blensor_bin] [blender_path] -P [script_path], omitting the '-b' parameter. You may also need to remove the last line, bpy.ops.wm.quit_blender(), from the script. It might also be OS-related, e.g. if you have very long paths.
  2. Calculating the ground-truth SDF requires a very clean input mesh, since it's essentially a point-in-polygon test. ShapeNet meshes are very messy, which is why I used the ABC dataset. You can try to fix the issues, switch datasets, or use a more robust SDF calculation. Points2Surf should work with any clean mesh that trimesh can load (see https://trimsh.org/trimesh.html#trimesh.available_formats). Note that this is only important for the training set, not for the reconstruction.
  3. It took my PC around 8 h for the ~6k meshes of the paper's training set. The multi-processing is not very cleanly implemented: if some workers run into exceptions (e.g. out-of-memory errors on meshes with many faces), they might block the main process. You can simply run make_dataset.py again. If the settings are correct, it should have moved broken input files to 'datasets/[dataset]/broken' so they are ignored in the next run. I don't know how this interacts with the naming issue. 118255 files in 04_pcd sounds OK. It's a problem for certain file managers, but not for Python. If you want to reduce the number of files, you can reduce num_scans_per_mesh_max.
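
Regarding point 2, the watertightness requirement boils down to a topological condition: in a closed triangle mesh, every edge must be shared by exactly two faces, otherwise an inside/outside test is ill-defined. Libraries such as trimesh expose this as mesh.is_watertight; the dependency-free sketch below only illustrates the edge-count idea on a plain list of face index triples and is not the check points2surf itself runs:

```python
# Minimal sketch of the core watertightness condition: every undirected
# edge of a closed triangle mesh appears in exactly two faces.
from collections import Counter


def is_watertight(faces):
    """faces: iterable of (i, j, k) vertex-index triples.
    Returns True if every undirected edge occurs exactly twice."""
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[frozenset((u, v))] += 1
    return all(count == 2 for count in edge_counts.values())
```

A tetrahedron (four faces over four vertices) passes this check; removing any one face leaves three boundary edges with a count of one, which is exactly the kind of hole that breaks the point-in-polygon test.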
kirbiyik commented 3 years ago

Thanks for the tips; I'll look into the renaming problem later on. Meanwhile, let's keep this issue closed.