ErlerPhilipp / points2surf

Points2Surf: Learning Implicit Surfaces from Point Clouds (ECCV 2020)
https://www.cg.tuwien.ac.at/research/publications/2020/erler-2020-p2s/
MIT License

fine-tuning #35

Closed: seliarah closed this issue 1 month ago

seliarah commented 3 months ago

I used your pretrained models on my own dataset and got good reconstruction results, even though the model had never seen my objects. But when I train the model on my dataset, the results are not good at all. Any idea why? Does the number of sampled points in the point cloud or the number of query points affect this? Also, how can I fine-tune the model on my dataset?

ErlerPhilipp commented 3 months ago

I never tried to fine-tune this network, so I can only guess.

Can you give more details on how you trained it on your own data? How many objects? How many query points?

My best guess is that data augmentation (random rotations) is enabled. So the network learned something about your specific objects but not very accurately.

I would assume it works better if you fine-tune with a combined training set for a few epochs, e.g. the P2S training set plus ~100 objects like the ones you're trying to reconstruct.
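
Not tested on my end, but building such a combined set could be as simple as merging the shape lists, assuming the text-file-per-split dataset layout used in this repo (all paths and file names below are placeholders):

```python
from pathlib import Path
import random

# Hypothetical paths: merge the P2S ABC training list with up to 100 of
# your own shapes into a new fine-tuning split.
abc_shapes = Path('datasets/abc_train/trainset.txt').read_text().splitlines()
own_shapes = Path('datasets/my_objects/trainset.txt').read_text().splitlines()

combined = abc_shapes + random.sample(own_shapes, min(100, len(own_shapes)))
random.shuffle(combined)  # avoid long blocks of similar shapes in one epoch

out_dir = Path('datasets/combined')
out_dir.mkdir(parents=True, exist_ok=True)
(out_dir / 'trainset.txt').write_text('\n'.join(combined) + '\n')
```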

In any case, you should try our new work, PPSurf. It's much better in every regard, especially training time.

seliarah commented 3 months ago

I didn't understand your point about data augmentation. Do you mean in your pretrained model? I didn't change anything there; I just used it to reconstruct my own dataset.

Also, I have a more general question: your training dataset is the ABC dataset, right? If the model is trained on this, how is it able to reconstruct objects from other datasets, such as Famous or real-world data, that are quite different and that the model has not seen? I believe the same applies here: the model had not seen my objects before but still gave me good results.

And about training on my own data: I have around 250 objects of the same type; they are pretty much identical with only details changed (I'm looking to train the model on this specific type of object only), and the number of query points per object ranges from 2000 to 5000. But it is pretty clear from the results that the model was not trained accurately on my dataset, and I don't know what the problem is.

ErlerPhilipp commented 3 months ago

> I didn't understand your point about data augmentation. Do you mean in your pretrained model? I didn't change anything there; I just used it to reconstruct my own dataset.

When you continue training (fine-tuning), the model might quickly overfit to your new dataset; basically, it might forget all the other examples. Also, data augmentation is enabled by default during training, so it might learn your new objects to some degree, but not necessarily at the required rotation.
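
For illustration only (this shows the general idea, not the repo's actual augmentation code): a random-rotation augmentation rotates every training sample into an arbitrary orientation, which helps generalization across poses but means a small fine-tuning set may never be learned at the one orientation you care about.

```python
import numpy as np

def random_rotate(points: np.ndarray) -> np.ndarray:
    """Apply a uniformly random 3D rotation to an (N, 3) point cloud."""
    # Draw a random rotation via QR decomposition of a Gaussian matrix.
    q, r = np.linalg.qr(np.random.randn(3, 3))
    q = q * np.sign(np.diag(r))  # fix column signs for a uniform distribution
    if np.linalg.det(q) < 0:     # ensure a proper rotation (det = +1)
        q[:, 0] = -q[:, 0]
    return points @ q.T
```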

> Also, I have a more general question: your training dataset is the ABC dataset, right? If the model is trained on this, how is it able to reconstruct objects from other datasets, such as Famous or real-world data, that are quite different and that the model has not seen?

Yes, ABC var-noise for training. The patch-based approach enables the network to generalize to any kind of rigid body. The global encoding is rather weak, though, which leads to noisy reconstructions because some voxels near the surface might get the wrong inside/outside sign.
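
Conceptually (the names here are illustrative, not the actual repo API), P2S regresses an absolute distance from the local patch and classifies the inside/outside sign from the global encoding; the final signed distance is their product, so a single misclassified sign flips the SDF at that query point:

```python
import torch

def compose_sdf(abs_dist: torch.Tensor, sign_logit: torch.Tensor) -> torch.Tensor:
    """Combine |d| (from the local patch) with an inside/outside logit (global)."""
    sign = torch.where(sign_logit >= 0,
                       torch.ones_like(sign_logit),
                       -torch.ones_like(sign_logit))
    # A wrong sign near the surface is what produces the noisy voxels.
    return abs_dist * sign
```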

> And about training on my own data: I have around 250 objects of the same type; they are pretty much identical with only details changed, and the number of query points per object ranges from 2000 to 5000. But it is pretty clear from the results that the model was not trained accurately on my dataset.

  1. 250 objects might not be enough, even though they're all of one type. I think the ~5000 objects in ABC var-noise are rather a minimum, and a training set should cover all possible types to some degree. As a reference, ShapeNetCore has ~50k objects in 55 classes and ModelNet has ~10k objects in 40 classes.
  2. 2000 - 5000 query points is strange. This number should be the same for every object. Are you confusing this with the point cloud size? The query points are essentially the number of training samples per object.
  3. The quality of P2S depends a lot on the type of objects. Since it learns an SDF, it can only work properly with a small number of solid bodies per scene. If you have e.g. many overlapping layers or thin sheets, it won't produce nice results. Typical indoor and outdoor scenes won't work since there is no clear inside and outside.

seliarah commented 3 months ago

Thank you for the thorough explanation, I really appreciate it. With all of this in mind, what do you suggest I do? Train the model from scratch on a dataset of mine containing more objects, or use your pre-trained model and fine-tune it on my dataset?

Regarding the query points, I changed the code so that the query points are a percentage of all on-surface sampled points. For each object, the point cloud size varies, and so does the number of query points. Does this affect the training process and the results?

ErlerPhilipp commented 3 months ago

> With all of this in mind, what do you suggest I do? Train the model from scratch on a dataset of mine containing more objects, or use your pre-trained model and fine-tune it on my dataset?

That depends a lot on what your goal is. If you're happy with the quality, you shouldn't waste time on fine-tuning; P2S, PPSurf, and similar methods are meant to be used as they are. If you need better quality, you should rather switch methods than optimize one.

  1. As said before, PPSurf is much better than P2S in every regard.
  2. If you have a lot of compute resources available, you can probably get the best results from some NeuS derivative, e.g. Neuralangelo or something in Nerfstudio/SDFStudio.
  3. If you need something very fast, you could try Shape As Points or some SLAM variant.
  4. If you need something for academic research, we could maybe start a collaboration.

> Regarding the query points, I changed the code so that the query points are a percentage of all on-surface sampled points. For each object, the point cloud size varies, and so does the number of query points. Does this affect the training process and the results?

I think that's the main problem. On-surface sampled points are NOT sufficient as query points; the network must learn the SDF far away from the surface, too. That's why I took 50% of the query points NEAR (not just on) the surface and 50% uniformly sampled random points from the unit cube.
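
A minimal sketch of that sampling scheme (the function name, noise scale, and origin-centered unit cube are assumptions, not the repo's exact code):

```python
import numpy as np

def sample_query_points(surface_pts: np.ndarray, n_queries: int,
                        noise_std: float = 0.05) -> np.ndarray:
    """surface_pts: (N, 3) on-surface samples of a shape scaled into the unit cube."""
    n_near = n_queries // 2
    n_far = n_queries - n_near
    # 50% NEAR the surface: jitter on-surface samples with Gaussian noise.
    idx = np.random.choice(len(surface_pts), n_near)
    near = surface_pts[idx] + np.random.normal(0.0, noise_std, size=(n_near, 3))
    # 50% FAR from the surface: uniform random points in the unit cube.
    far = np.random.uniform(-0.5, 0.5, size=(n_far, 3))
    return np.concatenate([near, far], axis=0)
```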

seliarah commented 3 months ago

Wow, thanks a lot for your response, I really appreciate your help. Actually, I'm looking for a method that can learn an almost "perfect" implicit function from point clouds of very simple geometric shapes, such as cylinders or trapezoids, so that the reconstructed mesh is almost the same as the GT mesh. I observed that in some cases Points2Surf fails to give a perfect mesh for objects like a cylinder. So I'd really appreciate any suggestions or ideas you might have for this.

The SDFStudio you mentioned does not take point clouds as input, I believe, right? And I haven't tried your new work yet.

ErlerPhilipp commented 3 months ago

OK, so you want very accurate reconstructions of simple shapes. Do you also have rather clean point clouds (little noise, no missing areas, similar sampling density)?

  1. If not, you could try something that fits parametric surfaces or reconstructs a CAD model; if you really need it, you can convert the CAD model to an SDF. There are e.g. Point2CAD and NeurCADRecon. I didn't try these, but they seem solid. PPSurf will also work nicely with this.
  2. If yes, I'd recommend Neural-IMLS. We found it during our comparisons for the PPSurf paper. It's very accurate but doesn't like noisy point clouds.

Yes, SDFStudio, NeRFs, Gaussian Splatting, and similar methods take photos as input, not point clouds.