ahq1993 / MPNet

Motion Planning Networks

Bad loss when training CAE #8

Open whiterabbitfollow opened 4 years ago

whiterabbitfollow commented 4 years ago

Hey,

Don't know if this is an issue, but I'm looking into your paper and trying to reproduce your results. However, I'm stuck with the encoder network: I get a mean squared error of roughly 2. When I plot the reconstructed data, I don't think it looks good enough. Did you also see roughly the same loss, or did you match your input data perfectly?

You're doing really cool work, keep it up! :+1:

uncobruce commented 4 years ago

Hi Ahmed,

I was training on the simple 2D dataset that you've provided to reproduce the results and was getting a similar loss for the CAE as @whiterabbitfollow. Do you know if that is normal? MPNet is quite an interesting mix of classic and ML path-planning techniques, btw.

MinsungYoon commented 3 years ago

The author simply uses a pure MLP and MSE loss in the pretraining (reconstruction) stage to learn environment latent features from point cloud data. However, when dealing with point cloud data, the architecture needs a symmetric aggregation such as max-pooling to satisfy permutation invariance, and a permutation-invariant loss such as Chamfer Distance (CD) or Earth Mover's Distance (EMD) must be used. Please refer to:

http://stanford.edu/~rqi/pointnet/docs/cvpr17_pointnet_slides.pdf
https://arxiv.org/pdf/1901.08906.pdf

Otherwise, there will be a problem generalizing between environments, or even across the same point cloud presented in a different order; given this, it is questionable that the paper reports such good results even in unseen environments.
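For illustration, here is a minimal PyTorch sketch of the PointNet-style idea: a shared per-point MLP followed by max-pooling, which is a symmetric function, so the latent code is unchanged by any reordering of the input points. The layer sizes and `latent_dim` below are placeholders, not the repo's actual configuration:

```python
import torch
import torch.nn as nn

class PointNetEncoder(nn.Module):
    """Shared per-point MLP + max-pooling (a symmetric function), so the
    latent code is identical for any reordering of the input points."""
    def __init__(self, latent_dim=28):  # latent_dim is a placeholder choice
        super().__init__()
        self.point_mlp = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(),   # 2D points, as in this repo's dataset
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, latent_dim),
        )

    def forward(self, points):
        # points: (batch, num_points, 2)
        per_point = self.point_mlp(points)  # (batch, num_points, latent_dim)
        latent, _ = per_point.max(dim=1)    # pool over points -> (batch, latent_dim)
        return latent
```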

gabrielpeixoto-cvai commented 1 year ago

@whiterabbitfollow @uncobruce I have the same issue. I am using the same CAE provided in this paper and trying to train it on the dataset they provide in this repo. The results are really bad when I plot the encoder input next to the reconstruction.

I even tried something simpler, just a square in the image, with random translations, and it was better but still not perfect.

@MinsungYoon I agree that PointNet is a better approach. But given that their work should be reproducible, using the same dataset and the same network should yield good results; otherwise, how did they validate their CAE in the first place?

However, I trained both the encoder and the planner, and in the end it seems to be working, i.e. the samples go from the start to the goal position, but the results seem worse than claimed in the paper. I just wonder why the reconstructed representation is so bad.

gabrielpeixoto-cvai commented 1 year ago

@whiterabbitfollow @uncobruce After some testing, I discovered that my code was the culprit. I could correctly encode and decode the environments using the default CAE present in this repo. I could also make it work with a more difficult dataset that I created myself.

Intermediate result using the CAE:

[Screenshot from 2023-09-03 20-01-17]

Results with a custom dataset (training with 10 different environments = overfit):

[Screenshot from 2023-09-04 14-07-12]

Results with 1000 different environments (starts generalizing):

[Screenshot from 2023-09-04 14-19-51]

Results with 10000 different environments (generalizes):

[Screenshot from 2023-09-04 14-29-53]

You can notice a behavior here caused by the MSE component of the contractive loss: the more samples you provide, the more the network tends to concentrate the reconstruction around the center of the shape (the same trend appears for the 2D dataset provided here). This is fine for the dataset presented in this repo (squares of the same size), but once you start increasing the complexity of the data (my custom dataset, for example), this behavior can be problematic.
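For reference, the loss being discussed is roughly of the following form. This is a generic sketch following Rifai et al.'s contractive autoencoder, assuming sigmoid hidden units; the repo's actual implementation may differ in detail:

```python
import torch

def contractive_mse_loss(x, recons_x, h, W, lam=1e-3):
    # Reconstruction term: plain MSE, the component discussed above.
    mse = torch.mean((recons_x - x) ** 2)
    # Contraction term: squared Frobenius norm of the encoder Jacobian
    # dh/dx, in closed form for a sigmoid layer h = sigmoid(W x + b),
    # with W of shape (latent_dim, input_dim).
    dh_sq = (h * (1 - h)) ** 2       # (batch, latent_dim)
    w_sq = torch.sum(W ** 2, dim=1)  # (latent_dim,)
    contraction = torch.sum(dh_sq * w_sq)
    return mse + lam * contraction
```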

I will also experiment with other losses, e.g. the Chamfer loss, and other architectures, like PointNet.

oxcarxierra commented 1 month ago

@gabrielpeixoto-cvai Hi, I wonder if you still remember this work, since it's been a year after your last comment, but I think I am struggling with the same point. In my case as well, even though I used the same encoder architecture and loss (MSE loss + contractive loss) as this repo and trained it, I ended up with poor results, as in the picture below.

[Figure_1]

My dataset is generated from 10000 different environments, which seems like a sufficient amount. I'm assuming this may be because of the permutation dependence, since the point clouds are not sorted. Would you mind sharing some of the work or ideas you used to train the CAE? It would be a lot of help to me. Thanks!

gabrielpeixoto-cvai commented 1 month ago

@oxcarxierra Hello, yes, I remember this. I am still actively researching in this area. After my comment, I ran experiments with other loss functions, like Earth Mover's Distance and Chamfer losses. The reason is related to what you said in your comment: point clouds are unordered, so the loss should not depend on point order. Earth Mover's Distance and Chamfer handle this scenario better.
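As an illustration, here is a minimal Chamfer distance sketch in PyTorch (not the exact code I used): the value depends only on nearest-neighbour distances between the two clouds, so it is invariant to point order:

```python
import torch

def chamfer_distance(pc1, pc2):
    # pc1: (batch, n, d), pc2: (batch, m, d)
    diff = pc1.unsqueeze(2) - pc2.unsqueeze(1)  # (batch, n, m, d)
    dist = torch.sum(diff ** 2, dim=-1)         # pairwise squared distances
    # Average distance from each point to its nearest neighbour in the
    # other cloud, in both directions -- independent of point order.
    nearest_12 = dist.min(dim=2).values.mean(dim=1)
    nearest_21 = dist.min(dim=1).values.mean(dim=1)
    return (nearest_12 + nearest_21).mean()
```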

Additionally, I experimented with other networks for encoding environments, such as PointNet and PointNet++, and obtained far better results. However, these loss functions and architectures are far heavier than the MPNet CAE and require more time to train, as well as more GPU memory and data.

If you want to understand the limits of the MPNet CAE, here are some steps that I took to understand its behavior better:

  1. Start with a basic set of environments: use only 10 different environments, do not do data augmentation, train the network, and see if it can learn the environments; then slowly increase the diversity, to 100, 1000, and so on. This will give you an idea of the generalization limit of this specific setup;
  2. Experiment with the architecture itself: add more layers, increase the size of existing layers, and increase the size of the latent space (maybe the current latent space size is not enough to represent your environment). This will help you understand the limits of the specific network architecture. A rough sketch of both sweeps follows this list.
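A hypothetical harness for these two sweeps (`make_dataset` and `train_cae` are placeholders for your own data generation and training loop, and the sweep values are just examples):

```python
# make_dataset and train_cae are hypothetical placeholders for your own code.
for n_envs in (10, 100, 1000, 10000):   # step 1: environment diversity
    for latent_dim in (28, 64, 128):    # step 2: latent capacity
        data = make_dataset(n_envs)
        final_loss = train_cae(latent_dim, data)
        print(f"envs={n_envs} latent={latent_dim} loss={final_loss:.4f}")
```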

I did these two experiments extensively for a week and came to understand the limits (I also fixed some bugs in my code). Then I decided to move on to other networks because the CAE was insufficient for my specific application. I will still experiment with more modern point cloud encoders, such as transformers and diffusion models, in the near future.

I hope this helps! If you have a breakthrough, please let me know; I am curious to understand more!