Closed: XiaohanLei closed this issue 3 weeks ago
Hi,
Thanks for your interest in our work. It seems that you are unable to fit the training data.
> Is this issue primarily due to insufficient data?

I don't think so.

> What other potential reasons could be causing the model to fail to converge?

Can you share the loss curve? I would start by exploring hyperparameters such as the learning rate and by disabling any augmentation and regularization. Also, are the rendered images the same as RVT's virtual images? Note that RVT has 5 virtual images, while RVT-2 has 3.

> For such a simple task, approximately how many samples might be needed to see convergence?

In our experiments, we found 10 to be enough for generation. A lower number of samples should facilitate convergence. More samples only help with generalization, not train-time convergence.

> Are there any suggestions to improve the training process or data collection method?

Can you share some examples of the collected data, i.e., the point cloud and the ground-truth robot pose?
I discovered that the problem was caused by my dataset being too small: with so few training steps, the cosine learning-rate schedule had not risen much before training completed. In other words, the learning rate was too low. Thank you for your kind response.
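For anyone hitting the same symptom, here is a minimal sketch of how a tiny dataset can leave the learning rate stuck near zero. It assumes a generic linear-warmup plus cosine-decay schedule, not RVT-2's actual scheduler. The function names and all the numbers (batch size, epochs, warmup steps, base learning rate) are hypothetical and chosen only for illustration.

```python
import math


def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Generic schedule (an assumption, not RVT-2's code):
    linear warmup to base_lr, then cosine decay to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * min(1.0, progress)))


def peak_lr(num_samples, batch_size, epochs, base_lr, warmup_steps):
    """Highest learning rate actually reached during a run whose length
    is determined by the dataset size."""
    steps_per_epoch = max(1, num_samples // batch_size)
    steps_run = steps_per_epoch * epochs
    return max(
        lr_at_step(s, base_lr, warmup_steps, steps_run) for s in range(steps_run)
    )


# Hypothetical settings: 10 epochs, batch size 2, 2000 warmup steps.
small = peak_lr(num_samples=10, batch_size=2, epochs=10,
                base_lr=1e-4, warmup_steps=2000)
large = peak_lr(num_samples=10000, batch_size=2, epochs=10,
                base_lr=1e-4, warmup_steps=2000)
print(f"peak LR with 10 samples:    {small:.2e}")
print(f"peak LR with 10000 samples: {large:.2e}")
```

Under these assumed settings, the 10-sample run ends after 50 steps, well inside the warmup phase, so it never gets beyond a small fraction of the base learning rate. Running more steps per epoch, shortening the warmup, or raising the base learning rate are the obvious knobs to try.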
Content:
Problem Description
I'm attempting to train an RVT-2 model for a simple task: "lift the block". I've collected 10 demonstration samples in real-world scenarios for training, but the model shows no signs of convergence at all.
Environment
Attempts
So far, I've only tried training with the 10 collected samples.
Questions
Additional Information
The first image is the point cloud, and the second is the rendered result.
Any help or advice would be greatly appreciated!