appliedinnovation / fast-depth

ICRA 2019 "FastDepth: Fast Monocular Depth Estimation on Embedded Systems"
MIT License

Dataset Mixing #22

Open alexbarnett12 opened 3 years ago

alexbarnett12 commented 3 years ago

There has been a lot of research into dataset mixing strategies that go beyond naive mixing (i.e., randomly shuffling all datasets together). I mainly draw from this paper, which shows improved results by incrementally introducing harder datasets to the model. This would be useful for us, since we have simulation ("easy") and real-world ("hard") data. Further in the future, we may even want to create a dataset of exclusively challenging situations, such as water, reflective surfaces, and glass windows. It may prove more effective to train on such data only after a strong baseline has been established.

I don't know exactly what form this functionality should take. My current thinking is a DatasetMixer wrapper class that takes any of our Datasets and controls the order in which they are introduced during training; a rough sketch follows.
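Here is a minimal sketch of what I have in mind, assuming PyTorch-style Datasets. All names here (DatasetMixer, stages, advance) are hypothetical, not anything that exists in the repo yet:

```python
import bisect

from torch.utils.data import Dataset


class DatasetMixer(Dataset):
    """Curriculum-style wrapper that unlocks datasets one stage at a time.

    Stage 0 exposes only the first ("easy") dataset; each call to
    advance() unlocks the next (harder) dataset, so later epochs draw
    from a mix of all unlocked datasets.
    """

    def __init__(self, stages):
        # stages: Datasets ordered easy -> hard,
        # e.g. [sim_dataset, real_dataset, hard_cases_dataset]
        self.stages = list(stages)
        self.num_active = 1  # start with only the easiest dataset

    def advance(self):
        """Unlock the next dataset; intended to be called between epochs."""
        self.num_active = min(self.num_active + 1, len(self.stages))

    def _cumulative_lengths(self):
        totals, running = [], 0
        for ds in self.stages[:self.num_active]:
            running += len(ds)
            totals.append(running)
        return totals

    def __len__(self):
        return self._cumulative_lengths()[-1]

    def __getitem__(self, idx):
        # Map a flat index to (dataset, local index) over unlocked datasets.
        totals = self._cumulative_lengths()
        which = bisect.bisect_right(totals, idx)
        local = idx - (totals[which - 1] if which > 0 else 0)
        return self.stages[which][local]
```

Since `__len__` changes whenever `advance()` is called, the safest pattern is probably to recreate the DataLoader after each stage transition rather than reuse one across the whole run. A later refinement could replace the hard unlock with per-stage sampling weights, but the simple version above seems like enough to run the first experiments.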

The other aspect of this issue is the actual research question: does strategic dataset mixing significantly improve real-world performance? The paper listed above shows improvement only on the standard KITTI and NYU datasets, which doesn't mean it will be helpful for us. It could also be beneficial (and research-worthy) to document the effect of training solely on simulation and then fine-tuning on real-world data, if there is any benefit.