Hello:
Outstanding work!
Have you tried merging the DROID dataset with the six downstream task datasets and training a single end-to-end vision-language diffusion policy on the combined data? If so, how well does the resulting VLA model generalize?
Thank you so much!
Thanks for your question! So far, we have only trained single-task policies on the 6 fine-tuning datasets. We are working on training multi-task and multi-scene policies, so stay tuned :)