Hi @YaoMarkMu, I think this is a fantastic piece of work. My question: when I tried to use the provided weights https://drive.google.com/file/d/1sBTy8oXeweJg3STbhzBR_5pLcVs1F20q/view?usp=sharing for a simple inference in `demo/inference` and `demo/test` by replacing the `model_path`, I encountered an error like the following:
Could you provide more guidance on how to perform inference (e.g., something like a Hugging Face model card that documents the model's input format)? Additionally, I am concerned about the quality of the open-source code for this work. As far as I understand, this is one of the early works introducing vision-language large models to embodied tasks, and since it is a spotlight paper at NeurIPS, I hope your team can further open up the code and weights so the community can conduct more thorough evaluations.