YuanJianhao508 / RAG-Driver

A Multi-Modal Large Language Model with Retrieval-augmented In-context Learning capacity designed for generalisable and explainable end-to-end driving
Apache License 2.0

Why is test data used to train the retriever? #13

Closed: TorresYangX closed this issue 2 months ago

TorresYangX commented 2 months ago

Hello, authors, thanks for your work.

I noticed something unusual: both the test data and the training data appear to be used when training the retriever. Specifically, at https://github.com/YuanJianhao508/RAG-Driver/blob/24a0bd01f56c6fa4f2563a4c94925856e2ba707e/retrieval/train.py#L33 conv includes both train_conv and test_conv.

Could you explain the reason for this?

Thanks for your assistance!

YuanJianhao508 commented 2 months ago

Hi @TorresYangX, thanks for raising the question! When training the retriever, we combine the training and test datasets to train the projector, but we do not use the test data for RAG or for training the VLM. Thank you very much!
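To make the distinction in the reply above concrete, here is a minimal sketch of the split it describes: the projector sees train and test samples, while the pool that retrieval draws from contains training samples only. This is not the repository's code. The record layout, the toy embeddings, and the helper names (`projector_training_set`, `retrieval_pool`, `retrieve`) are all assumptions made for illustration; only the names `train_conv` and `test_conv` come from the thread.

```python
import math

# Hypothetical toy records; the real train_conv / test_conv structures in
# retrieval/train.py are not shown in this thread.
train_conv = [
    {"id": "train_0", "emb": [1.0, 0.0]},
    {"id": "train_1", "emb": [0.0, 1.0]},
]
test_conv = [
    {"id": "test_0", "emb": [0.9, 0.1]},
]


def projector_training_set(train, test):
    """Samples used to fit the retriever's projector (train + test, per the reply)."""
    return train + test


def retrieval_pool(train):
    """Samples that may be returned as in-context examples (train only)."""
    return list(train)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def retrieve(query, pool, k=1):
    """Return the k pool entries most similar to the query, excluding the query itself."""
    candidates = [s for s in pool if s["id"] != query["id"]]
    ranked = sorted(candidates, key=lambda s: cosine(query["emb"], s["emb"]), reverse=True)
    return ranked[:k]


conv = projector_training_set(train_conv, test_conv)  # what the projector is trained on
pool = retrieval_pool(train_conv)                     # what RAG is allowed to return
neighbours = retrieve(test_conv[0], pool, k=1)        # a test query retrieves train samples only
```

Under this layout, a test query can only ever retrieve training samples, so no test annotation reaches the VLM as an in-context example, even though test samples helped shape the embedding space.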