BrandonHanx / FAME-ViL

[CVPR 2023 (Highlight)] FAME-ViL: Multi-Tasking V+L Model for Heterogeneous Fashion Tasks

About the datasets used for training. #1

Closed houjie8888 closed 1 year ago

houjie8888 commented 1 year ago

Thank you for your excellent work. I noticed that only the test datasets were mentioned in the paper. I would like to know which datasets were used during the training phase.

BrandonHanx commented 1 year ago

Thanks for your interest. Please note we are doing multi-task training in this paper. For each task, we used its separate train/test split.
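For illustration, here is a minimal sketch of what multi-task training over per-task train splits can look like. Everything below (the tiny random datasets, the linear backbone and heads, the MSE losses) is a hypothetical placeholder, not the actual FAME-ViL code; the real model shares one V+L backbone across tasks and uses each task's own dataset and objective.

```python
import itertools
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins for per-task train splits (random tensors so this runs);
# in practice each task would load its own split, e.g. from FashionGen.
tasks = {
    "xmr": TensorDataset(torch.randn(128, 64), torch.randn(128, 64)),
    "scr": TensorDataset(torch.randn(128, 64), torch.randn(128, 64)),
    "fic": TensorDataset(torch.randn(128, 64), torch.randn(128, 64)),
}
iters = {name: iter(itertools.cycle(DataLoader(ds, batch_size=16, shuffle=True)))
         for name, ds in tasks.items()}

shared = nn.Linear(64, 64)                                    # placeholder shared backbone
heads = nn.ModuleDict({t: nn.Linear(64, 64) for t in tasks})  # placeholder per-task heads
params = list(shared.parameters()) + list(heads.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-4)

for step in range(60):
    # Round-robin over tasks: each step draws a batch from one task's own split,
    # so the shared backbone is updated by every task's loss in turn.
    name = list(tasks)[step % len(tasks)]
    x, y = next(iters[name])
    loss = nn.functional.mse_loss(heads[name](shared(x)), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```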

houjie8888 commented 1 year ago

Thank you for the quick reply! For the XMR, SCR, and FIC tasks, is it still necessary to use the 260k training set from FashionGen? My confusion is whether FAME-ViL is fine-tuned from pre-trained CLIP, or whether CLIP's parameters are only used as initial values for FAME-ViL.

houjie8888 commented 1 year ago

> Thanks for your interest. Please note we are doing multi-task training in this paper. For each task, we used its separate train/test split.

My question may be a bit strange, but intuitively, is a lot of training data necessary to continue training a pre-trained CLIP? Or does fine-tuning not require a training set of any particular size? I am not very knowledgeable about this aspect.

BrandonHanx commented 1 year ago

The current FAME-ViL is initialized from CLIP and then multi-task trained on several tasks (each with its own separate dataset). When we evaluate FAME-ViL, each task is tested on its own dataset.
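Concretely, "initialized from CLIP" means starting from pre-trained CLIP weights and keeping them trainable. A minimal sketch using the public Hugging Face checkpoint as an assumed stand-in for the actual weights used in the paper:

```python
from transformers import CLIPModel

# Assumed stand-in: load public CLIP weights as the starting point.
# Parameters are trainable by default, so multi-task fine-tuning
# continues to update them rather than freezing the backbone.
backbone = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
trainable = sum(p.numel() for p in backbone.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```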

Using off-the-shelf CLIP without any training cannot handle all of these tasks, because zero-shot CLIP can only do retrieval.
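To make that concrete, zero-shot CLIP scores image-text similarity out of the box, which covers retrieval-style tasks but not generative ones like captioning. A minimal example with the standard Hugging Face transformers API (the dummy random-pixel images are placeholders for real fashion photos):

```python
import numpy as np
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Two dummy images (random pixels) stand in for real fashion photos.
images = [Image.fromarray(np.uint8(np.random.rand(224, 224, 3) * 255)) for _ in range(2)]
texts = ["a red evening dress", "a white running sneaker"]

inputs = processor(text=texts, images=images, return_tensors="pt", padding=True)
with torch.no_grad():
    out = model(**inputs)

# logits_per_image[i, j] is the similarity of image i to text j; ranking
# these scores is exactly the retrieval setting zero-shot CLIP can handle.
print(out.logits_per_image.softmax(dim=-1))
```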