Thank you for your excellent work. I noticed that only the test datasets were mentioned in the paper. Could you tell me which datasets were used during the training phase?
Thanks for your interest. Please note we are doing multi-task training in this paper. For each task, we used its separate train/test split.
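To make "its separate train/test split" concrete, here is a hypothetical sketch of such a per-task registry. The split names and the registry itself are illustrative placeholders, not the repository's actual configuration; as the follow-up below notes, FashionGen provides the training data for these three tasks.

```python
# Hypothetical per-task split registry (illustrative only, not FAME-ViL's
# actual config): every task keeps its own train and test split.
TASK_SPLITS = {
    "xmr": {"train": "fashiongen/train", "test": "fashiongen/test"},  # cross-modal retrieval
    "scr": {"train": "fashiongen/train", "test": "fashiongen/test"},  # sub-category recognition
    "fic": {"train": "fashiongen/train", "test": "fashiongen/test"},  # fashion image captioning
}

def splits_for(task: str) -> dict[str, str]:
    """Return the train/test split names used by one task."""
    return TASK_SPLITS[task]

print(splits_for("xmr"))  # {'train': 'fashiongen/train', 'test': 'fashiongen/test'}
```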
Thank you for the quick reply! For the XMR, SCR, and FIC tasks, is it still necessary to use the 260K training set of FashionGen for training? My confusion is whether FAME-ViL is fine-tuned from pre-trained CLIP, or whether CLIP's parameters merely serve as the initialization for FAME-ViL.
My question may be a bit naive, but intuitively, is a lot of training data necessary to continue training a pre-trained CLIP? Or does fine-tuning not require a training set of any particular size? I am not very familiar with this aspect.
The current FAME-ViL is initialized from CLIP and then multi-task trained on several tasks (each with its own dataset). When we evaluate FAME-ViL, each task is tested on its own dataset.
Using off-the-shelf CLIP without any training cannot handle all the tasks, because zero-shot CLIP can only do retrieval.
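To illustrate the distinction, here is a minimal, self-contained sketch, not the authors' code: the model is built from pre-trained CLIP weights (the "initial values"), and those same weights keep being updated during multi-task training, which is exactly what fine-tuning means. The task sampling, the toy data, and the use of CLIP's contrastive loss for every step are all simplifying assumptions; FAME-ViL's real per-task heads, adapters, and losses are not shown.

```python
import random
import numpy as np
import torch
from transformers import CLIPModel, CLIPProcessor

# "Initialized on CLIP": the pre-trained weights are only the starting point;
# multi-task training then keeps updating them, i.e. fine-tuning.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)

def toy_batch():
    # Random stand-ins for one task's training data; real training would pull
    # a batch from that task's own split (e.g. the FashionGen train set).
    images = [(np.random.rand(224, 224, 3) * 255).astype("uint8") for _ in range(4)]
    texts = ["a red dress", "blue jeans", "a silk scarf", "leather boots"]
    return processor(text=texts, images=images, return_tensors="pt", padding=True)

model.train()
for step in range(2):                             # a couple of toy steps
    task = random.choice(["xmr", "scr", "fic"])   # sample which task this step trains
    batch = toy_batch()                           # would come from `task`'s own loader
    out = model(**batch, return_loss=True)        # CLIP's contrastive loss as a stand-in;
                                                  # each real task has its own head/loss
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```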