fmu2 closed this issue 1 year ago
Hi, thanks for the great work! Which CLIP backbone did you use for video/text feature extraction?
Dear @fmu2, we used CLIP B/32. We plan to release B/16 and L/14 early next year.