LAION-AI / temporal-embedding-aggregation

Aggregating embeddings over time

Research Priority Queue #40

Open iejMac opened 1 year ago

iejMac commented 1 year ago

Test different CLIP backbones: H/14 gets much better results and isn't much slower, because video decoding dominates the runtime. We will therefore likely shift to H/14 (or maybe L/14) embeddings while video decoding remains the bottleneck. If we change the architecture of clip-video-encode to alleviate this bottleneck, we should revisit this question.
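
To make the bottleneck argument concrete, here is a toy timing model of a decode -> encode pipeline. It is only a sketch: the function `seconds_per_frame` and all timings are hypothetical placeholders, not measurements from clip-video-encode, and should be replaced with profiled values before drawing conclusions.

```python
def seconds_per_frame(decode_s, encode_s, overlapped=True):
    """Per-frame cost of a two-stage decode -> encode pipeline.

    overlapped=True assumes decoding and encoding run concurrently,
    so the slower stage sets the pace; otherwise the costs add up.
    """
    if overlapped:
        return max(decode_s, encode_s)
    return decode_s + encode_s


# Hypothetical per-frame timings in seconds (assumptions, not benchmarks).
decode_s = 10e-3        # video decoding
small_encode_s = 1e-3   # smaller CLIP backbone
large_encode_s = 3e-3   # larger CLIP backbone, assumed 3x slower per frame

for overlapped in (True, False):
    small = seconds_per_frame(decode_s, small_encode_s, overlapped)
    large = seconds_per_frame(decode_s, large_encode_s, overlapped)
    print(f"overlapped={overlapped}: end-to-end slowdown {large / small:.2f}x")
```

Under these assumed numbers, a backbone that is 3x slower per frame barely changes end-to-end time as long as decoding dominates. If decoding gets faster, the encoder term takes over and the backbone choice matters again, which is why the question should be revisited after any architecture change.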