Research Priority Queue

LAION-AI / temporal-embedding-aggregation

Aggregating embeddings over time


Research Priority Queue #40

Open iejMac opened 2 years ago

iejMac commented 2 years ago

[ ] More Benchmarks (https://github.com/LAION-AI/temporal-embedding-aggregation/issues/33)
[x] Test Different CLIP backbones (https://github.com/LAION-AI/temporal-embedding-aggregation/issues/36)
[ ] Hyperparameter sweep for simple aggregator like self attention (https://github.com/LAION-AI/temporal-embedding-aggregation/issues/37)
[ ] Architecture tuning (https://github.com/LAION-AI/temporal-embedding-aggregation/issues/38)
[ ] video-clip guided stable diffusion (https://github.com/LAION-AI/temporal-embedding-aggregation/issues/39)

iejMac commented 2 years ago

Test Different CLIP backbones - H/14 gets much better results and isn't much slower end to end, because video decoding is slow enough to dominate the runtime. We will likely shift to H/14 embeddings (or maybe L/14) while video decoding is still the bottleneck. If we decide to change the architecture of clip-video-encode to alleviate this bottleneck, we should revisit this question.
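
To make the bottleneck argument concrete, here is a rough sketch of the arithmetic. It assumes decoding and encoding run back to back with no overlap, and every timing below is a made-up placeholder rather than a measurement from clip-video-encode; `end_to_end_slowdown` is a hypothetical helper, not part of any library.

```python
def end_to_end_slowdown(decode_s, encode_fast_s, encode_slow_s):
    """Ratio of per-clip wall time after swapping a faster encoder for a slower one.

    Assumes decode and encode happen sequentially (no pipelining).
    """
    if decode_s < 0 or encode_fast_s <= 0 or encode_slow_s <= 0:
        raise ValueError("decode time must be >= 0 and encode times must be > 0")
    return (decode_s + encode_slow_s) / (decode_s + encode_fast_s)


# Hypothetical per-clip timings in seconds (placeholders, not benchmarks).
decode_s = 1.0           # video decoding
small_backbone_s = 0.10  # smaller CLIP backbone
large_backbone_s = 0.25  # larger CLIP backbone, e.g. H/14

encoder_only = large_backbone_s / small_backbone_s
end_to_end = end_to_end_slowdown(decode_s, small_backbone_s, large_backbone_s)

print(f"encoder-only slowdown: {encoder_only:.2f}x")
print(f"end-to-end slowdown:   {end_to_end:.2f}x")
```

Under these assumed numbers the larger backbone is much slower in isolation but adds little to the total, because decoding dominates. If decoding gets faster, the end-to-end ratio moves toward the encoder-only ratio, which is why the backbone choice would need revisiting.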