-
Hi, thank you for sharing your work and congratulations on the paper!
I am trying to use COOT to create video descriptions for videos that aren't in ActivityNet. I saw your [comment ](https://githu…
-
Thanks in advance!
-
The HowTo100M + VidChapters-7M + ViTT model is performing poorly on dense video captioning.
Reproduction:
Run
```
yt-dlp -P $TRANSFORMERS_CACHE -o video.mp4 https://www.youtube.com/watch?v=WJ…
-
In line 85 of [SwinBERT](https://github.com/microsoft/SwinBERT/tree/main/prepro)/create_image_frame_tsv.py:
" current_image_path = previous_image_path "
Does it mean when the amount of extra…
-
Hi, thanks for the great code and amazing work.
I have been doing my best to reproduce similar performance, but I have run into some trouble.
Could I get some of your advice?
1. Download dataset
Unfortunately, some of datas…
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
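With the same total_batch_size, training on a single node (8 GPUs) runs at the same speed as training on multiple nodes (16 GPUs). This is an obstacle for anyone who wants to use this repository to scale up their data.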
### Reproduction
Launched with torchrun.
The script is…
-
hi @crodriguezo,
Can you share some details about training, such as how long it takes and what hardware/GPU was used?
Currently, on A100-80GB (24 CPUs), the training is too slow …
-
Hi, guys! Thanks a lot for the project. But I have an issue with downloading the pretrained models using download_models.sh. I've tried different networks, but it fails every time. Do you have another…
-
Hi,
Thanks for the nice library. I found DALI while looking for a video loader for action recognition. I found that DALI cannot yet handle varying resolutions, as described in issue #725, which is necessary f…
-
Hi, how do you get the videos for the YouCook2 dataset, since they only provide annotations? Would I need to download each video from YouTube? Or do you provide embeddings for the videos?
Thanks