-
Thanks for the great work. I have read the code for JHMDB and have some questions:
(1) The mAP@0.5 I obtain is only 0.72 (i.e., 72.0%), much lower than the reported 82.3.
(2) I also notice that the …
-
For [the version](https://github.com/amazon-science/tubelet-transformer/commit/f610c97251e5539256095508570563ca2dc8c7a1) I am using,
AVA2.1 inference needs several modifications:
1. https://git…
-
Hi,
First, thanks for your work and for providing the implementation.
Following the steps you described, I downloaded the pretrained CSN-152 (Kinetics-400 + IG65M) checkpoint from the link you provided: [Tube…
-
@CarlosGomes98
**Describe the issue**
I'm interested in fine-tuning a ViT model with the patch-embedding size set to something greater than 1 for the temporal dimension. To do this, I at least have…
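For reference, a minimal sketch of the usual way a temporal patch ("tubelet") size greater than 1 is implemented: a 3D convolution whose temporal kernel and stride equal the tubelet size. The `TubeletEmbed` class and its default arguments below are illustrative assumptions, not code from the model in question.

```python
# Minimal sketch (assumption): a 3D "tubelet" patch embedding in PyTorch, where the
# temporal kernel/stride (tubelet_size) is greater than 1. Names are illustrative.
import torch
import torch.nn as nn

class TubeletEmbed(nn.Module):
    def __init__(self, in_chans=3, embed_dim=768, patch_size=16, tubelet_size=2):
        super().__init__()
        # Collapse each (tubelet_size, patch_size, patch_size) block into one token.
        self.proj = nn.Conv3d(
            in_chans, embed_dim,
            kernel_size=(tubelet_size, patch_size, patch_size),
            stride=(tubelet_size, patch_size, patch_size),
        )

    def forward(self, x):                     # x: (B, C, T, H, W)
        x = self.proj(x)                      # (B, D, T//ts, H//ps, W//ps)
        return x.flatten(2).transpose(1, 2)   # (B, num_tokens, D)

video = torch.randn(2, 3, 16, 224, 224)       # two 16-frame RGB clips
tokens = TubeletEmbed(tubelet_size=2)(video)  # shape (2, 8 * 14 * 14, 768)
```

With a temporal size of 2, a 16-frame 224×224 clip yields 8 × 14 × 14 = 1568 tokens, so fine-tuning from an image-pretrained ViT typically also means adapting (e.g., inflating or averaging) the 2D patch-embedding weights into this 3D kernel.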
-
![Header image 2](https://user-images.githubusercontent.com/53006892/109469329-daddde80-7aa8-11eb-86e1-0102bc216efb.jpg)
In autonomous driving, LiDAR-based 3D object detection and motion behavior prediction is a widely used approach. At present, the vast majority of LiDAR object detection algorithms operate on single frames. Multi-frame temporal LiDAR data provides …
-
Weibo content highlights
-
Hi, based on your weights and code, I'm trying to build a demo that measures zero-shot text-video similarity. While loading the weights, the log prints the messages below. Judging from my test samples and results so far, the results are not good, so I'd like to confirm with you: the weight-loading step logs that some keys are missing; is that expected? I don't have a public dataset, so I'm testing on my own samples.
```python
2023-09-07T19:37:18 | __main__: c…
-
I noticed that for anet you use `scale_factor = 4` to account for the ViT backbone downsampling, but use `scale_factor = 1` for thumos although it uses the same backbone. Can you please explain the lo…
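For context, a hedged sketch of what such a temporal `scale_factor` commonly does in this kind of pipeline: it maps feature-grid indices back to frame indices when the backbone downsamples in time. The function below is a hypothetical illustration of that general pattern, not code taken from this repository.

```python
# Illustrative sketch (assumption): mapping temporal feature-grid indices back to
# frame indices via a scale factor. Function name and arguments are hypothetical.
def feature_idx_to_frame(idx: int, scale_factor: int, offset: float = 0.5) -> float:
    """Return the frame coordinate of the center of feature cell `idx`."""
    return (idx + offset) * scale_factor

print(feature_idx_to_frame(10, scale_factor=4))  # 42.0 -> backbone downsamples time by 4x
print(feature_idx_to_frame(10, scale_factor=1))  # 10.5 -> features already at frame rate
```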
-
### System Info
I'm not sure if I've missed something in the code, but I can't seem to find where the CLS tokens are added? I have input data of shape (64,45,2,32,32) with tubelet size = 5, patch_s…
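In case it helps frame the question, here is a generic, hedged sketch of where a [CLS] token is usually added in ViT-style models: right after the patch/tubelet embedding and before the positional embedding and transformer blocks. This shows the common pattern only, not the library's actual code, and some video models skip the CLS token entirely and mean-pool the patch tokens instead.

```python
# Generic sketch (assumption): prepending a learnable [CLS] token to the token
# sequence produced by the patch/tubelet embedding. Names are illustrative.
import torch
import torch.nn as nn

class WithClsToken(nn.Module):
    def __init__(self, embed_dim=768):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))

    def forward(self, tokens):                                # tokens: (B, N, D)
        cls = self.cls_token.expand(tokens.size(0), -1, -1)   # (B, 1, D)
        return torch.cat([cls, tokens], dim=1)                # (B, N + 1, D)

tokens = torch.randn(64, 128, 768)        # batch of patch tokens
out = WithClsToken()(tokens)              # shape (64, 129, 768)
```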
-
### System Info
- `transformers` version: 4.35.0
- Platform: Linux-6.5.6-76060506-generic-x86_64-with-glibc2.35
- Python version: 3.10.12
- Huggingface_hub version: 0.16.4
- Safetensors version…