-
Hi! In caption_model, I don't see where you invoke the scene-graph code for each of the videos you generated earlier; instead you use the data in sum data. Where did you invoke the scene graph you generat…
-
-
-
Hello,
I recently read your paper, which states that for the "Charades dataset, we configure our network to use T = 64,
T = 128 and α = 1/4."
Could you point me to where in the code the number of frames…
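For reference, a uniform clip-sampling scheme with a configurable clip length T might look like the sketch below. The function name and the evenly spaced sampling strategy are illustrative assumptions, not the repository's actual implementation:

```python
import numpy as np

def sample_frame_indices(num_video_frames, T=64):
    """Uniformly sample T frame indices from a video of
    num_video_frames frames (T = 64 matches the Charades
    config quoted from the paper)."""
    # Evenly spaced indices spanning the whole video
    return np.linspace(0, num_video_frames - 1, T).astype(int)

idx = sample_frame_indices(1000, T=64)
```

In a real codebase, T is typically exposed as a command-line flag or config entry rather than hard-coded.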
-
This is a multi-label classification problem and therefore requires BCELoss. But I don't know why training converges so slowly. I use 4×1080Ti at a time, 64 frames of input, per 80 videos upd…
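For context, multi-label classification applies an independent sigmoid per class rather than a softmax, which is why binary cross-entropy is the natural loss here. The sketch below implements the numerically stable logits form of BCE in NumPy (mirroring what PyTorch's `BCEWithLogitsLoss` computes with its default mean reduction); the toy shapes and labels are illustrative assumptions:

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable binary cross-entropy on raw logits,
    averaged over all elements."""
    # max(x, 0) - x*z + log(1 + exp(-|x|)) avoids overflow for large |x|
    loss = (np.maximum(logits, 0)
            - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))
    return loss.mean()

# Toy multi-label batch: 2 clips, 3 classes, multi-hot targets
logits = np.array([[2.0, -1.0, 0.5],
                   [0.0, 3.0, -2.0]])
targets = np.array([[1.0, 0.0, 1.0],
                    [0.0, 1.0, 0.0]])
loss = bce_with_logits(logits, targets)
```

Because every class contributes a gradient on every sample, sparse multi-hot targets can make convergence look slow compared with single-label cross-entropy training.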
-
Hello, I experimented with your model by following your inference instructions and got the result metrics below.
![image](https://github.com/user-attachments/assets/43933e5d-cfb0-4aad-a107-0c407d75a30a)…
-
### Question
Dear LLaVA Developer Team,
I must say the LMM is truly brilliant! 😊 I have a question: is LLaVA capable of performing video-QA? In other words, can the model accept a video or a set o…
-
## Tasks
- [x] Video classification
* [x] 2D CNN (RGB, Two-stream CNN)
* [x] 3D CNN (I3D)
* [x] TSN sampling
- [x] Temporal Action Detection
* [x] SSN
* [ ] TAG proposal (v0.2)
*…
-
I downloaded the AVA video data based on DATASET.md, but some videos failed to download for some reason. I processed the data with the videos that did download and tested the results. There is an error, great vide…
-
Dear Siyuan,
Regarding the CAD-120 experimental results: previous works perform 4-fold cross-validation, where one of the subjects is left out for testing while the other three are used for traini…
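The leave-one-subject-out protocol described above can be sketched as follows. CAD-120 is recorded by 4 subjects, so each fold holds one subject out for testing and trains on the remaining three; the subject identifiers and the commented-out training/evaluation calls are placeholders, not the actual experimental code:

```python
# Illustrative subject IDs (one per CAD-120 performer)
subjects = ["subj_a", "subj_b", "subj_c", "subj_d"]

folds = []
for held_out in subjects:
    # Train on the three remaining subjects, test on the held-out one
    train_subjects = [s for s in subjects if s != held_out]
    folds.append((train_subjects, held_out))
    # model = train_model(train_subjects)        # placeholder
    # score = evaluate(model, held_out)          # placeholder

# Reported numbers are then averaged over the 4 folds
```

This protocol guarantees that no subject appears in both the training and test splits of any fold, which is the standard way to measure cross-subject generalization.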