-
Excellent work!
Amazing LipVoicer!
I have a small question about the sync evaluation metrics: LSE-C and LSE-D.
In [LIPVOICER: GENERATING SPEECH FROM SILENT VIDEOS GUIDED BY LIP READING](https…
-
Hi, I see that the total score of cogvideo2B on the leaderboard is 80.94%, but after running inference with all_dimension_long.txt, the total score I measured is only 78.68%.
The video I produced with cogvide…
-
### Action
Update publication
### Title
Evaluating In-Context Learning of Libraries for Code Generation
### Shorthand
icl-libraries
### Author
Arkil Patel
### Names
Arkil Patel, Siva Reddy, D…
-
Hi Authors,
First, thanks for your great work! It's an amazing contribution to the video understanding task!
However, when I try to reproduce the results reported in the paper, I get several trou…
-
Tasks that have been identified and scheduled:
+ Fine-tuning support for Diffusers version models
+ Adaptation for CPU / NPU inference frameworks (e.g., Huawei, Intel devices)
+ ComfyUI adaptat…
-
Hi, nice work!
Do you plan to release the evaluation code for SHOW-1 on UCF-101 and MSRVTT? If you can open-source the evaluation code, I believe that future work can be fairly compared to sh…
-
Where can I find the function, or any other files, for using Flow-Square-Mean?
-
Expanding the provided code to fully recreate the VASA-1 system as described in the research paper would require a significant amount of additional code and architectural changes. Here's a high-level …
-
Currently, prompt2model is limited to text-input, text-output tasks. The underlying framework can certainly handle different modalities, and it would be great to see prompt2model be able to handle diffe…
-
# Overview
TBD
# Progress
- [x] Establish repo with some likely useful [NELLIE](https://github.com/nweir127/guided_inference) code
- [x] Implement TVQA HF dataset on BRTX
- [x] Set up TVQA even…