-
Hi, thanks for your great work.
```python
# Latent Fusion
def fusion(self, audio_tokens, visual_tokens):
    # shapes
    BS = audio_tokens.shape[0]
    # concat all the tokens
    …
```
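For context, a self-contained sketch of the concatenation step above, assuming both token streams share the same batch size and embedding dimension (NumPy stands in for the actual tensor library, and `fuse_tokens` is a hypothetical name, not the repo's API):

```python
import numpy as np

def fuse_tokens(audio_tokens, visual_tokens):
    """Fuse two token streams by concatenation along the token axis.

    audio_tokens:  (batch, n_audio, dim)
    visual_tokens: (batch, n_visual, dim)
    returns:       (batch, n_audio + n_visual, dim)
    """
    assert audio_tokens.shape[0] == visual_tokens.shape[0], "batch size mismatch"
    assert audio_tokens.shape[2] == visual_tokens.shape[2], "embedding dim mismatch"
    return np.concatenate([audio_tokens, visual_tokens], axis=1)
```

In a transformer-style model the fused sequence would typically then pass through shared attention layers, but that part depends on the architecture in question.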
-
**Issue and Steps to Reproduce:** Bhajans should also be available in audio form so that users can listen to them anytime; this would support relaxation and mental well-being.
**S…
-
### Is your feature request related to a problem? Please describe.
Currently we just have a big list of links in the AV and press sections. Looking for a proposal on how to make this more appealing a…
-
link text: [VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs](https://arxiv.org/pdf/2306.02858)
Actual title of the paper: Video-LLaMA: An Instruction-tuned Audi…
-
### Describe the Feature/Enhancement
Add support for subtitle files (e.g., VTT, SRT) in the Audiobookshelf app to display word-level or phrase-level highlights during audio playback. This feature w…
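On the implementation side, a minimal sketch of locating the active subtitle cue at a given playback time, assuming standard SRT timing lines (`HH:MM:SS,mmm --> HH:MM:SS,mmm`); all function names here are hypothetical illustrations, not Audiobookshelf APIs:

```python
import re

SRT_TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")

def parse_time(s):
    """Convert an SRT timestamp like '00:00:01,000' to seconds."""
    h, m, sec, ms = map(int, SRT_TIME.match(s).groups())
    return h * 3600 + m * 60 + sec + ms / 1000.0

def parse_srt(text):
    """Parse SRT text into a list of (start_sec, end_sec, cue_text) tuples."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # skip malformed blocks
        start, _, end = lines[1].partition(" --> ")
        cues.append((parse_time(start.strip()),
                     parse_time(end.strip()),
                     "\n".join(lines[2:])))
    return cues

def active_cue(cues, playback_seconds):
    """Return the cue text covering the given playback time, or None."""
    for start, end, cue_text in cues:
        if start <= playback_seconds < end:
            return cue_text
    return None
```

Word-level highlighting would additionally need per-word timing, which plain SRT does not carry; VTT cue settings or a forced-alignment step would be required for that granularity.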
-
#### Overview
We propose to implement audio-visual calls and screen sharing within our platform's channels using WebRTC, facilitated by the PeerJS client/server framework. This feature w…
-
@ZiqiaoPeng Could you open-source the training code for the audio_visual_encoder.pth model? After training with Chinese data, the lip-sync accuracy is not very high, and I would like to retrain audio_visual_encoder on a Chinese dataset.
-
Are the generated audio, video, and audio-visual captions for your TAVGBench unavailable just for now, or will they be proprietary?
Access to this benchmark seems pretty crucial for comparisons with you…
-
When I want to train on the EPIC or Perception dataset using Omnivore + Auditory SlowFast features, how should I set parameters such as feat_stride, feat_gap, num_feats, feat_dropout, seq_dropout, apply_f…
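The exact meaning of these parameters is defined by the training code, but if feat_stride is the frame step between consecutive pre-extracted features and each feature summarizes a fixed window of frames, the feature count usually follows the standard sliding-window formula. A hedged sketch (the function name and window convention are assumptions, not the repo's API):

```python
def num_feature_frames(total_frames, window_size, stride):
    """Count sliding-window features over a clip.

    Assumes each feature vector summarizes `window_size` consecutive
    frames and consecutive windows start `stride` frames apart
    (a common convention; verify against the actual feature extractor).
    """
    if total_frames < window_size:
        return 0
    return (total_frames - window_size) // stride + 1

# e.g. a 100-frame clip with 16-frame windows and stride 4 yields 22 features
```

Checking this count against the shape of the saved feature files is usually the quickest way to confirm how feat_stride and num_feats should be set.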
-
**Godot version:**
3.3.2 stable
**OS/device including version:**
Windows 10 64-bit
**Issue description:**
I modified the official spectrum demo project, adding two buttons to control the mus…