-
The title of the paper https://arxiv.org/pdf/2410.15608 is
> Moonshine: Speech Recognition for Live Transcription and Voice Commands
However, the model is non-streaming. Could you describe…
-
**To address REQ2 of #318, we are after an extension of the WebVTT file format.**
The principal idea is that we map the TextTrack API calls from #319 to how we would archive them in a WebVTT file to…
-
No module 'xformers'. Proceeding without it.
ControlLDM: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z…
-
Hi,
Thanks for the nice library. I found DALI while looking for a video loader for action recognition. I found that DALI cannot yet handle varying resolutions, as described in issue #725, which is necessary f…
-
### Motivation.
Currently models like `llava-hf/llava-next-video*` recognize image and video inputs with different tokens, and do different computations. Therefore vLLM should provide new APIs and …
-
### StoryDALL-E: Adapting Pretrained Text-to-Image Transformers for Story Continuation
**Maharana et al., ECCV 2022**
> Recent advances in text-to-image synthesis have led to large pretrained tran…
-
[Xbe.txt](https://github.com/Cxbx-Reloaded/game-compatibility/files/1258030/Xbe.txt)
[CxbxDebug.txt](https://github.com/Cxbx-Reloaded/game-compatibility/files/1255977/CxbxDebug.txt)
[KrnlDebug.txt…
-
This picks up something I already noted about two years ago, but could maybe be discussed in the context of WCAG 2.2/silver ... https://lists.w3.org/Archives/Public/w3c-wai-gl/2017JulSep/0052.html
…
-
Hi
I discovered your work on VideoCaption and neuraltalk2 while working on a documentary about the Republic of Tuva, a small country near Mongolia that is federated with Russia. The movie itself is abo…
oxmah updated
7 years ago