-
## Inspiration
There is a Gradio Space, [https://huggingface.co/spaces/hf-audio/whisper-large-v3](https://huggingface.co/spaces/hf-audio/whisper-large-v3), that uses Whisper through the Hugging Face API:
```python
import spaces
import torch
…
-
We combine Grounding DINO, Grounding DINO 1.5 and SAM 2 for tracking any object in the input video and we've open-sourced our code here: [Grounded SAM 2](https://github.com/IDEA-Research/Grounded-SAM-…
-
I wanted to test the RAM (Kingston DDR4, 2 × 8 GB) on a Clevo NV41MZ laptop.
Processors: 8 × 11th Gen Intel® Core™ i7-1165G7 @ 2.80GHz
```
lspci
00:00.0 Host bridge: Intel Corporation 11th Gen Core Pro…
-
I appreciate that you use sequential CPU offload and VAE tiling to reduce VRAM usage on the CogVideoX nodes. Can these methods be used in this program to reduce VRAM usage? I tried to input a video w…
-
**What API design would you like to have changed or added to the library? Why?**
With the increasing resolution of image or video generation, we need to introduce more noise at smaller T, such as S…
-
Has anyone ever run into the issue where, after fine-tuning, the model doesn't know when to stop and generation only ends once max new tokens is reached? Does it have to do with the tokenizer not adding an eos toke…
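One commonly reported cause is that the training examples never end with the EOS token, so the model is never shown where to stop. Below is a minimal sketch of a check-and-fix step, assuming the examples are already tokenized into lists of ids. The helper `append_eos` and the ids used here are hypothetical and not taken from any particular library.

```python
def append_eos(token_ids, eos_token_id, max_length=None):
    """Return a copy of token_ids that ends with exactly one EOS id.

    If max_length is given, the sequence is truncated first so that the
    result (including EOS) never exceeds max_length.
    """
    ids = list(token_ids)
    if max_length is not None:
        ids = ids[: max_length - 1]  # leave one slot for EOS
    if not ids or ids[-1] != eos_token_id:
        ids.append(eos_token_id)
    return ids


# Hypothetical ids, for illustration only.
EOS_ID = 2
example = append_eos([5, 6, 7], EOS_ID)
```

If the pad token shares its id with EOS, it may also be worth checking that the data collator does not mask the final EOS out of the labels, since that would have the same effect as omitting it.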
-
Thank you for the latest version of the code release. When I actually trained and used different sampling strategies, I found that the effect of pyramid sampling is not as good as full_sequence_sampli…
-
Hello,
Imagen-Video states that they use model distillation to iteratively train student diffusion models that require half the sampling steps of their teacher diffusion model. This seems to be an …
-
How can I get all masks directly?
-
Hello, each video corresponds to two XML files. How should the XML files be used?
In addition, the video sampling frequency and the finger pulse sampling frequency are inconsistent. How can the two be synchronized?
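One possible way to align the two streams is to resample the pulse signal onto the video frame timestamps with linear interpolation. The sketch below assumes that both recordings start at the same instant and that both sampling rates are known and constant. The function name `resample_to_frames` and the rates in the usage lines are hypothetical, not taken from the dataset.

```python
import numpy as np


def resample_to_frames(signal, signal_hz, n_frames, video_fps):
    """Linearly interpolate a 1-D signal onto video frame timestamps.

    Assumes both streams start at t = 0 and are sampled uniformly.
    Frames that fall after the last signal sample take its last value.
    """
    signal = np.asarray(signal, dtype=float)
    t_signal = np.arange(len(signal)) / signal_hz  # time of each pulse sample
    t_frames = np.arange(n_frames) / video_fps     # time of each video frame
    return np.interp(t_frames, t_signal, signal)


# Hypothetical rates: a 60 Hz pulse signal aligned to 30 fps video.
pulse = np.sin(2 * np.pi * 1.2 * np.arange(120) / 60.0)
pulse_per_frame = resample_to_frames(pulse, signal_hz=60, n_frames=60, video_fps=30)
```

If the XML files carry per-sample timestamps, using those in place of the uniform `t_signal` and `t_frames` grids would likely be more accurate than relying on nominal rates.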