-
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally larg…
-
## Keyword: super resolution
There are no results.
## Keyword: gan
### Towards Discovery and Attribution of Open-world GAN Generated Images
- **Authors:** Sharath Girish, Saksham Suri, Saketh Rambhatla…
-
## Keyword: detection
### Video Anomaly Detection by Estimating Likelihood of Representations
- **Authors:** Yuqi Ouyang, Victor Sanchez
- **Subjects:** Computer Vision and Pattern Recognition (cs.C…
-
### Please describe your project. Start with the need or problem you are trying to solve with this project. Describe why your solution is going to adequately solve this problem.
### Challenge:
…
-
While our [draft charter](https://www.w3.org/2023/03/proposed-webmachinelearning-charter.html) says that the group:
> priority on building blocks required by well-known model architectures such as re…
-
Kohya has added preliminary support for Flux.1 LoRA to his SD3 branch. I have created a `sd3-flux.1` branch and updated it to the latest sd-scripts sd3 branch code. There is no GUI integration yet. I will sta…
-
In recent years, more and more applications have appeared that present media content in 3D space, such as 360° videos (stereoscopic or not) and VR experiences. Subtitles (if present) are mostly shown at the b…
-
When using the large-v3 model to transcribe Greek audio from a live stream, I often get repeated output reading "Υπότιτλοι AUTHORWAVE" ("Subtitles AUTHORWAVE").
It seems the model has a bug that makes it output that …
-
What should the expected behavior of cues be when the controls obscure them?
According to the spec, cue rendering should be re-done when the native controls are shown (steps 4 and 5 of [the Proc…
-