-
**Description**
Ragchat currently can't process information from pictures.
**Workflow**
1-Implement image-to-text functionality to extract information from images.
2-Handle the sentences extracted from the picture.
**Acceptance…
-
I see text-to-image is a supported feature. How about image-to-text? There are quite a few capable self-hosted multimodal models these days, such as moondream2 and minicpm2.6, that are supported in ollama…
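
A minimal sketch of what such an image-to-text call could look like against a local Ollama server. It assumes Ollama's default endpoint (`http://localhost:11434/api/generate`) and that a vision model has been pulled under the tag `moondream`; the helper names `build_caption_request` and `caption_image` are hypothetical and not part of any existing project.

```python
import base64
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_caption_request(image_bytes: bytes,
                          model: str = "moondream",
                          prompt: str = "Describe this image.") -> dict:
    """Build the JSON body for a non-streaming image-to-text request."""
    return {
        "model": model,  # assumed model tag; swap in whichever vision model is installed
        "prompt": prompt,
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def caption_image(image_bytes: bytes, **kwargs) -> str:
    """Send the image to the local server and return the generated text."""
    body = json.dumps(build_caption_request(image_bytes, **kwargs)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```

Calling `caption_image(open("photo.png", "rb").read())` would return the model's description as a plain string, which could then be handled like any other text input.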
-
In ComfyUI, the image-to-video, control, and video-to-video nodes all seem to have a progress bar.
The text-to-image node, though, is missing any kind of progress report.
-
### Feature request
Implement a new pipeline that takes both an image and text as inputs and produces a text output. This would be particularly useful for multi-modal tasks …
-
### Feature request
This is a tracker issue for work on _interleaved_ in-and-out image-text generation.
There are now >= 5 open-source models that can do _interleaved_ image-text generation--and…
-
The current captcha implementation uses plain text, which can be easily inspected and extracted from the page source (e.g., using developer tools).
![brave_bNkQVnscd4](https://github.com/user-attach…
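
One possible way to keep the answer out of the page source, sketched under assumptions: the server keeps a secret key and hands the client only an opaque token, while the challenge itself would be rendered as an image server-side (not shown). `SECRET_KEY`, `issue_challenge`, and `verify` are hypothetical names, and the sketch omits expiry and replay protection.

```python
import hashlib
import hmac
import secrets

# Hypothetical server-side secret; it is never sent to the client.
SECRET_KEY = secrets.token_bytes(32)


def _mac(nonce: str, answer: str) -> str:
    """HMAC-SHA256 over the nonce and the normalized answer."""
    msg = f"{nonce}:{answer.strip().lower()}".encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def issue_challenge(answer: str) -> str:
    """Return an opaque token for the page; the answer is not embedded in it."""
    nonce = secrets.token_hex(8)
    return f"{nonce}:{_mac(nonce, answer)}"


def verify(token: str, user_input: str) -> bool:
    """Check the user's input against the token without storing the answer."""
    nonce, _, mac = token.partition(":")
    return hmac.compare_digest(mac, _mac(nonce, user_input))
```

With this shape, inspecting the page with developer tools reveals only the nonce and the MAC, neither of which yields the answer without the server's key.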
-
### Describe the bug
[/usr/local/lib/python3.10/dist-packages/gradio/external.py](https://localhost:8080/#) in from_model(model_name, hf_token, alias, **kwargs)
368 fn = client.image_to_…
-
### Describe your issue. If applicable, add screenshots to help explain your problem.
_controller = CameraController(
camera,
// Set to ResolutionPreset.high. Do NOT set it to Resolut…
-
**When I run these two, I get this error: RuntimeError: The shape of the 2D attn_mask is torch.Size([77, 77]), but should be (1, 1). Specific errors are as follows:**
F:\Miniconda\envs\dream\lib\site…