-
**Describe the feature**
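`class Template` supports padding, but `Qwen2VLTemplateMixin` and `InternLMXComposer2Template` only provide `im_mask`. They do not provide an `attention_mask` for `input_ids`, which is needed when there is padding.
Could you add the padding and the `attention_mask` back?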
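For reference, the sketch below shows the kind of `attention_mask` being requested. When a batch of `input_ids` is padded to a common length, the mask marks real tokens with 1 and padding with 0, on whichever side the padding was added. This is a minimal TypeScript sketch and not the framework's code. `padBatch`, `PaddedBatch`, and the pad id `0` are hypothetical, and the sketch does not show how the two templates would combine this mask with their existing `im_mask`.

```typescript
// Hypothetical result type: padded token ids plus the matching attention mask.
interface PaddedBatch {
  inputIds: number[][];
  attentionMask: number[][]; // 1 = real token, 0 = padding
}

// Hypothetical helper: pads every sequence to the longest length in the batch.
function padBatch(
  sequences: number[][],
  padTokenId: number,
  side: "left" | "right" = "right",
): PaddedBatch {
  // The longest sequence determines the padded length.
  const maxLen = Math.max(0, ...sequences.map((s) => s.length));
  const inputIds: number[][] = [];
  const attentionMask: number[][] = [];
  for (const seq of sequences) {
    const padLen = maxLen - seq.length;
    const pads = new Array<number>(padLen).fill(padTokenId);
    const zeros = new Array<number>(padLen).fill(0);
    const ones = new Array<number>(seq.length).fill(1);
    if (side === "right") {
      inputIds.push([...seq, ...pads]);
      attentionMask.push([...ones, ...zeros]);
    } else {
      inputIds.push([...pads, ...seq]);
      attentionMask.push([...zeros, ...ones]);
    }
  }
  return { inputIds, attentionMask };
}

// Usage: two sequences of different lengths, pad id 0 (an assumption).
const batch = padBatch([[101, 7592, 102], [101, 102]], 0);
console.log(batch.attentionMask); // → attentionMask [[1, 1, 1], [1, 1, 0]]
```

Because the mask is built in the same loop as the padded ids, it always has the same shape as `inputIds` and stays aligned with them. This holds for both left and right padding.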
http…
-
Nice work! I can't wait to try it. When will the code be released?
By the way, I'm not sure whether you know this paper: "BOOSTING MULTIMODAL LARGE LANGUAGE MODELS WITH VISUAL TOKENS WITH…
-
Hello! Could you please add SALMONN series models?
Title | Venue | Date | Code | Demo
-- | -- | -- | -- | --
[SALMONN: Towards Generic Hearing Abilities for Large Language Models](https://arxiv.o…
-
Currently, we use text embeddings. This works for textual documents, but it has obvious drawbacks for documents that contain non-textual content (images, graphs, diagrams, …).
An alternative is…
-
**Is your feature request related to a problem? Please describe.**
No
**Describe the solution you'd like**
Integrate gpt-4-vision, and vision language models more generally, using LangChain, by fir…
-
I propose adding a Model Evaluation and Benchmarking System to ML Nexus to help users assess their models' performance on standardized datasets and compare it against benchmark scores. This feature wo…
-
## Paper link
https://arxiv.org/abs/2103.00020
## Publication date (yyyy/mm/dd)
2021/01/05
## Summary
The CLIP (Contrastive Language-Image Pre-training) paper. CLIP was also used for reranking in OpenAI's DALL·E.
From text on the Web, special a…
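CLIP trains an image encoder and a text encoder so that, within a batch, each matching image-text pair scores higher than every mismatched pair. The following is a minimal TypeScript sketch of that symmetric contrastive (cross-entropy) loss. It assumes the embeddings are already computed. The helper names are hypothetical, and the temperature is fixed here for simplicity, whereas CLIP learns it during training.

```typescript
type Matrix = number[][];

// L2-normalize each row so that dot products become cosine similarities.
function l2Normalize(rows: Matrix): Matrix {
  return rows.map((r) => {
    const norm = Math.sqrt(r.reduce((s, x) => s + x * x, 0)) || 1;
    return r.map((x) => x / norm);
  });
}

// Mean cross-entropy where the correct class for row i is column i.
function diagonalCrossEntropy(logits: Matrix): number {
  let total = 0;
  logits.forEach((row, i) => {
    const max = Math.max(...row); // subtract the max for numerical stability
    const logSumExp = max + Math.log(row.reduce((s, x) => s + Math.exp(x - max), 0));
    total += logSumExp - row[i];
  });
  return total / logits.length;
}

function transpose(m: Matrix): Matrix {
  return m[0].map((_, j) => m.map((row) => row[j]));
}

// Symmetric contrastive loss over N matched (image, text) pairs: row i of each
// matrix belongs to pair i.
function clipLoss(imageEmb: Matrix, textEmb: Matrix, temperature = 0.07): number {
  const img = l2Normalize(imageEmb);
  const txt = l2Normalize(textEmb);
  // logits[i][j] = cosine(image_i, text_j) / temperature
  const logits = img.map((a) =>
    txt.map((b) => a.reduce((s, x, k) => s + x * b[k], 0) / temperature),
  );
  const imageToText = diagonalCrossEntropy(logits); // each image picks its text
  const textToImage = diagonalCrossEntropy(transpose(logits)); // each text picks its image
  return (imageToText + textToImage) / 2;
}

// Usage: aligned embeddings should give a much lower loss than misaligned ones.
console.log(clipLoss([[1, 0], [0, 1]], [[0.9, 0.1], [0.2, 0.8]]));
```

Row i of `logits` is a classification over all texts for image i. Column i is a classification over all images for text i. Averaging the two directions is what makes the loss symmetric.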
-
I'm unable to make an embedded visual fill 100% of the parent container's width.
These are the basic settings:
```typescript
export const VISUAL_SETTINGS: models.ISettings = {
  localeSettings: {
    language: "en-…
```
-
### Feature request
Add support for LlamaGen, an autoregressive image generation model, to the Transformers library. LlamaGen applies the next-token prediction paradigm of large language models to vi…
-
- [ ] [Vespa 🤝 ColPali: Efficient Document Retrieval with Vision Language Models — pyvespa documentation](https://pyvespa.readthedocs.io/en/latest/examples/colpali-document-retrieval-vision-language-m…
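ColPali embeds each page image as a set of patch vectors and each query as a set of token vectors. It then scores a page with ColBERT-style late interaction (MaxSim). Below is a minimal TypeScript sketch of that scoring using toy vectors. The function names are hypothetical, and this is not the linked pyvespa example's code.

```typescript
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  return a.reduce((s, x, k) => s + x * b[k], 0);
}

// Late-interaction (MaxSim) score: for each query-token vector, take its best
// dot product against any patch vector of the page, then sum over query tokens.
function maxSimScore(queryTokens: Vec[], pagePatches: Vec[]): number {
  return queryTokens.reduce(
    (total, q) => total + Math.max(...pagePatches.map((p) => dot(q, p))),
    0,
  );
}

// Return page indices ordered from highest to lowest score.
function rankPages(queryTokens: Vec[], pages: Vec[][]): number[] {
  return pages
    .map((patches, index) => ({ score: maxSimScore(queryTokens, patches), index }))
    .sort((a, b) => b.score - a.score)
    .map((s) => s.index);
}

// Usage with toy 2-d vectors (real embeddings come from the model).
const query: Vec[] = [[1, 0], [0, 1]];
console.log(rankPages(query, [[[0.5, 0.5]], [[1, 0], [0, 1]]]));
```

Each query token is matched independently against the page's patches. A page can therefore score well when different tokens match different regions, such as a title and a chart, and no single pooled vector has to capture the whole page.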