[PDF] Attention Prompting on Image for Large Vision-Language Models
R Yu, W Yu, X Wang - arXiv preprint arXiv:2409.17143, 2024
Compared with Large Language Models (LLMs), Large Vision-Language Models
(LVLMs) can also accept images as input, thus showcasing more interesting
emergent capabilities and demonstrating impressive performance on various vision …
[PDF] VLMine: Long-Tail Data Mining with Vision Language Models
M Ye, GP Meyer, Z Zhang, D Park, SK Mustikovela… - arXiv preprint arXiv …, 2024
Ensuring robust performance on long-tail examples is an important problem for many
real-world applications of machine learning, such as autonomous driving. This work
focuses on the problem of identifying rare examples within a corpus of unlabeled …
[PDF] Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation
X Liu, G Yang, Y Luo, J Mao, X Zhang, M Gao, S Zhang… - arXiv preprint arXiv …, 2024
Radiology is a vital and complex component of modern clinical workflow and covers
many tasks. Recently, vision-language (VL) foundation models in medicine have
shown potential in processing multimodal information, offering a unified solution for …
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Y Shu, P Zhang, Z Liu, M Qin, J Zhou, T Huang, B Zhao - arXiv preprint arXiv …, 2024
Although current Multi-modal Large Language Models (MLLMs) demonstrate
promising results in video understanding, processing extremely long videos remains
an ongoing challenge. Typically, MLLMs struggle with handling thousands of tokens …
[PDF] TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans
A Chatziagapi, B Chaudhuri, A Kumar, R Ranjan… - arXiv preprint arXiv …, 2024
We introduce a novel framework that learns a dynamic neural radiance field (NeRF)
for full-body talking humans from monocular videos. Prior work represents only the
body pose or the face. However, humans communicate with their full body …
[PDF] Arc2Face: A Foundation Model for ID-Consistent Human Faces
FP Papantoniou, A Lattas, S Moschoglou, J Deng…
This paper presents Arc2Face, an identity-conditioned face foundation model, which,
given the ArcFace embedding of a person, can generate diverse photo-realistic
images with a higher degree of face similarity than existing models. Despite …
[PDF] Robust image representations with counterfactual contrastive learning
M Roschewitz, FDS Ribeiro, T Xia, G Khara, B Glocker - arXiv preprint arXiv …, 2024
Contrastive pretraining can substantially increase model generalisation and
downstream performance. However, the quality of the learned representations is
highly dependent on the data augmentation strategy applied to generate positive …
[PDF] Multi-objective Evolution of Heuristic Using Large Language Model
S Yao, F Liu, X Lin, Z Lu, Z Wang, Q Zhang - arXiv preprint arXiv:2409.16867, 2024
Heuristics are commonly used to tackle diverse search and optimization problems.
Design heuristics usually require tedious manual crafting with domain knowledge.
Recent works have incorporated large language models (LLMs) into automatic …
[PDF] Evaluation of pretrained language models on music understanding
Y Vasilakis, R Bittner, J Pauwels - arXiv preprint arXiv:2409.11449, 2024
Music-text multimodal systems have enabled new approaches to Music Information
Research (MIR) applications such as audio-to-text and text-to-audio retrieval, text-
based song generation, and music captioning. Despite the reported success, little …
[PDF] HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
H Que, F Duan, L He, Y Mou, W Zhou, J Liu, W Rong… - arXiv preprint arXiv …, 2024
In recent years, Large Language Models (LLMs) have demonstrated remarkable
capabilities in various tasks (e.g., long-context understanding), and many benchmarks
have been proposed. However, we observe that long text generation capabilities are …
This message was sent by Google Scholar because you're following new articles related to research by Dave Van Veen.
Sent by Google Scholar Alerts (scholaralerts-noreply@google.com).