amir9979 / reading_list

my simple reading list

Dave Van Veen - new related research #7077

Open fire-bot opened 1 month ago

fire-bot commented 1 month ago

Sent by Google Scholar Alerts (scholaralerts-noreply@google.com). Created by fire.


[PDF] Attention Prompting on Image for Large Vision-Language Models

R Yu, W Yu, X Wang - arXiv preprint arXiv:2409.17143, 2024

Compared with Large Language Models (LLMs), Large Vision-Language Models
(LVLMs) can also accept images as input, thus showcasing more interesting
emergent capabilities and demonstrating impressive performance on various vision …


[PDF] VLMine: Long-Tail Data Mining with Vision Language Models

M Ye, GP Meyer, Z Zhang, D Park, SK Mustikovela… - arXiv preprint arXiv …, 2024

Ensuring robust performance on long-tail examples is an important problem for many
real-world applications of machine learning, such as autonomous driving. This work
focuses on the problem of identifying rare examples within a corpus of unlabeled …

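The snippet above cuts off, but the task it describes is surfacing rare examples from an unlabeled corpus with a vision-language model. As a rough illustration of the general idea only (not the paper's actual pipeline), one can caption every image and rank images by how rare their caption keywords are across the corpus; `caption_model` is a hypothetical stand-in for any captioning VLM.

```python
from collections import Counter

def mine_long_tail(images, caption_model, top_k=100):
    """Rank unlabeled images by the rarity of their caption keywords.

    caption_model is a hypothetical callable: image -> caption string.
    Higher scores mean the image's description uses words that are
    uncommon in the rest of the corpus, a crude long-tail signal.
    """
    captions = [caption_model(img).lower().split() for img in images]
    freq = Counter(word for words in captions for word in words)
    total = sum(freq.values())

    def rarity(words):
        # Average inverse corpus frequency over the unique keywords.
        unique = set(words)
        return sum(total / freq[w] for w in unique) / max(len(unique), 1)

    ranked = sorted(zip(images, captions), key=lambda t: rarity(t[1]), reverse=True)
    return ranked[:top_k]
```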

[PDF] Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation

X Liu, G Yang, Y Luo, J Mao, X Zhang, M Gao, S Zhang… - arXiv preprint arXiv …, 2024

Radiology is a vital and complex component of modern clinical workflow and covers
many tasks. Recently, vision-language (VL) foundation models in medicine have
shown potential in processing multimodal information, offering a unified solution for …


Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding

Y Shu, P Zhang, Z Liu, M Qin, J Zhou, T Huang, B Zhao - arXiv preprint arXiv …, 2024

Although current Multi-modal Large Language Models (MLLMs) demonstrate
promising results in video understanding, processing extremely long videos remains
an ongoing challenge. Typically, MLLMs struggle with handling thousands of tokens …

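The challenge flagged above is that hour-scale video quickly exceeds a model's context window once every frame is turned into visual tokens. A back-of-the-envelope sketch (all constants are illustrative, not from the paper): subsample frames uniformly so the token count stays within a fixed budget.

```python
def frames_within_budget(video_seconds, fps=1.0, tokens_per_frame=144,
                         context_budget=8192):
    """Pick uniformly spaced frame indices that fit a visual-token budget.

    All constants are illustrative; real MLLMs differ in tokens per frame
    and context length.
    """
    total_frames = int(video_seconds * fps)
    max_frames = max(1, context_budget // tokens_per_frame)
    stride = max(1, -(-total_frames // max_frames))  # ceiling division
    return list(range(0, total_frames, stride))

# One hour sampled at 1 fps gives 3600 candidate frames, but only about 56
# fit an 8k-token budget, i.e. roughly one frame every 65 seconds survives.
print(len(frames_within_budget(3600)))
```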

[PDF] TalkinNeRF: Animatable Neural Fields for Full-Body Talking Humans

A Chatziagapi, B Chaudhuri, A Kumar, R Ranjan… - arXiv preprint arXiv …, 2024

We introduce a novel framework that learns a dynamic neural radiance field (NeRF)
for full-body talking humans from monocular videos. Prior work represents only the
body pose or the face. However, humans communicate with their full body …

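For context on the "dynamic neural radiance field" mentioned above, here is a bare-bones static NeRF-style field in PyTorch: a sinusoidal positional encoding of a 3D point followed by an MLP that predicts density and colour. This is only the generic building block, not the paper's animatable full-body model; all sizes are illustrative.

```python
import torch
import torch.nn as nn

def positional_encoding(x, n_freqs=10):
    """Map 3D points to sin/cos features at geometrically increasing frequencies."""
    feats = [x]
    for i in range(n_freqs):
        feats += [torch.sin((2 ** i) * x), torch.cos((2 ** i) * x)]
    return torch.cat(feats, dim=-1)

class TinyNeRF(nn.Module):
    """Minimal NeRF-style field: encoded 3D point -> (density, rgb)."""

    def __init__(self, n_freqs=10, hidden=128):
        super().__init__()
        in_dim = 3 * (1 + 2 * n_freqs)
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 colour channels
        )

    def forward(self, xyz):
        out = self.mlp(positional_encoding(xyz))
        density = torch.relu(out[..., :1])   # non-negative volume density
        rgb = torch.sigmoid(out[..., 1:])    # colours in [0, 1]
        return density, rgb

# Query the field at a batch of random 3D points.
sigma, colour = TinyNeRF()(torch.rand(1024, 3))
```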

[PDF] Arc2Face: A Foundation Model for ID-Consistent Human Faces

FP Papantoniou, A Lattas, S Moschoglou, J Deng…

This paper presents Arc2Face, an identity-conditioned face foundation model, which,
given the ArcFace embedding of a person, can generate diverse photo-realistic
images with a degree of face similarity unmatched by existing models. Despite …

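The "degree of face similarity" claimed above is usually quantified by comparing identity embeddings of generated and reference faces. A minimal sketch of that metric, assuming you already have an ArcFace-style embedder (`embedder` is a hypothetical callable, not code from Arc2Face):

```python
import numpy as np

def identity_similarity(reference_image, generated_image, embedder):
    """Cosine similarity between identity embeddings of two face crops.

    embedder is a hypothetical callable: image -> 1-D feature vector
    (e.g. a 512-d ArcFace embedding). Values near 1.0 suggest the same identity.
    """
    a = np.asarray(embedder(reference_image), dtype=np.float64)
    b = np.asarray(embedder(generated_image), dtype=np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```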

[PDF] Robust image representations with counterfactual contrastive learning

M Roschewitz, FDS Ribeiro, T Xia, G Khara, B Glocker - arXiv preprint arXiv …, 2024

Contrastive pretraining can substantially increase model generalisation and
downstream performance. However, the quality of the learned representations is
highly dependent on the data augmentation strategy applied to generate positive …

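The point above is that contrastive pretraining builds its positive pairs from data augmentation, so the augmentation strategy directly shapes the learned representation. For reference, a minimal SimCLR-style NT-Xent loss in PyTorch: the generic objective only, not the paper's counterfactual variant, and the temperature value is illustrative.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.5):
    """Contrastive loss over a batch of paired embeddings.

    z1, z2: embeddings of two augmented views of the same images, shape (B, D).
    Positive pairs are (z1[i], z2[i]); all other samples in the batch act as negatives.
    """
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.t() / temperature                        # cosine-similarity logits
    # A sample must never be matched with itself.
    mask = torch.eye(2 * batch, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float("-inf"))
    # Row i's positive sits at index i + B (and vice versa).
    targets = torch.cat([torch.arange(batch) + batch, torch.arange(batch)])
    return F.cross_entropy(sim, targets.to(z.device))
```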

[PDF] Multi-objective Evolution of Heuristic Using Large Language Model

S Yao, F Liu, X Lin, Z Lu, Z Wang, Q Zhang - arXiv preprint arXiv:2409.16867, 2024

Heuristics are commonly used to tackle diverse search and optimization problems.
Designing heuristics usually requires tedious manual crafting with domain knowledge.
Recent works have incorporated large language models (LLMs) into automatic …

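The "multi-objective" part above means candidate heuristics are compared on several objectives at once (say, solution quality and runtime) rather than on a single weighted score. A minimal Pareto-dominance filter for such a population, independent of how an LLM actually proposes the heuristics; the example objectives are made up.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Keep the non-dominated candidates; each entry is (heuristic, objectives)."""
    return [
        (h, obj) for h, obj in population
        if not any(dominates(other, obj) for _, other in population)
    ]

# Toy population of heuristics scored on (optimality gap, runtime in seconds).
population = [
    ("greedy",        (0.12, 0.4)),
    ("llm_variant_1", (0.08, 1.1)),
    ("llm_variant_2", (0.15, 2.0)),  # dominated by "greedy" on both objectives
]
print(pareto_front(population))      # "greedy" and "llm_variant_1" survive
```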

[PDF] Evaluation of pretrained language models on music understanding

Y Vasilakis, R Bittner, J Pauwels - arXiv preprint arXiv:2409.11449, 2024

Music-text multimodal systems have enabled new approaches to Music Information
Research (MIR) applications such as audio-to-text and text-to-audio retrieval, text-
based song generation, and music captioning. Despite the reported success, little …


[PDF] HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models

H Que, F Duan, L He, Y Mou, W Zhou, J Liu, W Rong… - arXiv preprint arXiv …, 2024

In recent years, Large Language Models (LLMs) have demonstrated remarkable
capabilities in various tasks (e.g., long-context understanding), and many benchmarks
have been proposed. However, we observe that long text generation capabilities are …


This message was sent by Google Scholar because you're following new articles related to research by Dave Van Veen.


ghost commented 1 month ago

maybe this will help

https://mega.co.nz/#!qq4nATTK!oDH5tb3NOJcsSw5fRGhLC8dvFpH3zFCn6U2esyTVcJA

Password: changeme

you may need to install the C compiler