amusi / CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects
Welcome to share CVPR 2024 papers and code #210

Open amusi opened 9 months ago

amusi commented 9 months ago

[The format of the issue] Paper name/title: Paper link: Code link:

iamhankai commented 9 months ago

Paper name/title: ParameterNet: Parameters Are All You Need for Large-scale Visual Pretraining of Mobile Networks Paper link: https://arxiv.org/abs/2306.14525 Code link: https://parameternet.github.io/

iamhankai commented 9 months ago

Paper name/title: An Empirical Study of Scaling Law for OCR Paper link: https://arxiv.org/abs/2401.00028 Code link: https://github.com/large-ocr-model/large-ocr-model.github.io

KuanchihHuang commented 9 months ago

Paper name/title: PTT: Point-Trajectory Transformer for Efficient Temporal 3D Object Detection Paper link: https://arxiv.org/abs/2312.08371 Code link: https://github.com/kuanchihhuang/PTT

ShunyuanZheng commented 9 months ago

Paper name/title: GPS-Gaussian: Generalizable Pixel-wise 3D Gaussian Splatting for Real-time Human Novel View Synthesis Paper link: https://arxiv.org/abs/2312.02155 Code link: https://github.com/ShunyuanZheng/GPS-Gaussian Project link: https://shunyuanzheng.github.io/GPS-Gaussian

huliangxiao commented 9 months ago

Paper name/title: GaussianAvatar: Towards Realistic Human Avatar Modeling from a Single Video via Animatable 3D Gaussians Paper link: https://arxiv.org/abs/2312.02134 Code link: https://github.com/huliangxiao/GaussianAvatar

TIANLE233 commented 9 months ago

Paper name/title: Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation Paper link: https://arxiv.org/abs/2312.04265 Code link: https://github.com/w1oves/Rein

zhuangshaobin commented 9 months ago

Paper name/title: Vlogger: Make Your Dream A Vlog Paper link: https://arxiv.org/abs/2401.09414 Code link: https://github.com/Vchitect/Vlogger

BarqueroGerman commented 9 months ago

Paper name/title: Seamless Human Motion Composition with Blended Positional Encodings Paper link: https://arxiv.org/abs/2402.15509 Code link: https://github.com/BarqueroGerman/FlowMDM

buaacyw commented 9 months ago

Paper name/title: GaussianEditor: Swift and Controllable 3D Editing with Gaussian Splatting Paper link: https://arxiv.org/abs/2311.14521 Code link: https://github.com/buaacyw/GaussianEditor

Hansxsourse commented 9 months ago

Paper name/title: UniGS: Unified Representation for Image Generation and Segmentation Paper link: https://arxiv.org/abs/2312.01985

Classification could be: Diffusion / Image Generation / Segmentation

ch3cook-fdu commented 9 months ago

Paper name/title: LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning Paper link: https://arxiv.org/abs/2311.18651 Code link: https://github.com/Open3DA/LL3DA Project link: https://ll3da.github.io/

geometry-adaptation commented 9 months ago

Paper name/title: CLOVA: A Closed-LOop Visual Assistant with Tool Usage and Update Paper link: https://arxiv.org/pdf/2312.10908.pdf Project link: https://clova-tool.github.io/

thaoshibe commented 9 months ago

Paper name/title: Edit One for All: Interactive Batch Image Editing Paper link: https://arxiv.org/abs/2401.10219 Code link: https://github.com/thaoshibe/edit-one-for-all Project page: https://thaoshibe.github.io/edit-one-for-all

Nightmare-n commented 9 months ago

Paper name/title: UniPAD: A Universal Pre-training Paradigm for Autonomous Driving Paper link: https://arxiv.org/abs/2310.08370 Code link: https://github.com/Nightmare-n/UniPAD

DearCaat commented 9 months ago

Paper name/title: Feature Re-Embedding: Towards Foundation Model-Level Performance in Computational Pathology Paper link: https://arxiv.org/abs/2402.17228 Code link: https://github.com/DearCaat/RRT-MIL

Luffy03 commented 9 months ago

Paper name/title: VoCo: A Simple-yet-Effective Volume Contrastive Learning Framework for 3D Medical Image Analysis Paper link: https://arxiv.org/abs/2402.17300 Code link: https://github.com/Luffy03/VoCo

xb534 commented 9 months ago

Paper name/title: SED: A Simple Encoder-Decoder for Open-Vocabulary Semantic Segmentation Paper link: https://arxiv.org/abs/2311.15537 Code link: https://github.com/xb534/SED

WeichenFan commented 9 months ago

Paper name/title: Link-Context Learning for Multimodal LLMs Paper link: https://arxiv.org/pdf/2308.07891.pdf Code link: https://github.com/isekai-portal/Link-Context-Learning/tree/main

Murrol commented 9 months ago

Paper name/title: MoMask: Generative Masked Modeling of 3D Human Motions Paper link: https://arxiv.org/abs/2312.00063 Code link: https://github.com/EricGuo5513/momask-codes

Andy1621 commented 9 months ago

Paper name/title: MVBench: A Comprehensive Multi-modal Video Understanding Benchmark Paper link: https://arxiv.org/abs/2311.17005 Code link: https://github.com/OpenGVLab/Ask-Anything/tree/main/video_chat2

ethancohen123 commented 9 months ago

Paper name/title: ChAda-ViT: Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images Paper link: https://arxiv.org/abs/2311.15264 Code link: https://github.com/nicoboou/chada_vit

ingra14m commented 9 months ago

Paper name/title: Deformable 3D Gaussians for High-Fidelity Monocular Dynamic Scene Reconstruction Paper link: https://arxiv.org/abs/2309.13101 Code link: https://github.com/ingra14m/Deformable-3D-Gaussians Project page: https://ingra14m.github.io/Deformable-Gaussians/

ingra14m commented 9 months ago

Paper name/title: SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes Paper link: https://arxiv.org/abs/2312.14937 Code link: https://github.com/yihua7/SC-GS Project page: https://yihua7.github.io/SC-GS-web/

yyvhang commented 9 months ago

Paper name/title: LEMON: Learning 3D Human-Object Interaction Relation from 2D Images (Embodied AI) Paper link: https://arxiv.org/abs/2312.08963 Code link: https://github.com/yyvhang/lemon_3d

horseee commented 9 months ago

Paper name/title: DeepCache: Accelerating Diffusion Models for Free Paper link: https://arxiv.org/abs/2312.00858 Code link: https://github.com/horseee/DeepCache

SunzeY commented 9 months ago

Paper name/title: Alpha-CLIP: A CLIP Model Focusing on Wherever You Want Paper link: https://arxiv.org/abs/2312.03818 Code link: https://github.com/SunzeY/AlphaCLIP

yinanhe commented 9 months ago

Paper name/title: VBench: Comprehensive Benchmark Suite for Video Generative Models Paper link: https://arxiv.org/abs/2311.17982 Code link: https://github.com/Vchitect/VBench Project Page: https://vchitect.github.io/VBench-project/

shikiw commented 9 months ago

Paper name/title: OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allocation Paper link: https://arxiv.org/abs/2311.17911 Code link: https://github.com/shikiw/OPERA

jameslahm commented 9 months ago

Paper name/title: RepViT: Revisiting Mobile CNN From ViT Perspective Paper link: https://arxiv.org/abs/2307.09283 Code link: https://github.com/THU-MIG/RepViT

lixinustc commented 9 months ago

Paper name/title: SeD: Semantic-Aware Discriminator for Image Super-Resolution Paper link: https://arxiv.org/abs/2402.19387 Code link: https://github.com/lbc12345/SeD

vimar-gu commented 9 months ago

Paper name/title: Efficient Dataset Distillation via Minimax Diffusion Paper link: https://arxiv.org/abs/2311.15529 Code link: https://github.com/vimar-gu/MinimaxDiffusion

Catherine-R-He commented 8 months ago

Paper name/title: Improved Visual Grounding through Self-Consistent Explanations Paper link: https://arxiv.org/abs/2312.04554 Code link: https://github.com/uvavision/SelfEQ

lizhan17 commented 8 months ago

Paper name/title: Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis Paper link: https://arxiv.org/abs/2312.16812 Code link: https://github.com/oppo-us-research/SpacetimeGaussians Project Page: https://oppo-us-research.github.io/SpacetimeGaussians-website/

HowieMa commented 8 months ago

Paper name/title: MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers Paper link: https://arxiv.org/abs/2312.12468 Project Page: https://maskint.github.io

Gary-code commented 8 months ago

Paper name/title: Making Large Multimodal Models Understand Arbitrary Visual Prompts Paper link: https://arxiv.org/abs/2312.00784 Project Page: https://vip-llava.github.io/

lixinustc commented 8 months ago

Paper name/title: KVQ: Kaleidoscope Video Quality Assessment for Short-form Videos Paper link: https://arxiv.org/abs/2402.07220 Code link: https://github.com/lixinustc/KVQ-Challenge-CVPR-NTIRE2024 Project Page: https://lixinustc.github.io/projects/KVQ/

ChenD-VL commented 8 months ago

Paper name/title: ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting Paper link: https://arxiv.org/abs/2403.00303 Code link: https://github.com/PriNing/ODM

SY-Xuan commented 8 months ago

Paper name/title: Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs Paper link: https://arxiv.org/abs/2310.00582 Code link: https://github.com/SY-Xuan/Pink

limuloo commented 8 months ago

Paper name/title: MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis Paper link: https://arxiv.org/pdf/2402.05408.pdf Code link: https://github.com/limuloo/migc

xuxw98 commented 8 months ago

Paper name/title: Memory-based Adapters for Online 3D Scene Perception Paper link: https://arxiv.org/abs/2403.06974 Code link: https://github.com/xuxw98/Online3D

jpthu17 commented 8 months ago

Paper name/title: Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding Paper link: https://arxiv.org/abs/2311.08046 Code link: https://github.com/PKU-YuanGroup/Chat-UniVi

hzxie commented 8 months ago

Paper name/title: CityDreamer: Compositional Generative Model of Unbounded 3D Cities Paper link: https://arxiv.org/abs/2309.00610 Code link: https://github.com/hzxie/city-dreamer Project Page: https://haozhexie.com/project/city-dreamer/

csuhan commented 8 months ago

Paper name/title: OneLLM: One Framework to Align All Modalities with Language Paper link: https://arxiv.org/abs/2312.03700 Code link: https://github.com/csuhan/OneLLM

Tianhao-Qi commented 8 months ago

Paper name/title: DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations Paper link: https://arxiv.org/abs/2403.06951 Code link: https://github.com/Tianhao-Qi/DEADiff_code Project page: https://tianhao-qi.github.io/DEADiff/

ximinng commented 8 months ago

Paper name/title: SVGDreamer: Text Guided SVG Generation with Diffusion Model Paper link: https://arxiv.org/abs/2312.16476 Code link: https://ximinng.github.io/SVGDreamer-project/

zhengli97 commented 8 months ago

Paper name/title: PromptKD: Unsupervised Prompt Distillation for Vision-Language Models Paper link: https://arxiv.org/abs/2403.02781 Code link: https://github.com/zhengli97/PromptKD

FYTalon commented 8 months ago

Paper name/title: PIE-NeRF🍕: Physics-based Interactive Elastodynamics with NeRF Paper link: https://arxiv.org/abs/2311.13099 Code link: https://github.com/FYTalon/pienerf/

jiuntian commented 8 months ago

Paper name/title: InteractDiffusion: Interaction-Control for Text-to-Image Diffusion Model Paper link: https://arxiv.org/abs/2312.05849 Code link: https://github.com/jiuntian/interactdiffusion

924973292 commented 8 months ago

Paper name/title: Magic Tokens: Select Diverse Tokens for Multi-modal Object Re-Identification Paper link: https://arxiv.org/abs/2403.10254 Code link: https://github.com/924973292/EDITOR

YixunLiang commented 8 months ago

Paper name/title: LucidDreamer: Towards High-Fidelity Text-to-3D Generation via Interval Score Matching Paper link: https://arxiv.org/abs/2311.11284 Code link: https://github.com/EnVision-Research/LucidDreamer