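:star_and_crescent: Add WeChat: nvshenj125 and note your research area to join the discussion and study group.
Follow our WeChat official account: AI算法与图像处理
:star2: CVPR 2024: continuously updated with the latest papers and their open-source code!
Bilibili demos: https://space.bilibili.com/288489574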
:hand: Note: issues are welcome! Share CVPR 2024 papers and open-source projects and help us improve this project.
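Paper collections from previous top conferences:
The CVPR 2024 paper discussion group is now open! If your paper has been accepted, add WeChat: nvshenj125 with the note: CVPR + name + school/company. Requests must follow this format to be added to the group.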
HoloVIC: Large-scale Dataset and Benchmark for Multi-Sensor Holographic Intersection and Vehicle-Infrastructure Cooperative
Traffic Scene Parsing through the TSP6K Dataset
Balancing Act: Distribution-Guided Debiasing in Diffusion Models
DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
DiffAssemble: A Unified Graph-Diffusion Model for 2D and 3D Reassembly
Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks
Few-shot Learner Parameterization by Diffusion Time-steps
MedM2G: Unifying Medical Multi-Modal Generation via Cross-Guided Diffusion with Visual Invariant
DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations
Face2Diffusion for Fast and Editable Face Personalization
MACE: Mass Concept Erasure in Diffusion Models
It's All About Your Sketch: Democratising Sketch Control in Diffusion Models
SemCity: Semantic Scene Generation with Triplane Diffusion
RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization
NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging
Discriminative Probing and Tuning for Text-to-Image Generation
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Text2QR: Harmonizing Aesthetic Customization and Scanning Robustness for Text-Guided QR Code Generation
Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers
GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding
DNGaussian: Optimizing Sparse-View 3D Gaussian Radiance Fields with Global-Local Depth Normalization
S-DyRF: Reference-Based Stylized Radiance Fields for Dynamic Scenes
PromptKD: Unsupervised Prompt Distillation for Vision-Language Models
Logit Standardization in Knowledge Distillation
RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features
$V_kD$: Improving Knowledge Distillation using Orthogonal Projections
MP5: A Multi-modal Open-ended Embodied System in Minecraft via Active Perception
Polos: Multimodal Metric Learning from Human Feedback for Image Captioning
MADTP: Multimodal Alignment-Guided Dynamic Token Pruning for Accelerating Vision-Language Transformer
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Matching Framework
Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations
Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning
UniMODE: Unified Monocular 3D Object Detection
CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images
Memory-based Adapters for Online 3D Scene Perception
Paper: https://arxiv.org/abs/2403.06974
Code: https://github.com/xuxw98/Online3D
Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement
Paper: https://arxiv.org/abs/2403.16131
Enhancing 3D Object Detection with 2D Detection-Guided Query Anchors
SAFDNet: A Simple and Effective Network for Fully Sparse 3D Object Detection
DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking
Delving into the Trajectory Long-tail Distribution for Muti-object Tracking
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation
Text-Guided Variational Image Generation for Industrial Anomaly Detection and Segmentation
Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration
Depth-aware Test-Time Training for Zero-shot Video Object Segmentation
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Dual Pose-invariant Embeddings: Learning Category and Object-specific Discriminative Representations for Recognition and Retrieval
Learning to Rematch Mismatched Pairs for Robust Cross-Modal Retrieval
How to Handle Sketch-Abstraction in Sketch-Based Image Retrieval?
SeD: Semantic-Aware Discriminator for Image Super-Resolution
Training Generative Image Super-Resolution Models by Wavelet-Domain Losses Enables Better Control of Artifacts
CAMixerSR: Only Details Need More "Attention"
Low-Res Leads the Way: Improving Generalization for Super-Resolution by Self-Supervised Learning
Boosting Image Restoration via Priors from Pre-trained Models
Doubly Abductive Counterfactual Inference for Text-based Image Editing
A Unified Framework for Microscopy Defocus Deblur with Multi-Pyramid Transformer and Contrastive Learning
Abductive Ego-View Accident Video Understanding for Safe Driving Perception
Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving
Suppress and Rebalance: Towards Generalized Multi-Modal Face Anti-Spoofing
FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation
Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation
Hourglass Tokenizer for Efficient Transformer-Based 3D Human Pose Estimation
UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable Data Sets
DITTO: Dual and Integrated Latent Topologies for Implicit 3D Reconstruction
Memory-based Adapters for Online 3D Scene Perception
Bayesian Diffusion Models for 3D Shape Reconstruction
Rethinking Few-shot 3D Point Cloud Semantic Segmentation
Extend Your Own Correspondences: Unsupervised Distant Point Cloud Registration by Progressive Distance Extension
Hide in Thicket: Generating Imperceptible and Rational Adversarial Perturbations on 3D Point Clouds
Toward Generalist Anomaly Detection via In-context Residual Learning with Few-shot Sample Prompts
RealNet: A Feature Selection Network with Realistic Synthetic Anomaly for Anomaly Detection
DisCo: Disentangled Control for Realistic Human Dance Generation
Gradient Reweighting: Towards Imbalanced Class-Incremental Learning
TAMM: TriAdapter Multi-Modal Learning for 3D Shape Understanding
Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting
Attentive Illumination Decomposition Model for Multi-Illuminant White Balancing
Misalignment-Robust Frequency Distribution Loss for Image Transformation
3DSFLabelling: Boosting 3D Scene Flow Estimation by Pseudo Auto-labelling
OccTransformer: Improving BEVFormer for 3D camera-only occupancy prediction
UniVS: Unified and Universal Video Segmentation with Prompts as Queries
Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis
Boosting Neural Representations for Videos with a Conditional Decoder
Classes Are Not Equal: An Empirical Study on Image Recognition Fairness
QN-Mixer: A Quasi-Newton MLP-Mixer Model for Sparse-View CT Reconstruction
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
SeMoLi: What Moves Together Belongs Together
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
MemoNav: Working Memory Model for Visual Navigation
VideoMAC: Video Masked Autoencoders Meet ConvNets
Theoretically Achieving Continuous Representation of Oriented Bounding Boxes
OHTA: One-shot Hand Avatar via Data-driven Implicit Priors
WWW: A Unified Framework for Explaining What, Where and Why of Neural Networks by Interpretation of Neuron Concepts
Spectral Meets Spatial: Harmonising 3D Shape Matching and Interpolation
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
ViewFusion: Towards Multi-View Consistency via Interpolated Denoising
OpticalDR: A Deep Optical Imaging Model for Privacy-Protective Depression Recognition
NARUTO: Neural Active Reconstruction from Uncertain Target Observations
Towards Generalizable Tumor Synthesis
Rethinking Multi-domain Generalization with A General Learning Objective
Rethinking Inductive Biases for Surface Normal Estimation
SURE: SUrvey REcipes for building reliable and robust deep networks
Selective-Stereo: Adaptive Frequency Information Selection for Stereo Matching
Deformable One-shot Face Stylization via DINO Semantic Guidance
CustomListener: Text-guided Responsive Interaction for User-friendly Listening Head Generation
NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
HUNTER: Unsupervised Human-centric 3D Detection via Transferring Knowledge from Synthetic Instances to Real Scenes
Learning Group Activity Features Through Person Attribute Prediction
Interactive Continual Learning: Fast and Slow Thinking
Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
DART: Implicit Doppler Tomography for Radar Novel View Synthesis
MeaCap: Memory-Augmented Zero-shot Image Captioning
HMD-Poser: On-Device Real-time Human Motion Tracking from Scalable Sparse Observations
Continual Segmentation with Disentangled Objectness Learning and Class Recognition
HDRFlow: Real-Time HDR Video Reconstruction with Large Motions
LEAD: Learning Decomposition for Source-free Universal Domain Adaptation
F$^3$Loc: Fusion and Filtering for Floorplan Localization
Enhancing Vision-Language Pre-training with Rich Supervisions
Efficient LoFTR: Semi-Dense Local Feature Matching with Sparse-Like Speed
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Learning to Remove Wrinkled Transparent Film with Polarized Prior
LORS: Low-rank Residual Structure for Parameter-Efficient Network Stacking
Active Generalized Category Discovery
MAP: MAsk-Pruning for Source-Free Model Intellectual Property Protection
A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition
Seamless Human Motion Composition with Blended Positional Encodings
DiffusionLight: Light Probes for Free by Painting a Chrome Ball
SplattingAvatar: Realistic Real-Time Human Avatars with Mesh-Embedded Gaussian Splatting
Split to Merge: Unifying Separated Modalities for Unsupervised Domain Adaptation
Real-Time Simulated Avatar from Head-Mounted Sensors
DiaLoc: An Iterative Approach to Embodied Dialog Localization
FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation
EarthLoc: Astronaut Photography Localization by Indexing Earth from Space
CAM Back Again: Large Kernel CNNs from a Weakly Supervised Object Localization Perspective
Distributionally Generative Augmentation for Fair Facial Attribute Classification
Exploiting Style Latent Flows for Generalizing Deepfake Video Detection
MoST: Motion Style Transformer between Diverse Action Contents
Coherent Temporal Synthesis for Incremental Action Segmentation
Is Vanilla MLP in Neural Radiance Field Enough for Few-shot View Synthesis?
LTGC: Long-tail Recognition via Leveraging LLMs-driven Generated Content
PeerAiD: Improving Adversarial Distillation from a Specialized Peer Tutor
SNIFFER: Multimodal Large Language Model for Explainable Out-of-Context Misinformation Detection
Multi-Task Dense Prediction via Mixture of Low-Rank Experts
Beyond Text: Frozen Large Language Models in Visual Signal Comprehension
Dynamic Graph Representation with Knowledge-aware Attention for Histopathology Whole Slide Image Analysis
Robust Synthetic-to-Real Transfer for Stereo Matching
CuVLER: Enhanced Unsupervised Object Discoveries through Exhaustive Self-Supervised Transformers
Masked AutoDecoder is Effective Multi-Task Vision Generalist
PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution
Unleashing Network Potentials for Semantic Scene Completion
Open-World Semantic Segmentation Including Class Similarity
ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions
FSC: Few-point Shape Completion
Frequency Decoupling for Motion Magnification via Multi-Level Isomorphic Architecture
A Bayesian Approach to OOD Robustness in Image Classification