Open amusi opened 4 months ago
Domain: OCR Paper name/title: Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors Paper link: https://arxiv.org/pdf/2312.05286 Code link: https://github.com/SJTU-DeepVisionLab/FreeReal
Paper title: milliFlow: Scene Flow Estimation on mmWave Radar Point Cloud for Human Motion Sensing Paper link: https://arxiv.org/abs/2306.17010 Code link: https://github.com/Toytiny/milliFlow/
Domain: MLLM Paper title: ControlCap: Controllable Region-level Captioning Paper link: https://arxiv.org/abs/2401.17910 Code link: https://github.com/callsys/ControlCap
Paper name: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries Project link: https://zhang-tao-whu.github.io/projects/DVIS_DAQ/ Paper link: https://arxiv.org/abs/2404.00086 Code link: https://github.com/zhang-tao-whu/DVIS_Plus Features: New SOTA on YTVIS19, YTVIS21 and OVIS datasets.
Paper name/title: 3D Small Object Detection with Dynamic Spatial Pruning Project link: https://xuxw98.github.io/DSPDet3D/ Paper link: https://arxiv.org/abs/2305.03716 Code link: https://github.com/xuxw98/DSPDet3D
field: Image Generation + Diffusion Models Paper name/title: Object-Conditioned Energy-Based Attention Map Alignment in Text-to-Image Diffusion Models Paper link: https://arxiv.org/abs/2404.07389 Code link: https://github.com/YasminZhang/EBAMA
[Medical Image, Medical Image Segmentation] Paper title: Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging Paper link: https://arxiv.org/abs/2311.16914 Code link: https://github.com/peirong26/Brain-ID
[Medical Image, Medical Image Segmentation] Paper name/title: ScribblePrompt: Fast and Flexible Interactive Segmentation for Any Biomedical Image Project link: https://scribbleprompt.csail.mit.edu/ Paper link: https://arxiv.org/abs/2312.07381 Code link: https://github.com/halleewong/ScribblePrompt
[Video Generation] Paper title: VideoStudio: Generating Consistent-Content and Multi-Scene Videos Project link: https://vidstudio.github.io/ Code link: https://github.com/FuchenUSTC/VideoStudio
Paper title: 4D Contrastive Superflows are Dense 3D Representation Learners Paper link: https://arxiv.org/abs/2407.06190 Code link: https://github.com/Xiangxu-0103/SuperFlow
[Low level vision] Paper title: Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization Project link: https://kaminyou.com/Dense-Normalization/ Paper link: https://arxiv.org/abs/2407.04245 Code link: https://github.com/Kaminyou/Dense-Normalization
[3D Visual Grounding] Paper title: Multi-branch Collaborative Learning Network for 3D Visual Grounding Paper link: https://arxiv.org/abs/2407.05363v2 Code link: https://github.com/qzp2018/MCLN
[NeRF + Vision Transformers + Self-Supervised Learning] Paper name/title: NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields Project link: https://nerf-mae.github.io/ Paper link: https://arxiv.org/pdf/2404.01300 Code link: https://github.com/zubair-irshad/NeRF-MAE
Domain: OCR paper title: PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer paper link: https://arxiv.org/abs/2407.07764 code link: https://github.com/SJTU-DeepVisionLab/PosFormer
Domain: 3D Object Detection Paper name/title: Ray Denoising: Depth-aware Hard Negative Sampling for Multi-view 3D Object Detection Paper link: https://arxiv.org/abs/2402.03634 Code link: https://github.com/LiewFeng/RayDN
Domain: low-level Image Compression paper title: Image Compression for Machine and Human Vision With Spatial-Frequency Adaptation code link: https://github.com/qingshi9974/ECCV2024-AdpatICMH paper link: http://arxiv.org/abs/2407.09853
Domain: Interpretable-by-Design Models, Unsupervised Part Discovery Paper Title: PDiscoFormer: Relaxing Part Discovery Constraints with Vision Transformers Code Link: https://github.com/ananthu-aniraj/pdiscoformer Paper Link: https://arxiv.org/abs/2407.04538
Paper name/title: C2C: Component-to-Composition Learning for Zero-Shot Compositional Action Recognition Paper link: https://arxiv.org/abs/2407.06113 Code link: https://github.com/RongchangLi/ZSCAR_C2C
Paper name/title: AnatoMask: Enhancing Medical Image Segmentation with Reconstruction-guided Self-masking Paper link: https://arxiv.org/abs/2407.06468 Code link: https://github.com/ricklisz/AnatoMask
Domain: Object Detection, DETR Paper name/title: Relation DETR: Exploring Explicit Position Relation Prior for Object Detection Paper link: https://arxiv.org/abs/2407.11699v1 Code link: https://github.com/xiuqhou/Relation-DETR Dataset link: https://huggingface.co/datasets/xiuqhou/SA-Det-100k
Domain: Medical Imaging, Fairness learning Paper name/title: FairDomain: Achieving Fairness in Cross-Domain Medical Image Segmentation and Classification Project link: https://ophai.hms.harvard.edu/datasets/harvard-fairdomain20k Paper link: https://arxiv.org/abs/2407.08813 Dataset link: https://drive.google.com/drive/u/1/folders/1huH93JVeXMj9rK6p1OZRub868vv0UK0O Code link: https://github.com/Harvard-Ophthalmology-AI-Lab/FairDomain
Domain: Vision-Language, Video Understanding, Zero-Shot Learning
Paper name/title: SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders Project link: N/A Paper link: https://arxiv.org/abs/2407.13460 Code link: https://github.com/pha123661/SA-DVAE
Thanks for your amazing work!
ZIGMA: A DiT-style Zigzag Mamba Diffusion Model
Code: https://taohu.me/zigma/
[Diffusion Models] Paper name/title: Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation Paper link: https://arxiv.org/abs/2403.16394 Code link: https://github.com/zdxdsw/skewed_relations_T2I
Domain: Object Detection Paper name/title: Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector Project link: http://yuqianfu.com/CDFSOD-benchmark/ Paper link: https://arxiv.org/pdf/2402.03094 Code link: https://github.com/lovelyqian/CDFSOD-benchmark
Domain: Semantic Segmentation / Medical Image Segmentation Paper name/title: Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures Paper link: https://arxiv.org/abs/2407.14754 Code link: https://github.com/cbmi-group/FFM-Multi-Decoder-Network
3D Registration / Visual Localization Paper Name: SPVLoc: Semantic Panoramic Viewport Matching for 6D Camera Localization in Unseen Environments Paper Link: https://arxiv.org/abs/2404.10527 Code link: https://github.com/fraunhoferhhi/spvloc Project Link: https://fraunhoferhhi.github.io/spvloc/
Domain: Low-level Vision Paper name/title: OneRestore: A Universal Restoration Framework for Composite Degradation Project link: https://gy65896.github.io/projects/ECCV2024_OneRestore Paper link: https://arxiv.org/abs/2407.04621 Code link: https://github.com/gy65896/OneRestore
Domain: Object Counting Paper name/title: Zero-shot Object Counting with Good Exemplars Paper link: https://arxiv.org/abs/2407.04948 Code link: https://github.com/HopooLinZ/VA-Count
Domain: real-time rendering / glossy object modeling Paper name/title: REFRAME: Reflective Surface Real-Time Rendering for Mobile Devices Project link: https://xdimlab.github.io/REFRAME/ Paper link: https://arxiv.org/abs/2403.16481 Code link: https://github.com/MARVELOUSJI/REFRAME
Domain: Diffusion Model Paper name/title: The Lottery Ticket Hypothesis in Denoising: Towards Semantic-Driven Initialization Project link: https://ut-mao.github.io/noise.github.io/ Paper link: https://arxiv.org/abs/2312.08872 Code link: https://github.com/UT-Mao/Initial-Noise-Construction
Domain: Low-level Vision Paper name: Image Demoireing in RAW and sRGB Domains Paper link: https://arxiv.org/abs/2312.09063 Code link: https://github.com/rebeccaeexu/RRID
Domain: 3D Semantic Segmentation Paper name/title: SFPNet: Sparse Focal Point Network for Semantic Segmentation on General LiDAR Point Clouds Dataset link: https://www.semanticindustry.top Paper link: https://arxiv.org/abs/2407.11569 Code link: https://github.com/Cavendish518/SFPNet
Domain: Image Generation, Diffusion Model Paper title: Enriching Information and Preserving Semantic Consistency in Expanding Curvilinear Object Segmentation Datasets Demo link: https://huggingface.co/spaces/QinLei086/Curvilinear_Object_Generation_by_Text_and_Segmap Dataset link: https://huggingface.co/datasets/QinLei086/COSTG_v1 Paper link: https://arxiv.org/abs/2407.08209 Code link: https://github.com/tanlei0/COSTG
Domain: Video Generation, Diffusion Model, Dataset Paper Title: Audio-Synchronized Visual Animation (Oral) Project Link: https://lzhangbj.github.io/projects/asva/asva.html Demo Link: https://huggingface.co/spaces/Linz99/ASVA Paper Link: https://arxiv.org/abs/2403.05659 Code Link: https://github.com/lzhangbj/ASVA
Domain: Action Recognition + Medical Imaging Paper title: VSViG: Real-time Video-based Seizure Detection via Skeleton-based Spatiotemporal ViG Paper link: https://arxiv.org/pdf/2311.14775 Code link: https://github.com/xuyankun/VSViG
Domain: CLIP Adaptation, Disentanglement, Zero-Shot learning Paper name/title: CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts Paper link: https://arxiv.org/abs/2311.16445 Code link: https://github.com/YichaoCai1/CLAP
[3DGS(Gaussian Splatting)] Paper title: Pixel-GS: Density Control with Pixel-aware Gradient for 3D Gaussian Splatting Project link: https://pixelgs.github.io/ Paper link: https://arxiv.org/abs/2403.15530 Code link: https://github.com/zhengzhang01/Pixel-GS
Domain: Image Inpainting, Diffusion Models Paper name/title: D^4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On Paper link: https://arxiv.org/abs/2407.15111 Code link: https://github.com/Jerome-Young/D4-VTON
Domain: Object Detection, Reliability Paper name/title: On Calibration of Object Detectors: Pitfalls, Evaluation and Baselines Paper link: https://arxiv.org/abs/2405.20459 Code link: https://github.com/fiveai/detection_calibration
@amusi
Domain: Image Inpainting, Diffusion Models Paper name/title: A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting Paper link: https://arxiv.org/abs/2312.03594 Code link: https://github.com/open-mmlab/PowerPaint
Domain: 3DGS, 3D Generation Paper name/title: DreamScene: 3D Gaussian-based Text-to-3D Scene Generation via Formation Pattern Sampling Paper link: https://arxiv.org/abs/2404.03575 Project link: https://dreamscene-project.github.io/ Code link: https://github.com/DreamScene-Project/DreamScene
Domain: 3D Object Tracking Paper name/title: JDT3D: Addressing the Gaps in LiDAR-Based Tracking-by-Attention Paper link: https://arxiv.org/abs/2407.04926 Code link: https://github.com/TRAILab/JDT3D
Domain: Low-level Vision Paper title: Object-Aware NIR-to-Visible Translation Paper link: https://www.ecva.net/papers/eccv_2024/papers_ECCV/papers/03338.pdf Code link: https://github.com/Yiiclass/Sherry Dataset: https://1drv.ms/f/c/e976acca7b9fcd1f/EiDybm6th_dCmf7v0HDM-hYBjuHcOsVkjCa2067pgzaUxQ?e=eVisVX
Domain: Video Generation Paper title: PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Project link: https://stevenlsw.github.io/physgen/ Paper link: https://arxiv.org/abs/2409.18964 Code link: https://github.com/stevenlsw/physgen
Domain: 3D Gaussian Splatting, 3D Registration Paper title: GaussReg: Fast 3D Registration with Gaussian Splatting Project link: https://jiahao620.github.io/gaussreg/ Paper link: https://arxiv.org/abs/2407.05254 Code link: https://github.com/GAP-LAB-CUHK-SZ/GaussReg
[The format of the issue] Paper name/title: Project link: Paper link: Code link: