New Study : https://www.notion.so/Reading-Papers-Deep-Learning-504b50ddaed14360b34dfd6d49cb3455
Update 2024.01.09
Paper Review
- This is personal study; I am working hard on it, but these are not perfect reviews.
- Even after a review is done, it keeps being updated whenever I have new questions, thoughts, corrections, or good resources.
- link_review entries link to good reviews written by others.
- light_link means I only skimmed the paper at the concept (abstract) level.
- Since I am currently unable to publish the reviews themselves, I am organizing this list with paper links only.
Virtual Try On [Link]
Asymmetric Image Retrieval [Link]
Deep Learning
Multi-Label Image Recognition
- Learning Discriminative Representations for Multi-Label Image Recognition : [paper]
Knowledge distillation
- Knowledge distillation: A good teacher is patient and consistent : [paper]
- Hierarchical Self-supervised Augmented Knowledge Distillation : [paper]
- Text is Text, No Matter What: Unifying Text Recognition using Knowledge Distillation : [paper]
Vision and Language Pre-training [Link]
CLIP & joint multi-modal
Efficient training & tricks
Imbalanced Datasets
Self Supervised Learning & unsupervised learning & semi/weakly supervised learning
- Unsupervised Representation Learning by Predicting Image Rotations : [paper]
- Unsupervised Visual Representation Learning by Context Prediction : [paper]
- Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles : [paper]
- Discriminative Unsupervised Feature Learning with Exemplar Convolutional Neural Networks : [paper]
- Rethinking Pre-training and Self-training : [paper]
- Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review]
- Self-training with Noisy Student improves ImageNet classification : [paper] [review]
- SimCLR : A Simple Framework for Contrastive Learning of Visual Representations : [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- SimCLR V2: Big Self-Supervised Models are Strong Semi-Supervised Learners : [paper]
- MoCo : Momentum Contrast for Unsupervised Visual Representation Learning : [paper]
- MoCo V2 : Improved Baselines with Momentum Contrastive Learning : [paper] [link_review] [link_review]
- MoCo V3 : An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
- BYOL : Bootstrap your own latent: A new approach to self-supervised Learning: [paper]
- Exploring the limits of weakly supervised pretraining : [paper]
- Triplet is All You Need with Random Mappings for Unsupervised Visual Representation Learning : [paper]
- ScatSimCLR: self-supervised contrastive learning with pretext task regularization for small-scale datasets : [paper]
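Most of the contrastive papers above (the SimCLR/MoCo family) build on the same normalized temperature-scaled cross-entropy (NT-Xent) objective. A minimal numpy sketch of that loss, assuming two already-computed embedding views; names, shapes, and the temperature value are illustrative, not taken from any one paper's code:

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss.
    z1, z2: (N, D) embeddings of two augmented views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # the positive for index i is its other view: i+n (or i-n in the 2nd half)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()
```

Identical views give a small loss; mismatched pairings give a larger one, which is the behavior the contrastive objective rewards.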
Self Supervised Training + Mask based Token + Transformer
- MST: Masked Self-Supervised Transformer for Visual Representation : [paper]
- Masked Autoencoders Are Scalable Vision Learners : [paper]
- SimMIM: A Simple Framework for Masked Image Modeling : [paper]
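The masked-modeling papers above (MAE, SimMIM) start from the same preprocessing step: randomly mask most patch tokens and keep only the visible ones. A hedged numpy sketch of that masking step; the 75% ratio follows MAE's description, everything else is illustrative:

```python
import numpy as np

def random_masking(patches, mask_ratio=0.75, rng=None):
    """Keep a random subset of patch tokens; return kept tokens and a
    binary mask (1 = masked) over the original token order."""
    rng = np.random.default_rng(rng)
    n = patches.shape[0]
    n_keep = int(n * (1 - mask_ratio))
    order = rng.permutation(n)
    keep = np.sort(order[:n_keep])   # preserve original token order
    mask = np.ones(n, dtype=int)
    mask[keep] = 0
    return patches[keep], mask
```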
Self Supervised Training + Instance Image Retrieval
- InsCLR: Improving Instance Retrieval with Self-Supervision : [paper]
Vision Transformers classification
- Stand-Alone Self-Attention in Vision Models : [paper][review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- Selfie: Self-supervised Pretraining for Image Embedding : [paper] [light_review] [link_review] [link_review]
- ViT: An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale: [paper] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review] [link_review]
- DeiT:Training data-efficient image transformers & distillation through attention : [paper] [link_review] [link_review] [link_review] [link_review]
- Bottleneck Transformers for Visual Recognition: [paper] [link_review]
- Going deeper with Image Transformers: [paper]
- Rethinking Spatial Dimensions of Vision Transformers : [paper]
- On the Adversarial Robustness of Visual Transformers: [paper]
- TransFG: A Transformer Architecture for Fine-grained Recognition : [paper]
- Understanding Robustness of Transformers for Image Classification : [paper]
- DeepViT: Towards Deeper Vision Transformer : [paper]
- CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification : [paper]
- CvT: Introducing Convolutions to Vision Transformers: [paper] [link_review]
- Efficient Feature Transformations for Discriminative and Generative Continual Learning : [paper]
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows : [paper] [link_review] [link_review] [link_review]
- Can Vision Transformers Learn without Natural Images?: [paper]
- Scaling Local Self-Attention for Parameter Efficient Visual Backbones: [paper]
- Incorporating Convolution Designs into Visual Transformers : [paper]
- ConViT: Improving Vision Transformers with Soft Convolutional Inductive Biases : [paper]
- Explicitly Modeled Attention Maps for Image Classification : [paper]
- Conditional Positional Encodings for Vision Transformers : [paper]
- Transformer in Transformer: [paper] [link_review]
- A Survey on Visual Transformer: [paper]
- Co-Scale Conv-Attentional Image Transformers: [paper]
- Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity : [paper] [link_review]
- LocalViT: Bringing Locality to Vision Transformers : [paper]
- Visformer: The Vision-friendly Transformer : [paper]
- Multiscale Vision Transformers : [paper] [link_review] [link_review]
- So-ViT: Mind Visual Tokens for Vision Transformer: [paper]
- Token Labeling: Training a 85.4% Top-1 Accuracy Vision Transformer with 56M Parameters on ImageNet (later renamed "All Tokens Matter: Token Labeling for Training Better Vision Transformers"): [paper]
- Fourier Image Transformer: [paper]
- Emerging Properties in Self-Supervised Vision Transformers: [paper]
- ConTNet: Why not use convolution and transformer at the same time?: [paper]
- Twins: Revisiting Spatial Attention Design in Vision Transformers: [paper]
- MoCo V3 :An Empirical Study of Training Self-Supervised Vision Transformers: [paper] [link_review] [link_review]
- Conformer: Local Features Coupling Global Representations for Visual Recognition: [paper]
- Self-Supervised Learning with Swin Transformers: [paper]
- Are Pre-trained Convolutions Better than Pre-trained Transformers?: [paper]
- LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference: [paper]
- Are Convolutional Neural Networks or Transformers more like human vision?: [paper]
- Rethinking Skip Connection with Layer Normalization in Transformers and ResNets: [paper]
- Rethinking the Design Principles of Robust Vision Transformer (Towards Robust Vision Transformer): [paper]
- Longformer: The Long-Document Transformer : [paper] [link_review] [link_review] [link_review] [link_review]
- Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding: [paper]
- On the Robustness of Vision Transformers to Adversarial Examples: [paper]
- Refiner: Refining Self-attention for Vision Transformers: [paper]
- Patch Slimming for Efficient Vision Transformers: [paper]
- RegionViT: Regional-to-Local Attention for Vision Transformers: [paper]
- X-volution: On the unification of convolution and self-attention: [paper]
- The Image Local Autoregressive Transformer: [paper]
- Glance-and-Gaze Vision Transformer: [paper]
- Semantic Correspondence with Transformers: [paper]
- DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification: [paper]
- When Vision Transformers Outperform ResNets without Pretraining or Strong Data Augmentations: [paper] [link_review]
- KVT: k-NN Attention for Boosting Vision Transformers: [paper]
- Less is More: Pay Less Attention in Vision Transformers: [paper]
- FoveaTer: Foveated Transformer for Image Classification: [paper]
- An Attention Free Transformer: [paper]
- Not All Images are Worth 16x16 Words: Dynamic Vision Transformers with Adaptive Sequence Length: [paper]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper] [link_review]
- Pre-Trained Image Processing Transformer: [paper] [link_review]
- ResT: An Efficient Transformer for Visual Recognition: [paper]
- Towards Robust Vision Transformer: [paper]
- Aggregating Nested Transformers: [paper]
- GasHis-Transformer: A Multi-scale Visual Transformer Approach for Gastric Histopathology Image Classification: [paper]
- Intriguing Properties of Vision Transformers: [paper] [link_review] [link_review] [link_review]
- Vision Transformers are Robust Learners: [paper]
- Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer: [paper]
- A Survey of Transformers: [paper]
- Armour: Generalizable Compact Self-Attention for Vision Transformers : [paper]
- Evo-ViT: Slow-Fast Token Evolution for Dynamic Vision Transformer : [paper]
- Dual-stream Network for Visual Recognition : [paper]
- BEiT: BERT Pre-Training of Image Transformers : [paper]
- Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions : [paper]
- PVTv2: Improved Baselines with Pyramid Vision Transformer : [paper]
- Thinking Like Transformers : [paper]
- CMT: Convolutional Neural Networks Meet Vision Transformers : [paper] [link_review] [link_review]
- Transformer with Peak Suppression and Knowledge Guidance for Fine-grained Image Recognition : [paper]
- ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias : [paper]
- Visual Transformer Pruning : [paper]
- Local-to-Global Self-Attention in Vision Transformers : [paper]
- Feature Fusion Vision Transformer for Fine-Grained Visual Categorization : [paper]
- Vision Xformers: Efficient Attention for Image Classification : [paper]
- EsViT : Efficient Self-supervised Vision Transformers for Representation Learning : [paper]
- GLiT: Neural Architecture Search for Global and Local Image Transformer : [paper]
- Efficient Vision Transformers via Fine-Grained Manifold Distillation : [paper]
- What Makes for Hierarchical Vision Transformer? : [paper]
- AutoFormer: Searching Transformers for Visual Recognition : [paper]
- Focal Self-attention for Local-Global Interactions in Vision Transformers : [paper] [link_review]
- ConvNets vs. Transformers: Whose Visual Representations are More Transferable? : [paper]
- Demystifying Local Vision Transformer: Sparse Connectivity, Weight Sharing, and Dynamic Weight : [paper]
- Mobile-Former: Bridging MobileNet and Transformer : [paper]
- Image Fusion Transformer : [paper]
- PSViT: Better Vision Transformer via Token Pooling and Attention Sharing : [paper]
- Do Vision Transformers See Like Convolutional Neural Networks? : [paper]
- Linformer: Self-Attention with Linear Complexity : [paper] [link_review]
- CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows : [paper]
- How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers : [paper]
- Searching for Efficient Multi-Stage Vision Transformers : [paper]
- Exploring and Improving Mobile Level Vision Transformers : [paper]
- XCiT: Cross-Covariance Image Transformers : [paper]
- Scaled ReLU Matters for Training Vision Transformers : [paper]
- VOLO: Vision Outlooker for Visual Recognition : [paper]
- CoAtNet: Marrying Convolution and Attention for All Data Sizes : [paper] [link_review] [link_review]
- MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer : [paper]
- A free lunch from ViT: Adaptive Attention Multi-scale Fusion Transformer for Fine-grained Visual Recognition : [paper]
- Improved Multiscale Vision Transformers for Classification and Detection : [paper]
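Nearly every entry in this section tokenizes the image the way ViT does: cut it into non-overlapping 16x16 patches and flatten each one. A minimal numpy sketch of just that step (the learned linear projection and position embeddings are omitted):

```python
import numpy as np

def patchify(img, patch=16):
    """Split an (H, W, C) image into non-overlapping flattened patches:
    returns (num_patches, patch*patch*C)."""
    h, w, c = img.shape
    assert h % patch == 0 and w % patch == 0
    img = img.reshape(h // patch, patch, w // patch, patch, c)
    img = img.transpose(0, 2, 1, 3, 4)         # (H/p, W/p, p, p, C)
    return img.reshape(-1, patch * patch * c)
```

For a 224x224x3 input this yields the familiar 196 tokens of dimension 768.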
Vision Transformers positional embedding
- Self-Attention with Relative Position Representations : [paper] [link_review]
- Vision Transformer with Progressive Sampling : [paper]
- DPT: Deformable Patch-based Transformer for Visual Recognition : [paper]
- CAPE: Encoding Relative Positions with Continuous Augmented Positional Embeddings : [paper]
- Rethinking and Improving Relative Position Encoding for Vision Transformer : [paper]
- Rethinking Positional Encoding : [paper]
- Relative Positional Encoding for Transformers with Linear Complexity : [paper]
- Conditional Positional Encodings for Vision Transformers : [paper]
- Pyramid Adversarial Training Improves ViT Performance : [paper]
- Shunted Self-Attention via Multi-Scale Token Aggregation : [paper]
- AdaViT: Adaptive Vision Transformers for Efficient Image Recognition : [paper]
- ATS: Adaptive Token Sampling For Efficient Vision Transformers : [paper]
- Global Interaction Modelling in Vision Transformer via Super Tokens : [paper]
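The relative and conditional encodings studied above are usually compared against the fixed sinusoidal absolute encoding of the original Transformer. A minimal numpy sketch of that baseline:

```python
import numpy as np

def sinusoidal_positions(n_tokens, dim):
    """Fixed sinusoidal position encoding from "Attention Is All You Need":
    even channels get sin, odd channels get cos, at geometric frequencies."""
    pos = np.arange(n_tokens)[:, None]           # (N, 1)
    i = np.arange(dim // 2)[None, :]             # (1, D/2)
    angles = pos / (10000 ** (2 * i / dim))
    enc = np.zeros((n_tokens, dim))
    enc[:, 0::2] = np.sin(angles)
    enc[:, 1::2] = np.cos(angles)
    return enc
```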
Vision Transformers vs MLP (or Others)
- AS-MLP: An Axial Shifted MLP Architecture for Vision : [paper]
- S2-MLPv2: Improved Spatial-Shift MLP Architecture for Vision : [paper]
- ResMLP: Feedforward networks for image classification with data-efficient training: [paper]
- Pay Attention to MLPs: [paper]
- Do You Even Need Attention? A Stack of Feed-Forward Layers Does Surprisingly Well on ImageNet: [paper]
- MLP-Mixer: An all-MLP Architecture for Vision : [paper]
- Sparse-MLP: A Fully-MLP Architecture with Conditional Computation : [paper]
- ConvMLP: Hierarchical Convolutional MLPs for Vision : [paper]
- Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition : [paper]
- MetaFormer is Actually What You Need for Vision : [paper]
- Adaptive Fourier Neural Operators: Efficient Token Mixers for Transformers : [paper] *
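The MLP papers above replace self-attention with MLPs applied along the token axis and the channel axis (MLP-Mixer's two mixing steps). A toy sketch with single-layer linear maps standing in for the paper's two-layer MLPs; layer norms are omitted and all names are illustrative:

```python
import numpy as np

def mixer_block(x, w_token, w_channel):
    """Skeleton of an MLP-Mixer block on tokens x of shape (N, C):
    one linear map mixes across tokens, another across channels,
    each wrapped in a residual connection."""
    x = x + (w_token @ x)    # token mixing:  (N, N) @ (N, C)
    x = x + (x @ w_channel)  # channel mixing: (N, C) @ (C, C)
    return x
```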
Vision Transformers retrieval
- Investigating the Vision Transformer Model for Image Retrieval Tasks: [paper]
- Training Vision Transformers for Image Retrieval: [paper]
- Instance-level Image Retrieval using Reranking Transformers: [paper]
- Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval: [paper]
- TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval : [paper]
- Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations : [paper]
- Vision Transformer Hashing for Image Retrieval : [paper]
Vision Transformers segmentation and detection
- CoSformer: Detecting Co-Salient Object with Transformers: [paper]
- MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding: [paper]
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks: [paper]
- Medical Image Segmentation Using Squeeze-and-Expansion Transformers: [paper]
- SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers: [paper]
- Visual Transformers: Token-based Image Representation and Processing for Computer Vision : [paper]
- DETR:End-to-End Object Detection with Transformers : [paper] [link_review] [link_review] [link_review] [link_review] [link_review]
- Unifying Global-Local Representations in Salient Object Detection with Transformer : [paper]
- A Unified Efficient Pyramid Transformer for Semantic Segmentation : [paper]
- Dual-stream Network for Visual Recognition : [paper]
- MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers : [paper]
- Vision Transformers with Patch Diversification : [paper]
- Improve Vision Transformers Training by Suppressing Over-smoothing : [paper]
- SOTR: Segmenting Objects with Transformers : [paper]
- Simpler is Better: Few-shot Semantic Segmentation with Classifier Weight Transformer : [paper]
- Polyp-PVT: Polyp Segmentation with Pyramid Vision Transformers : [paper]
- Conditional DETR for Fast Training Convergence : [paper]
- Fully Transformer Networks for Semantic Image Segmentation : [paper]
- Segmenter: Transformer for Semantic Segmentation : [paper]
- nnFormer: Interleaved Transformer for Volumetric Segmentation : [paper]
- Benchmarking Detection Transfer Learning with Vision Transformers : [paper]
Vision Transformers video
- An Image is Worth 16x16 Words, What is a Video Worth?: [paper]
- Token Shift Transformer for Video Classification : [paper]
Vision Transformers face
- Robust Facial Expression Recognition with Convolutional Visual Transformers : [paper]
- Learning Vision Transformer with Squeeze and Excitation for Facial Expression Recognition : [paper]
Vision Transformers OCR
- NRTR: A No-Recurrence Sequence-to-Sequence Model For Scene Text Recognition : [paper]
- On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention : [paper]
- 2D Attentional Irregular Scene Text Recognizer : [paper]
Vision Transformers multi-modal
- ReFormer: The Relational Transformer for Image Captioning : [paper]
- Long-Short Transformer: Efficient Transformers for Language and Vision : [paper]
Vision Transformers GAN
- A Hierarchical Transformation-Discriminating Generative Model for Few Shot Anomaly Detection : [paper]
- ViTGAN: Training GANs with Vision Transformers : [paper]
- Styleformer: Transformer based Generative Adversarial Networks with Style Vector : [paper]
- Combining Transformer Generators with Convolutional Discriminators : [paper]
Facebook AI Image Similarity Challenge
- 3rd Place: A Global and Local Dual Retrieval Solution to Facebook AI Image Similarity Challenge : [paper]
Google Landmark Challenge
Image Retrieval (Instance level Image Retrieval) & Deep Feature
- (My paper) All the attention you need: Global-local, spatial-channel attention for image retrieval : [paper]
- Large-Scale Image Retrieval with Attentive Deep Local Features : [paper] [review]
- NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
- Learning visual similarity for product design with convolutional neural networks : [paper][review]
- Bags of Local Convolutional Features for Scalable Instance Search : [paper][review]
- Neural Codes for Image Retrieval : [paper][review]
- Conditional Similarity Networks : [paper][review]
- End-to-end Learning of Deep Visual Representations for Image Retrieval : [paper][review]
- CNN Image Retrieval Learns from BoW: Unsupervised Fine-Tuning with Hard Examples : [paper][review]
- Image similarity using Deep CNN and Curriculum Learning : [paper][review]
- Faster R-CNN Features for Instance Search : [paper][review]
- Regional Attention Based Deep Feature for Image Retrieval : [paper][review]
- Unsupervised Feature Learning via Non-Parametric Instance-level Discrimination : [paper][review]
- Object retrieval with deep convolutional features : [paper][review]
- Cross-dimensional Weighting for Aggregated Deep Convolutional Features : [paper][review]
- Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
- Saliency Weighted Convolutional Features for Instance Search : [paper][review]
- 2018 Google Landmark Retrieval Challenge review : [review]
- 2019 Google Landmark Retrieval Challenge review : [review]
- REMAP: Multi-layer entropy-guided pooling of dense CNN features for image retrieval : [paper][review]
- Large-scale Landmark Retrieval/Recognition under a Noisy and Diverse Dataset : [paper][review]
- Fine-tuning CNN Image Retrieval with No Human Annotation : [paper][review]
- Large Scale Landmark Recognition via Deep Metric Learning : [paper][review]
- Deep Aggregation of Regional Convolutional Activations for Content Based Image Retrieval : [paper][review]
- Challenging deep image descriptors for retrieval in heterogeneous iconographic collections : [paper][review]
- A Benchmark on Tricks for Large-scale Image Retrieval : [paper][review]
- Attention-Aware Generalized Mean Pooling for Image Retrieval : [paper][review]
- Class-Weighted Convolutional Features for Image Retrieval : [paper][review] # 100th
- deep image retrieval loss (continuously updated) : [paper][review]
- Matchable Image Retrieval by Learning from Surface Reconstruction:[paper][review]
- Combination of Multiple Global Descriptors for Image Retrieval:[paper][review]
- Unifying Deep Local and Global Features for Efficient Image Search:[paper][review]
- ACTNET: end-to-end learning of feature activations and multi-stream aggregation for effective instance image retrieval:[paper][review]
- Google Landmarks Dataset v2 A Large-Scale Benchmark for Instance-Level Recognition and Retrieval:[paper][review]
- Detect-to-Retrieve: Efficient Regional Aggregation for Image Search:[paper][review]
- Local Features and Visual Words Emerge in Activations:[paper][review]
- Image Retrieval using Multi-scale CNN Features Pooling: [paper][review]
- MultiGrain: a unified image embedding for classes and instances: [paper][link_review] [link_review]
- Divide and Conquer the Embedding Space for Metric Learning: [paper][link_review]
- An Effective Pipeline for a Real-world Clothes Retrieval System: [paper][light_review]
- Instance Similarity Learning for Unsupervised Feature Representation : [paper]
- Towards Accurate Localization by Instance Search : [paper]
- The 2021 Image Similarity Dataset and Challenge : [paper]
- DOLG:Single-Stage Image Retrieval with Deep Orthogonal Fusion of Local and Global Features : [paper]
- Towards A Fairer Landmark Recognition Dataset : [paper]
- Recall@k Surrogate Loss with Large Batches and Similarity Mixup : [paper]
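Several of the retrieval papers above ("Fine-tuning CNN Image Retrieval with No Human Annotation", "Attention-Aware Generalized Mean Pooling") pool the convolutional feature map with generalized-mean (GeM) pooling. A minimal numpy sketch; the default p=3 follows the common practice reported in those papers:

```python
import numpy as np

def gem_pool(features, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling over an (H, W, C) feature map.
    p=1 recovers average pooling; large p approaches max pooling."""
    x = np.clip(features, eps, None)   # GeM assumes non-negative activations
    return (x ** p).mean(axis=(0, 1)) ** (1.0 / p)
```

The single parameter p interpolates between average and max pooling, and in the papers above it is typically learned end-to-end.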
Metric Learning
Fashion Image Retrieval
- Learning Embeddings for Product Visual Search with Triplet Loss and Online Sampling : [paper][review]
- Conditional Similarity Networks : [paper][review]
- Semi-supervised Feature-Level Attribute Manipulation for Fashion Image Retrieval : [paper][link_review]
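The triplet loss named in several retrieval entries above can be sketched in a few lines of numpy (the margin value is illustrative):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: push the anchor-negative distance to exceed
    the anchor-positive distance by at least `margin`. Inputs: (N, D)."""
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()
```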
Fashion Compatibility & Outfit Recommendation
Personalized Outfit Recommendation & fashion outfit
- FashionNet: Personalized Outfit Recommendation with Deep Neural Network: [paper][review]
- Self-supervised Visual Attribute Learning for Fashion Compatibility : [paper]
- Personalized Outfit Recommendation with Learnable Anchors : [paper]
- PAI-BPR: Personalized Outfit Recommendation Scheme with Attribute-wise Interpretability : [paper]
- Hierarchical Fashion Graph Network for Personalized Outfit Recommendation : [paper]
Fashion multi-modal
- Kaleido-BERT: Vision-Language Pre-training on Fashion Domain : [paper]
- Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback : [paper]
Fashion DataSets
- SHIFT15M: Multiobjective Large-Scale Fashion Dataset with Distributional Shifts : [paper]
Retail & Product & Instance
- Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining : [paper]
- RP2K: A Large-Scale Retail Product Dataset for Fine-Grained Image Classification : [paper]
- eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges : [paper]
- Regional Maximum Activations of Convolutions with Attention for Cross-domain Beauty and Personal Care Product Retrieval:[paper][review]
- Learning visual similarity for product design with convolutional neural networks : [paper][review]
- The Met Dataset: Instance-level Recognition for Artworks : [paper]
Image Retrieval using Deep Hash
- Deep Learning of Binary Hash Codes for Fast Image Retrieval : [paper][review]
- Feature Learning based Deep Supervised Hashing with Pairwise Labels : [paper][review]
- Deep Supervised Hashing with Triplet Labels : [paper][review]
- Online Hashing with Similarity Learning : [paper]
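The deep-hashing papers above all end the same way: binarize the learned embedding with a sign threshold and rank the database by Hamming distance. A minimal numpy sketch of that retrieval step (function names are illustrative):

```python
import numpy as np

def binarize(embeddings):
    """Sign-threshold real-valued embeddings into binary hash codes."""
    return (np.asarray(embeddings) > 0).astype(np.uint8)

def hamming_rank(query_code, db_codes):
    """Rank database items by Hamming distance to the query code;
    returns (ranking indices, distances in database order)."""
    dists = (db_codes != query_code).sum(axis=1)
    return np.argsort(dists, kind="stable"), dists
```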
Video Classification
- NetVLAD: CNN architecture for weakly supervised place recognition : [paper][review]
- Learnable pooling with Context Gating for video classification : [paper][review]
- Less is More: Learning Highlight Detection from Video Duration : [paper][review]
- Efficient Video Classification Using Fewer Frames : [paper][review]
OCR - Recognition
- Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
- FOTS: Fast Oriented Text Spotting with a Unified Network : [paper][review]
- Robust Scene Text Recognition with Automatic Rectification : [paper][review]
- Joint Visual Semantic Reasoning: Multi-Stage Decoder for Text Recognition : [paper]
OCR - Detection
Attention & Deformation
Visual & Textual Embedding
CNN
Transfer Learning
Generative Adversarial Nets
- Generative Adversarial Nets : [paper][review]
- Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks : [paper][review]
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks : [paper][review]
- Progressive Growing of GANs for Improved Quality, Stability, and Variation : [paper][review]
- Beholder-GAN: Generation and Beautification of Facial Images with Conditioning on Their Beauty Level : [paper][review]
- Synthetically Supervised Feature Learning for Scene Text Recognition : [paper][review]
- A Style-Based Generator Architecture for Generative Adversarial Networks : [paper][review]
- High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs : [paper][review]
- Everybody Dance Now : [paper][review]
- Be Your Own Prada: Fashion Synthesis with Structural Coherence : [paper][review]
- Fashion-Gen: The Generative Fashion Dataset and Challenge : [paper][review]
- StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks : [paper][review]
- DwNet: Dense warp-based network for pose-guided human video generation: [paper][review]
Face
- FaceNet: A Unified Embedding for Face Recognition and Clustering : [paper][review]
- The Devil of Face Recognition is in the Noise : [paper][link_review]
- Revisiting a single-stage method for face detection : [paper][review]
- MixFaceNets: Extremely Efficient Face Recognition Networks : [paper]
Pose Estimation
NLP/NLU
- Efficient Estimation of Word Representations in Vector Space : [paper][review]
- node2vec: Scalable Feature Learning for Networks : [paper][review]
- Understanding Transformer (self-attention) basics : PPT summary
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding : [paper][review] (in progress)
- DeepRank: A New Deep Architecture for Relevance Ranking in Information Retrieval : [paper][review]
- SNRM: From Neural Re-Ranking to Neural Ranking: Learning a Sparse Representation for Inverted Indexing : [paper][review]
- TF-Ranking: Scalable TensorFlow Library for Learning-to-Rank : [paper][review]
- ConvRankNet: Deep Neural Network for Learning to Rank Query-Text Pairs : [paper][review]
- KNRM: End-to-End Neural Ad-hoc Ranking with Kernel Pooling : [paper][review]
- Conv-KNRM: Convolutional Neural Networks for Soft-Matching N-Grams in Ad-hoc Search : [paper][review]
- PACRR: A position-aware neural IR model for relevance matching : [paper][link_review]
- CEDR: Contextualized Embeddings for Document Ranking : [paper][link]
- Deeper Text Understanding for IR with Contextual Neural Language Modeling : [paper][link]
- Simple Applications of BERT for Ad Hoc Document Retrieval : [paper][link]
- Document Expansion by Query Prediction : [paper][link]
- Passage Re-ranking with BERT : [paper][link]
Domain Adaptation
Curriculum Learning
- CurriculumNet: Weakly Supervised Learning from Large-Scale Web Images : [paper][review]
Image Segmentation
Localization
AutoML
Image Quality
- Learning to Compose with Professional Photographs on the Web : [paper][review]
- Photo Aesthetics Ranking Network with Attributes and Content Adaptation : [paper][review]
- Composition-preserving Deep Photo Aesthetics Assessment : [paper][review]
- Deep Image Aesthetics Classification using Inception Modules and Fine-tuning Connected Layer : [paper][review]
- NIMA: Neural Image Assessment : [paper][review]
Others