Overview-of-Non-autoregressive-Applications
This repo presents an overview of Non-autoregressive (NAR) models, including links to related papers and the corresponding code.
NAR models aim to speed up decoding and reduce inference latency, which makes them better suited to industrial applications. However, this speedup comes at the cost of lower output quality. Many methods and tricks have been proposed to narrow this gap.
NAR models were first proposed for neural machine translation and have since been applied to various tasks, such as speech-to-text, speech generation, speech translation, text summarization, dialogue and intent detection, grammatical error correction, text style transfer, and semantic parsing.
A survey on non-autoregressive neural machine translation, including a brief review of these other tasks, can be found here.
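As a rough illustration of where the speedup comes from, the toy sketch below contrasts the two decoding styles by counting model calls. It is not taken from any of the listed papers: `toy_step`, `toy_parallel`, and `TARGET` are hypothetical stand-ins for a real model, and the NAR case assumes the target length is already known.

```python
from typing import Callable, List, Tuple

EOS = "</s>"
# Hypothetical "gold" output that both toy models reproduce.
TARGET: List[str] = ["a", "toy", "sentence"]


def autoregressive_decode(
    step_fn: Callable[[List[str]], str], max_len: int
) -> Tuple[List[str], int]:
    """Generate one token per model call; each call sees the prefix so far."""
    prefix: List[str] = []
    calls = 0
    for _ in range(max_len):
        token = step_fn(prefix)
        calls += 1
        if token == EOS:
            break
        prefix.append(token)
    return prefix, calls


def non_autoregressive_decode(
    parallel_fn: Callable[[int], List[str]], length: int
) -> Tuple[List[str], int]:
    """Predict every position in a single model call (length assumed given)."""
    return parallel_fn(length), 1


def toy_step(prefix: List[str]) -> str:
    # Stand-in for an AR model: returns the next token given the prefix.
    return TARGET[len(prefix)] if len(prefix) < len(TARGET) else EOS


def toy_parallel(length: int) -> List[str]:
    # Stand-in for a NAR model: returns all tokens at once.
    return TARGET[:length]


ar_out, ar_calls = autoregressive_decode(toy_step, max_len=10)
nar_out, nar_calls = non_autoregressive_decode(toy_parallel, length=len(TARGET))
print("AR :", ar_out, "model calls:", ar_calls)
print("NAR:", nar_out, "model calls:", nar_calls)
```

The number of sequential model calls grows with output length in the autoregressive loop but stays constant in the parallel case. A real NAR model predicts each position without seeing its neighbours, which is one source of the quality gap mentioned above.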
**** Updates ****
- 2024/7/22: Add the category Simultaneous Translation, and add several papers.
- 2024/5/28: Add the category NAR with Large Language Models, and add more than 50 papers.
- 2023/10/15: Add about 20 papers.
- 2023/7/25: Add four papers.
- 2023/6/26: Add two papers.
- 2023/6/7: Add more than ten papers.
- 2023/4/3: Reorganize the category and add two papers.
NAR with Large Language Models
- [24ArXiv] A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models. [Paper]
- [24NAACL] DynaMo: Accelerating Language Model Inference with Dynamic Multi-Token Sampling. [Paper]
- [24ArXiv] Ouroboros: Speculative Decoding with Large Model Enhanced Drafting. [Paper]
- [24ICLR] Are BERT family good instruction followers? a study on their potential and limitations. [Paper]
NAR with Pre-trained Language Models
- [24ArXiv] UT5: Pretraining Non autoregressive T5 with unrolled denoising. [Paper]
- [23ArXiv] Directed Acyclic Transformer Pre-training for High-quality Non-autoregressive Text Generation. [Paper]
- [23ArXiv] Text Generation with Diffusion Language Models: A Pre-training Approach with Continuous Paragraph Denoise. [Paper] [Code]
- [22EMNLP] ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation. [Paper] [Code]
- [22ArXiv] A Self-Paced Mixed Distillation Method for Non-Autoregressive Generation. [Paper]
- [21ICML] BANG: Bridging Autoregressive and Non-autoregressive Generation with Large Scale Pretraining. [Paper] [Code]
- [22ArXiv] EdiT5: Semi-Autoregressive Text-Editing with T5 Warm-Start. [Paper]
- [22EMNLP] XLM-D: Decorate Cross-lingual Pre-training Model as Non-Autoregressive Neural Machine Translation. [Paper]
- [22EMNLP] JANUS: Joint Autoregressive and Non-autoregressive Training with Auxiliary Loss for Sequence Generation. [Paper] [Code]
- [22ACL] Universal conditional masked language pre-training for neural machine translation. [Paper] [Code]
- [21ArXiv] Improving Non-autoregressive Generation with Mixup Training. [Paper] [Code]
- [21EACL] Non-Autoregressive Text Generation with Pre-trained Language Models. [Paper] [Code]
- [20NeurIPS] Incorporating bert into parallel sequence decoding with adapters. [Paper] [Code]
Neural machine translation
Tutorial
- ACL 2022 Non-Autoregressive Sequence Generation
- EMNLP 2023 Non-Autoregressive Models for Fast Sequence Generation
Papers
Knowledge distillation
- [23AAAI] Selective Knowledge Distillation for Non-Autoregressive Neural Machine Translation. [Paper]
- [22NAACL] One Reference Is Not Enough: Diverse Distillation with Reference Selection for Non-Autoregressive Translation. [Paper] [Code]
- [22ArXiv] DiMS: Distilling Multiple Steps of Iterative Non-Autoregressive Transformers. [Paper] [Code]
- [22NAACL] Neighbors Are Not Strangers: Improving Non-Autoregressive Translation under Low-Frequency Lexical Constraints. [Paper]
- [22ArXiv] Self-Distillation Mixup Training for Non-autoregressive Neural Machine Translation. [Paper]
- [22ACL-IJCNLP Findings] How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation? [Paper]
- [21ACL-IJCNLP] Rejuvenating Low-Frequency Words: Making the Most of Parallel Data in Non-Autoregressive Translation. [Paper] [Code]
- [21ICLR] UNDERSTANDING AND IMPROVING LEXICAL CHOICE IN NON-AUTOREGRESSIVE TRANSLATION. [Paper]
- [20ICLR] UNDERSTANDING KNOWLEDGE DISTILLATION IN NON-AUTOREGRESSIVE MACHINE TRANSLATION. [Paper] [Code]
- [20ACL] A Study of Non-autoregressive Model for Sequence Generation. [Paper]
- [20ACL] Improving Non-autoregressive Neural Machine Translation with Monolingual Data. [Paper]
Data learning strategy
- [22EMNLP Findings] Con-NAT: Contrastive Non-autoregressive Neural Machine Translation. [Paper]
- [22NAACL] Non-Autoregressive Neural Machine Translation with Consistency Regularization Optimized Variational Framework. [Paper] [Code]
- [22ACL] latent-GLAT: Glancing at Latent Variables for Parallel Text Generation. [Paper] [Code]
- [21ACL-IJCNLP] Glancing Transformer for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [21ACL-IJCNLP Findings] Progressive Multi-Granularity Training for Non-Autoregressive Translation. [Paper]
- [21ArXiv] MvSR-NAT: Multi-view Subset Regularization for Non-Autoregressive Machine Translation. [Paper]
- [23ArXiv] Optimizing Non-Autoregressive Transformers with Contrastive Learning. [Paper]
Iteration-based methods
- [24ArXiv] Improving Non-autoregressive Machine Translation with Error Exposure and Consistency Regularization. [Paper]
- [24ArXiv] Analysis of Levenshtein Transformer's Decoder and Its Variants. [Paper]
- [24ArXiv] Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation. [Paper]
- [23AAAI] AMOM: Adaptive Masking over Masking for Conditional Masked Language Model. [Paper] [Code]
- [22NeurIPS] INSNET: An Efficient, Flexible, and Performant Insertion-based Text Generation Model. [Paper]
- [22ArXiv] Non-Autoregressive Machine Translation with Translation Memories. [Paper]
- [22ArXiv] Nearest Neighbor Non-autoregressive Text Generation. [Paper]
- [UnderReview] DEEP EQUILIBRIUM NON-AUTOREGRESSIVE SEQUENCE LEARNING. [Paper]
- [22ICLR] IMPROVING NON-AUTOREGRESSIVE TRANSLATION MODELS WITHOUT DISTILLATION. [Paper] [Code]
- [22ICLR] STEP-UNROLLED DENOISING AUTOENCODERS FOR TEXT GENERATION. [Paper] [Code]
- [21EMNLP] Learning to Rewrite for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [20ArXiv] Semi-autoregressive training improves mask-predict decoding. [Paper]
- [20ICML] Non-autoregressive Machine Translation with Disentangled Context Transformer. [Paper] [Code]
- [18EMNLP] Semi-Autoregressive Neural Machine Translation. [Paper]
- [20ACL] ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation. [Paper] [Code]
- [20ACL] Jointly Masked Sequence-to-Sequence Model for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [20ICLR] DEEP ENCODER, SHALLOW DECODER: REEVALUATING NON-AUTOREGRESSIVE MACHINE TRANSLATION. [Paper] [Code]
- [19ICML] Insertion Transformer: Flexible Sequence Generation via Insertion Operations. [Paper] [Code]
- [19NeurIPS] Levenshtein Transformer. [Paper] [Code]
- [19EMNLP-IJCNLP] Mask-Predict: Parallel Decoding of Conditional Masked Language Models. [Paper] [Code]
- [18EMNLP] Deterministic Non-Autoregressive Neural Sequence Modeling by Iterative Refinement. [Paper] [Code]
Latent variable-based methods
- [23ArXiv] Improving Non-autoregressive Translation Quality with Pretrained Language Model, Embedding Distillation and Upsampling Strategy for CTC. [Paper]
- [23ArXiv] Shared Latent Space by Both Languages in Non-Autoregressive Neural Machine Translation. [Paper]
- [23AAAI] RenewNAT: Renewing Potential Translation for Non-Autoregressive Transformer. [Paper] [Code]
- [22EMNLP] Assessing Non-autoregressive Alignment in Neural Machine Translation via Word Reordering. [Paper]
- [21EMNLP] AligNART: Non-autoregressive Neural Machine Translation by Jointly Learning to Estimate Alignment and Translate. [Paper]
- [21EACL] Enriching Non-Autoregressive Transformer with Syntactic and Semantic Structures for Neural Machine Translation. [Paper]
- [21AAAI] Guiding Non-Autoregressive Neural Machine Translation Decoding with Reordering Information. [Paper] [Code]
- [21NAACL-HLT] Non-Autoregressive Translation by Learning Target Categorical Codes. [Paper] [Code]
- [21ACL-IJCNLP Findings] Fully Non-autoregressive Neural Machine Translation: Tricks of the Trade. [Paper] [Code]
- [20AAAI] Latent-Variable Non-Autoregressive Neural Machine Translation with Deterministic Inference using a Delta Posterior. [Paper] [Code]
- [20EMNLP] Non-Autoregressive Machine Translation with Latent Alignments. [Paper]
- [20ArXiv] Incorporating a Local Translation Mechanism into Non-autoregressive Translation. [Paper] [Code]
- [19EMNLP-IJCNLP] FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow. [Paper] [Code]
- [19NeurIPS] Fast Structured Decoding for Sequence Models. [Paper]
- [19ArXiv] Non-autoregressive Transformer by Position Learning. [Paper]
- [19ACL] Syntactically Supervised Transformers for Faster Neural Machine Translation. [Paper] [Code]
- [18ICLR] NON-AUTOREGRESSIVE NEURAL MACHINE TRANSLATION. [Paper] [Code]
Enhancements-based methods
- [23NeurIPS] Non-autoregressive Machine Translation with Probabilistic Context-free Grammar. [Paper]
- [23ICLR] FUZZY ALIGNMENTS IN DIRECTED ACYCLIC GRAPH FOR NON-AUTOREGRESSIVE MACHINE TRANSLATION. [Paper]
- [22EMNLP] Candidate Soups: Fusing Candidate Results Improves Translation Quality for Non-Autoregressive Translation. [Paper]
- [22EMNLP Findings] Viterbi Decoding of Directed Acyclic Transformer for Non-Autoregressive Machine Translation. [Paper]
- [22ICML] Directed Acyclic Transformer for Non-Autoregressive Machine Translation. [Paper]
- [22ArXiv] Non-autoregressive Translation with Dependency-Aware Decoder. [Paper] [Code]
- [21ArXiv] LAVA NAT: A Non-Autoregressive Translation Model with Look-Around Decoding and Vocabulary Attention. [Paper]
- [21AAAI] Non-Autoregressive Translation with Layer-Wise Prediction and Deep Supervision. [Paper] [Code]
- [20COLING] Context-Aware Cross-Attention for Non-Autoregressive Translation. [Paper]
- [19AAAI] Non-Autoregressive Neural Machine Translation with Enhanced Decoder Input. [Paper]
- [19AAAI] Non-Autoregressive Machine Translation with Auxiliary Regularization. [Paper]
Criterion
- [22ICML] On the Learning of Non-Autoregressive Transformers. [Paper]
- [22EMNLP] Multi-Granularity Optimization for Non-Autoregressive Translation. [Paper]
- [22COLING] ngram-OAXE: Phrase-Based Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation. [Paper] [Code]
- [22NeurIPS] Non-Monotonic Latent Alignments for CTC-Based Non-Autoregressive Machine Translation. [Paper] [Code]
- [22NAACL] A Study of Syntactic Multi-Modality in Non-Autoregressive Machine Translation. [Paper]
- [21ICML] Order-Agnostic Cross Entropy for Non-Autoregressive Machine Translation. [Paper] [Code]
- [20ICML] Aligned Cross Entropy for Non-Autoregressive Machine Translation. [Paper] [Code]
- [20AAAI] Minimizing the Bag-of-Ngrams Difference for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [18EMNLP] End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification. [Paper]
- [19ACL] Retrieving Sequential Information for Non-Autoregressive Neural Machine Translation. [Paper] [Code](https://github.com/ictnlp/RSI-NAT)
- [06ICML] Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks. [Paper] [Code]
Decoding
- [22EMNLP] Candidate Soups: Fusing Candidate Results Improves Translation Quality for Non-Autoregressive Translation. [Paper] [Code]
- [UnderReview] HYBRID-REGRESSIVE NEURAL MACHINE TRANSLATION. [Paper]
- [22EAMT] Diformer: Directional Transformer for Neural Machine Translation. [Paper]
- [22ArXiv] Lossless Speedup of Autoregressive Translation with Generalized Aggressive Decoding. [Paper] [Code]
- [20ACL] Learning to Recover from Multi-Modality Errors for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [20COLING] Train Once, and Decode As You Like. [Paper]
- [18EMNLP] Semi-Autoregressive Neural Machine Translation. [Paper]
Benefiting from AR Pre-trained Models
- [22EMNLP] Helping the Weak Makes You Strong: Simple Multi-Task Learning Improves Non-Autoregressive Translators. [Paper] [Code]
- [21NAACL-HLT] Multi-Task Learning with Shared Encoder for Non-Autoregressive Machine Translation. [Paper] [Code]
- [21IJCAI] Task-Level Curriculum Learning for Non-Autoregressive Neural Machine Translation. [Paper]
- [20AutoSimtrans] Improving Autoregressive NMT with Non-Autoregressive Model. [Paper]
- [20ACL] ENGINE: Energy-Based Inference Networks for Non-Autoregressive Machine Translation. [Paper] [Code]
- [20ICML] An EM Approach to Non-autoregressive Conditional Sequence Generation. [Paper] [Code]
- [20AAAI] Fine-Tuning by Curriculum Learning for Non-Autoregressive Neural Machine Translation. [Paper] [Code]
- [19ACL] Imitation Learning for Non-Autoregressive Neural Machine Translation. [Paper]
- [19EMNLP-IJCNLP] Hint-Based Training for Non-Autoregressive Machine Translation. [Paper] [Code]
Evaluations and explorations
- [22ICLR] NON-AUTOREGRESSIVE MODELS ARE BETTER MULTILINGUAL TRANSLATORS. [Paper]
- [21NeurIPS] Duplex Sequence-to-Sequence Learning for Reversible Machine Translation. [Paper] [Code]
- [22NAACL] Non-Autoregressive Machine Translation: It’s Not as Fast as it Seems. [Paper]
- [22ArXiv] Non-Autoregressive Neural Machine Translation: A Call for Clarity. [Paper]
- [UnderReview] ATTENTIVE MLP FOR NON-AUTOREGRESSIVE GENERATION. [Paper]
- [23ACL Findings] Revisiting Non-Autoregressive Translation at Scale. [Paper]
- [23ArXiv] Non-Autoregressive Document-Level Machine Translation (NA-DMT): Exploring Effective Approaches, Challenges, and Opportunities. [Paper]
- [24ACL Findings] What Have We Achieved on Non-autoregressive Translation? [Paper]
- [24ArXiv] On the Information Redundancy in Non-Autoregressive Translation. [Paper]
Speech related (Text to speech, speech translation, automatic speech recognition)
Automatic speech recognition (ASR)
- [22ICASSP] Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models. [Paper]
- [22ICASSP] Non-Autoregressive ASR with Self-Conditioned Folded Encoders. [Paper]
- [21ICASSP] CASS-NAT: CTC Alignment-Based Single Step Non-Autoregressive Transformer for Speech Recognition. [Paper]
- [21ICASSP] Improved Mask-CTC for Non-Autoregressive End-to-End ASR. [Paper]
- [21ICASSP] Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input. [Paper]
- [21NAACL] Align-Refine: Non-Autoregressive Speech Recognition via Iterative Realignment. [Paper]
- [21ArXiv] Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT. [Paper]
- [21ASRU] Non-autoregressive Mandarin-English Code-switching Speech Recognition with Pinyin Mask-CTC and Word Embedding Regularization. [Paper]
- [21ArXiv] Pushing the Limits of Non-Autoregressive Speech Recognition. [Paper]
- [21ArXiv] WNARS: WFST based Non-autoregressive Streaming End-to-End Speech Recognition. [Paper]
- [21Interspeech] An Improved Single Step Non-autoregressive Transformer for Automatic Speech Recognition. [Paper]
- [21ArXiv] Non-autoregressive Transformer with Unified Bidirectional Decoder for Automatic Speech Recognition. [Paper]
- [21ASRU] A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation. [Paper]
- [21ArXiv] Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR. [Paper]
- [21ArXiv] Non-autoregressive Transformer-based End-to-end ASR using BERT. [Paper]
- [21Interspeech] Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain. [Paper] [Code]
- [21Interspeech] Streaming End-to-End ASR based on Blockwise Non-Autoregressive Models. [Paper] [Code]
- [21ArXiv] Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition. [Paper]
- [21Interspeech] Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict. [Paper] [Code]
- [21Interspeech] Insertion-Based Modeling for End-to-End Automatic Speech Recognition. [Paper] [Code]
- [21Interspeech] Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition. [Paper]
- [21ArXiv] Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition. [Paper]
- [21ICASSP] Intermediate Loss Regularization for CTC-Based Speech Recognition. [Paper] [Code]
- [21Interspeech] Align-Denoise: Single-Pass Non-Autoregressive Speech Recognition. [Paper] [Code]
- [21ArXiv] Relaxing the conditional independence assumption of CTC-based ASR by conditioning on intermediate predictions. [Paper] [Code]
- [22ArXiv] Improving CTC-based ASR Models with Gated Interlayer Collaboration. [Paper]
- [22ArXiv] PATCORRECT: NON-AUTOREGRESSIVE PHONEME-AUGMENTED TRANSFORMER FOR ASR ERROR CORRECTION. [Paper]
- [22SLT] A context-aware knowledge transferring strategy for CTC-based ASR. [Paper]
- [22Interspeech] Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM. [Paper]
- [22SLT] Inter-KD: Intermediate Knowledge Distillation for CTC-Based Automatic Speech Recognition. [Paper]
- [22ArXiv] Personalization of CTC Speech Recognition Models. [Paper]
- [22Interspeech] Knowledge Transfer and Distillation from Autoregressive to Non-Autoregressive Speech Recognition. [Paper]
- [22ArXiv] BECTRA: Transducer-based End-to-End ASR with BERT-Enhanced Encoder. [Paper]
- [22SLT] Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting. [Paper]
- [22InterSpeech] Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition. [Paper]
- [23InterSpeech] A Lexical-aware Non-autoregressive Transformer-based ASR Model. [Paper]
- [23ArXiv] Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System. [Paper]
- [TASLP] A CTC Alignment-based Non-autoregressive Transformer for End-to-end Automatic Speech Recognition. [Paper]
- [23ASRU] Zero-Shot Emotion Transfer For Cross-Lingual Speech Synthesis. [Paper]
- [23ArXiv] Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors. [Paper]
- [23ICASSP] AAS-VC: On the Generalization Ability of Automatic Alignment Search based Non-autoregressive Sequence-to-sequence Voice Conversion. [Paper]
- [24ArXiv] DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation. [Paper] [Code]
- [24ArXiv] Non-autoregressive real-time Accent Conversion model with voice cloning. [Paper]
- [24IJCAI] FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation. [Paper]
- [24ArXiv] UniEnc-CASSNAT: An Encoder-only Non-autoregressive ASR for Speech SSL Models. [Paper]
Text to speech (TTS)
- [22Interspeech] Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech. [Paper]
- [22ArXiv] vTTS: visual-text to speech. [Paper]
- [22Interspeech] A Multi-Scale Time-Frequency Spectrogram Discriminator for GAN-based Non-Autoregressive TTS. [Paper]
- [22ACL] Revisiting Over-Smoothness in Text to Speech. [Paper]
- [21ICLR] Bidirectional Variational Inference for Non-Autoregressive Text-to-Speech. [Paper] [Code]
- [21ArXiv] VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention. [Paper]
- [21ArXiv] Nana-HDR: A Non-attentive Non-autoregressive Hybrid Model for TTS. [Paper]
- [21Speech Synthesis Workshop] Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech. [Paper]
- [21ArXiv] VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis. [Paper] [Code]
- [21ArXiv] Exploring Timbre Disentanglement in Non-Autoregressive Cross-Lingual Text-to-Speech. [Paper]
- [21ICML] Non-Autoregressive Neural Text-to-Speech. [Paper] [Code]
- [21Interspeech] Quasi-Periodic Parallel WaveGAN Vocoder: A Non-autoregressive Pitch-dependent Dilated Convolution Model for Parametric Speech Generation. [Paper] [Code]
- [21NeurIPS] FastSpeech: Fast, Robust and Controllable Text to Speech. [Paper] [Code]
- [21ArXiv] TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction. [Paper] [Code]
- [21ArXiv] Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis. [Paper] [Code]
- [22ArXiv] Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech. [Paper]
- [22ACL] Cross-Utterance Conditioned VAE for Non-Autoregressive Text-to-Speech. [Paper]
- [22ICLR] BAG OF TRICKS FOR UNSUPERVISED TTS. [Paper]
- [23ICASSP] Spoofed training data for speech spoofing countermeasure can be efficiently created using neural vocoders. [Paper]
- [23IALP] MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline. [Paper]
- [23InterSpeech] Towards Robust FastSpeech 2 by Modelling Residual Multimodality. [Paper]
- [ISCA] An analysis on the effects of speaker embedding choice in non auto-regressive TTS. [Paper]
- [23ArXiv] VampNet: Music Generation via Masked Acoustic Token Modeling. [Paper] [Code]
- [23ASRU] On the Relevance of Phoneme Duration Variability of Synthesized Training Data for Automatic Speech Recognition. [Paper]
- [23ICASSP] Matcha-TTS: A fast TTS architecture with conditional flow matching. [Paper]
- [24ArXiv] An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis. [Paper]
- [24ArXiv] Speech-driven Personalized Gesture Synthetics: Harnessing Automatic Fuzzy Feature Inference. [Paper]
- [24ArXiv] SpeechGPT-Gen: Scaling Chain-of-Information Speech Generation. [Paper]
- [24ICASSP] Multilingual and Fully Non-Autoregressive ASR with Large Language Model Fusion: A Comprehensive Study. [Paper]
- [24ArXiv] Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction. [Paper]
Speech translation
- [21ACL findings] Investigating the Reordering Capability in CTC-based Non-Autoregressive End-to-End Speech Translation. [Paper] [Code]
- [21ICASSP] ORTHROS: non-autoregressive end-to-end speech translation With dual-decoder. [Paper]
- [21ArXiv] Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring. [Paper]
- [21ASRU] Fast-MD: Fast Multi-Decoder End-to-End Speech Translation with Non-Autoregressive Hidden Intermediates. [Paper]
- [22ArXiv] A Novel Chinese Dialect TTS Frontend with Non-Autoregressive Neural Machine Translation. [Paper]
- [22ArXiv] TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation. [Paper]
- [23ACL] CTC-based Non-autoregressive Speech Translation. [Paper] [Code]
- [23NeurIPS] DASpeech: Directed Acyclic Transformer for Fast and High-quality Speech-to-Speech Translation. [Paper] [Code]
- [24ArXiv] OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification. [Paper]
Others
- [22ArXiv] Conditional Deep Hierarchical Variational Autoencoder for Voice Conversion. [Paper]
- [22ICLR] Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks. [Paper]
- [21ICASSP] Non-Autoregressive Sequence-To-Sequence Voice Conversion. [Paper]
- [21ArXiv] Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies. [Paper] [Code]
- [21ArXiv] Exploring Non-Autoregressive End-To-End Neural Modeling For English Mispronunciation Detection And Diagnosis. [Paper]
- [22ArXiv] FastLTS: Non-Autoregressive End-to-End Unconstrained Lip-to-Speech Synthesis. [Paper]
- [22ArXiv] Streaming non-autoregressive model for any-to-many voice conversion. [Paper]
- [UnderReview] REPHRASETTS: DYNAMIC LENGTH TEXT BASED SPEECH INSERTION WITH SPEAKER STYLE TRANSFER. [Paper]
- [24ArXiv] Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos. [Paper]
- [24COLING] Select and Reorder: A Novel Approach for Neural Sign Language Production. [Paper]
- [24CVPR] Towards Variable and Coordinated Holistic Co-Speech Motion Generation. [Paper]
- [24ArXiv] Masked Audio Generation using a Single Non-Autoregressive Transformer. [Paper]
- [24ArXiv] Anfinsen Goes Neural: a Graphical Model for Conditional Antibody Design. [Paper]
- [24AAAI] Spot the Error: Non-autoregressive Graphic Layout Generation with Wireframe Locator. [Paper]
- [24ArXiv] Distilling Autoregressive Models to Obtain High-Performance Non-Autoregressive Solvers for Vehicle Routing Problems with Faster Inference Speed. [Paper]
- [24ICASSP] StemGen: A music generation model that listens. [Paper]
- [24ICASSP] SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention. [Paper]
- [24NeurIPS workshop] Fast non-autoregressive inverse folding with discrete diffusion. [Paper]
Other tasks (Text Summarization; Dialogue and Intent Detection; Grammatical Error Correction; Text Style Transfer; Parsing; etc.)
Simultaneous Translation
- [23EMNLP] Non-autoregressive Streaming Transformer for Simultaneous Translation. [Paper] [Code]
- [24ACL] A Non-autoregressive Generation Framework for End-to-End Simultaneous Speech-to-Any Translation. [Paper] [Code]
Summarization
- [22ACL] Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization. [Paper]
- [22ArXiv] A Character-Level Length-Control Algorithm for Non-Autoregressive Sentence Summarization. [Paper]
- [22ArXiv] An Imitation Learning Curriculum for Text Editing with Non-Autoregressive Models. [Paper]
- [21ACL-IJCNLP] POS-Constrained Parallel Decoding for Non-autoregressive Generation. [Paper] [Code]
- [21ArXiv] Integrated Training for Sequence-to-Sequence Models Using Non-Autoregressive Transformer. [Paper]
Dialogue
- [21EMNLP] Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems. [Paper] [Code]
- [20ArXiv] Non-Autoregressive Neural Dialogue Generation. [Paper]
- [21ACL-IJCNLP] GL-GIN: Fast and Accurate Non-Autoregressive Model for Joint Multiple Intent Detection and Slot Filling. [Paper] [Code]
- [21ArXiv] An Effective Non-Autoregressive Model for Spoken Language Understanding. [Paper]
- [20EMNLP] SlotRefine: A Fast Non-Autoregressive Model for Joint Intent Detection and Slot Filling. [Paper] [Code]
- [20ICLR] Non-Autoregressive Dialog State Tracking. [Paper] [Code]
Parsing
- [21EMNLP Findings] Span Pointer Networks for Non-Autoregressive Task-Oriented Semantic Parsing. [Paper]
- [21NAACL-HLT] Non-Autoregressive Semantic Parsing for Compositional Task-Oriented Dialog. [Paper] [Code]
- [20ArXiv] Recursive Non-Autoregressive Graph-to-Graph Transformer for Dependency Parsing with Iterative Refinement. [Paper] [Code]
- [23ACL Findings] A Semi-Autoregressive Graph Generative Model for Dependency Graph Parsing. [Paper]
Grammatical Error Correction
- [21ACL-IJCNLP] Tail-to-Tail Non-Autoregressive Sequence Prediction for Chinese Grammatical Error Correction. [Paper] [Code]
- [21WNUT] Character Transformations for Non-Autoregressive GEC Tagging. [Paper] [Code]
- [22EMNLP] Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction. [Paper]
- [23EMNLP] Non-autoregressive Text Editing with Copy-aware Latent Alignments. [Paper]
- [23ACL] GEC-DePenD: Non-Autoregressive Grammatical Error Correction with Decoupled Permutation and Decoding. [Paper]
Text Style Transfer
- [21EMNLP] Exploring Non-Autoregressive Text Style Transfer. [Paper] [Code]
- [21ACL-IJCNLP Findings] NAST: A Non-Autoregressive Generator with Word Alignment for Unsupervised Text Style Transfer. [Paper] [Code]
Controllable Text Generation
- [21ACL-IJCNLP Findings] A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification. [Paper]
- [22ArXiv] Gradient-Based Constrained Sampling from Language Models. [Paper] [Code]
- [22ArXiv] Diffusion-LM Improves Controllable Text Generation. [Paper] [Code]
- [22ArXiv] AutoTemplate: A Simple Recipe for Lexically Constrained Text Generation. [Paper]
- [23IJCAI] KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation. [Paper]
- [24NAACL] Control-DAG: Constrained Decoding for Non-Autoregressive Directed Acyclic T5 using Weighted Finite State Automata. [Paper] [Code]
Question Answering
- [22ArXiv] NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering. [Paper]
- [22ArXiv] KECP: Knowledge Enhanced Contrastive Prompting for Few-shot Extractive Question Answering. [Paper] [Code]
Image Caption
- [22ECCV] Explicit Image Caption Editing. [Paper] [Code]
- [22Multimedia] Efficient Modeling of Future Context for Image Captioning. [Paper] [Code]
- [22NeurIPS] Learning Distinct and Representative Modes for Image Captioning. [Paper] [Code]
- [23AAAI] Uncertainty-Aware Image Captioning. [Paper]
Others
- [21ArXiv] EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models. [Paper]
- [21EMNLP] Maximal Clique Based Non-Autoregressive Open Information Extraction. [Paper]
- [20ArXiv] A Study on the Autoregressive and non-Autoregressive Multi-label Learning. [Paper]
- [22ArXiv] Capture Salient Historical Information: A Fast and Accurate Non-Autoregressive Model for Multi-turn Spoken Language Understanding. [Paper]
- [22ArXiv] Continuous conditional video synthesis by neural processes. [Paper]
- [22ArXiv] Multi-scale Attention Flow for Probabilistic Time Series Forecasting. [Paper]
- [22ArXiv] Non-autoregressive Model for Full-line Code Completion. [Paper]
- [22ArXiv] Acoustic-aware Non-autoregressive Spell Correction with Mask Sample Decoding. [Paper]
- [22EMNLP] Composing Ci with Reinforced Non-autoregressive Text Generation. [Paper]
- [23ArXiv] Non-Autoregressive Math Word Problem Solver with Unified Tree Structure. [Paper]
- [23ArXiv] RobustL2S: Speaker-Specific Lip-to-Speech Synthesis exploiting Self-Supervised Representations. [Paper]
- [23ArXiv] VampNet: Music Generation via Masked Acoustic Token Modeling. [Paper]
- [24KDD] Non-autoregressive Generative Models for Reranking Recommendation. [Paper]
- [24EMNLP] INarIG: Iterative Non-autoregressive Instruct Generation Model For Word-Level Auto Completion. [Paper]
Computer Vision
- [23ICCV] Translating Images to Road Network: A Non-Autoregressive Sequence-to-Sequence Approach. [Paper]
- [24CVPR] Non-autoregressive Sequence-to-Sequence Vision-Language Models. [Paper]
- [UnderReview] Semi-Autoregressive Energy Flows: Towards Determinant-Free Training of Normalizing Flows. [Paper]
- [UnderReview] LANGUAGE-GUIDED ARTISTIC STYLE TRANSFER USING THE LATENT SPACE OF DALL-E. [Paper]
- [23ICLR] CHIRODIFF: MODELLING CHIROGRAPHIC DATA WITH DIFFUSION MODELS. [Paper]
- [22ArXiv] STPOTR: Simultaneous Human Trajectory and Pose Prediction Using a Non-Autoregressive Transformer for Robot Following Ahead. [Paper]
- [22ECCV] Improved Masked Image Generation with Token-Critic. [Paper]
- [223DV] TEACH: Temporal Action Composition for 3D Humans. [Paper]
- [23ECCV] Non-Autoregressive Sign Language Production via Knowledge Distillation. [Paper]
- [22ArXiv] Megapixel Image Generation with Step-Unrolled Denoising Autoencoders. [Paper]
- [22ArXiv] M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing. [Paper]
- [23MM] Speech-Driven 3D Face Animation with Composite and Regional Facial Movements. [Paper]
- [24ArXiv] Emage: Non-Autoregressive Text-to-Image Generation. [Paper] [Code]
- [24CVPR] MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers. [Paper]
- [24ArXiv] Compression of end-to-end non-autoregressive image-to-speech system for low resourced devices. [Paper]
In particular, we list recent progress on diffusion models across different tasks, since they also adopt a non-autoregressive format within each diffusion step.
Diffusion Models
- [24ICASSP] PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model. [Paper]
- [24AAAI] Text Diffusion with Reinforced Conditioning. [Paper]
- [24ACL] TEncDM: Understanding the Properties of Diffusion Model in the Space of Language Model Encodings. [Paper]
- [23ICLR] CHIRODIFF: MODELLING CHIROGRAPHIC DATA WITH DIFFUSION MODELS. [Paper]
- [22ArXiv] Photorealistic text-to-image diffusion models with deep language understanding. [Paper]
- [22ArXiv] Hierarchical Text-Conditional Image Generation with CLIP Latents. [Paper]
- [22ICML] Glide: Towards photorealistic image generation and editing with text-guided diffusion models. [Paper]
- [22ArXiv] Classifier-free diffusion guidance. [Paper]
- [22ICML] Latent diffusion energy based model for interpretable text modeling. [Paper]
- [22ArXiv] Diffusion-lm improves controllable text generation. [Paper] [Code]
- [23ICLR] DIFFUSER: DIFFUSION VIA EDIT-BASED RECONSTRUCTION. [Paper] [Code]
- [22ArXiv] DIFFUSEQ: SEQUENCE TO SEQUENCE TEXT GENERATION WITH DIFFUSION MODELS. [Paper] [Code]
- [22ArXiv] Understanding Diffusion Models: A Unified Perspective. [Paper]
- [22ArXiv] Diffusion Models: A Comprehensive Survey of Methods and Applications. [Paper]
- [22ArXiv] DiffGAR: Model-Agnostic Restoration from Generative Artifacts Using Image-to-Image Diffusion Models. [Paper]
- [22ArXiv] CLIP-Diffusion-LM: Apply Diffusion Model on Image Captioning. [Paper]
- [22ArXiv] WaveFit: An Iterative and Non-autoregressive Neural Vocoder based on Fixed-Point Iteration. [Paper]
- [22ArXiv] Diffsound: Discrete Diffusion Model for Text-to-sound Generation. [Paper]
- [21ICLR] WAVEGRAD: ESTIMATING GRADIENTS FOR WAVEFORM GENERATION. [Paper]
- [21NeurIPS] Structured Denoising Diffusion Models in Discrete State-Spaces. [Paper]
- [23ArXiv] Diffusion-NAT: Self-Prompting Discrete Diffusion for Non-Autoregressive Text Generation. [Paper]
Results
We report translation performance on several datasets here.