EternalFeather / Transformer-in-generating-dialogue


An Implementation of "Attention Is All You Need" with a Chinese Corpus

  This code is an implementation of the paper Attention Is All You Need, applied to dialogue-generation tasks such as chatbots, text generation and so on.
  Thanks to every friend who has raised issues and helped solve them. Your contributions are very important to the improvement of this project. Because static graph mode is only partially supported in the original code, we decided to move the features to the 2.0.0-beta1 version. However, if you are worried about version problems when building the Docker image or creating a service, we still keep an old version of the code, written in eager mode with TensorFlow 1.12.x, for reference.

Documents

|-- root/
    |-- data/
        |-- src-train.csv
        |-- src-val.csv
        |-- tgt-train.csv
        `-- tgt-val.csv
    |-- old_version/
        |-- data_loader.py
        |-- eval.py
        |-- make_dic.py
        |-- modules.py
        |-- params.py
        |-- requirements.txt
        `-- train.py
    |-- tf1.12.0-eager/
        |-- bleu.py
        |-- main.ipynb
        |-- modules.py
        |-- params.py
        |-- requirements.txt
        `-- utils.py
    |-- images/
    |-- bleu.py
    |-- main-v2.ipynb
    |-- modules-v2.py
    |-- params.py
    |-- requirements.txt
    `-- utils-v2.py

Requirements

Construction

  As we all know, a translation system can be used to implement a conversational model simply by replacing the pairs of sentences in two different languages with question-and-answer pairs. After all, the basic conversational model, Sequence-to-Sequence, was developed from translation systems. So why not use the Transformer to improve the efficiency of conversational models in generating dialogues?

  With the development of BERT-based models, results on more and more NLP tasks are being refreshed constantly. However, the language-modelling task is not included among BERT's open-sourced tasks, so there is no doubt that we still have a long way to go in this direction.

Model Advantages

  A Transformer model handles variable-sized input using stacks of self-attention layers instead of RNNs or CNNs. This general architecture has a number of advantages and special tricks. Let's lay them out:

Implementation details

  In the newest version of the code, we implement the details described in the paper:

Data Generation
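
  Below is a hypothetical sketch of turning the parallel src-*.csv / tgt-*.csv files into a padded tf.data pipeline. The function name, the special tokens <s>, </s>, <unk>, and the one-sentence-per-line format are assumptions for illustration only; the repository's own preprocessing (see utils-v2.py and main-v2.ipynb) may differ.

```python
import tensorflow as tf

# Hypothetical loader: assumes one whitespace-tokenised sentence per line in the
# parallel src/tgt files and an existing token -> id dict (vocab).
def make_dataset(src_path, tgt_path, vocab, batch_size=64, max_len=50):
    def encode(line):
        ids = [vocab.get(tok, vocab['<unk>']) for tok in line.strip().split()]
        return [vocab['<s>']] + ids[:max_len - 2] + [vocab['</s>']]

    def gen():
        with open(src_path, encoding='utf-8') as fs, open(tgt_path, encoding='utf-8') as ft:
            for src_line, tgt_line in zip(fs, ft):
                yield encode(src_line), encode(tgt_line)

    dataset = tf.data.Dataset.from_generator(
        gen,
        output_types=(tf.int32, tf.int32),
        output_shapes=([None], [None]))
    # Pad every batch to the same fixed length (see Tips) so AutoGraph does not retrace.
    return dataset.padded_batch(batch_size, padded_shapes=([max_len], [max_len]))
```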

Positional Encoding
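
  A minimal sketch of the sinusoidal positional encoding from the paper, PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function name and returned shape are illustrative, not necessarily identical to modules-v2.py.

```python
import numpy as np
import tensorflow as tf

def positional_encoding(max_len, d_model):
    pos = np.arange(max_len)[:, np.newaxis]                   # (max_len, 1)
    i = np.arange(d_model)[np.newaxis, :]                     # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (i // 2)) / np.float32(d_model))
    angle_rads = pos * angle_rates                            # (max_len, d_model)
    angle_rads[:, 0::2] = np.sin(angle_rads[:, 0::2])         # even indices -> sin
    angle_rads[:, 1::2] = np.cos(angle_rads[:, 1::2])         # odd indices  -> cos
    return tf.cast(angle_rads[np.newaxis, ...], tf.float32)   # (1, max_len, d_model)
```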

Mask
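
  A minimal sketch of the two masks a Transformer needs: a padding mask that blanks out padded positions, and a look-ahead mask that keeps the decoder from attending to future tokens. The pad id 0 and the function names are assumptions, not taken from this repository.

```python
import tensorflow as tf

def create_padding_mask(seq, pad_id=0):
    # 1.0 at padding positions, 0.0 elsewhere; extra axes broadcast over the
    # attention logits of shape (batch, heads, seq_q, seq_k).
    mask = tf.cast(tf.math.equal(seq, pad_id), tf.float32)
    return mask[:, tf.newaxis, tf.newaxis, :]

def create_look_ahead_mask(size):
    # Strictly upper-triangular matrix of ones that hides future tokens.
    return 1 - tf.linalg.band_part(tf.ones((size, size)), -1, 0)
```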

Scaled dot product attention
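
  A minimal sketch of Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V, with an optional mask added to the logits before the softmax. Names and shapes are illustrative.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v, mask=None):
    matmul_qk = tf.matmul(q, k, transpose_b=True)     # (..., seq_q, seq_k)
    dk = tf.cast(tf.shape(k)[-1], tf.float32)
    logits = matmul_qk / tf.math.sqrt(dk)             # scale by sqrt(d_k)
    if mask is not None:
        logits += (mask * -1e9)                       # masked positions -> ~ -inf
    weights = tf.nn.softmax(logits, axis=-1)          # attention weights
    return tf.matmul(weights, v), weights             # (..., seq_q, depth_v)
```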

Multi-head attention
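
  A minimal sketch of multi-head attention: project Q, K and V with learned matrices, split them into num_heads heads, attend per head, then concatenate and project back. It reuses the scaled_dot_product_attention function from the sketch above and assumes d_model is divisible by num_heads; the class name is illustrative.

```python
import tensorflow as tf

class MultiHeadAttention(tf.keras.layers.Layer):
    def __init__(self, d_model, num_heads):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.depth = d_model // num_heads
        self.wq = tf.keras.layers.Dense(d_model)
        self.wk = tf.keras.layers.Dense(d_model)
        self.wv = tf.keras.layers.Dense(d_model)
        self.dense = tf.keras.layers.Dense(d_model)

    def split_heads(self, x, batch_size):
        # (batch, seq, d_model) -> (batch, num_heads, seq, depth)
        x = tf.reshape(x, (batch_size, -1, self.num_heads, self.depth))
        return tf.transpose(x, perm=[0, 2, 1, 3])

    def call(self, q, k, v, mask=None):
        batch_size = tf.shape(q)[0]
        q = self.split_heads(self.wq(q), batch_size)
        k = self.split_heads(self.wk(k), batch_size)
        v = self.split_heads(self.wv(v), batch_size)
        out, weights = scaled_dot_product_attention(q, k, v, mask)
        out = tf.transpose(out, perm=[0, 2, 1, 3])    # (batch, seq, heads, depth)
        out = tf.reshape(out, (batch_size, -1, self.num_heads * self.depth))
        return self.dense(out), weights
```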

Pointwise Feedforward Network
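
  A minimal sketch of the position-wise feed-forward network FFN(x) = max(0, xW1 + b1)W2 + b2, built here from two Dense layers; the convolutional variant is shown in the Comparison section below.

```python
import tensorflow as tf

def point_wise_feed_forward_network(d_model, dff):
    # Applied to each position independently: expand to dff, then project back.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(dff, activation='relu'),   # (batch, seq, dff)
        tf.keras.layers.Dense(d_model),                  # (batch, seq, d_model)
    ])
```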

Learning Rate Schedule
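
  A minimal sketch of the warm-up schedule from the paper, lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5), paired with the Adam settings the paper uses. The class name is illustrative.

```python
import tensorflow as tf

class TransformerSchedule(tf.keras.optimizers.schedules.LearningRateSchedule):
    def __init__(self, d_model, warmup_steps=4000):
        super().__init__()
        self.d_model = tf.cast(d_model, tf.float32)
        self.warmup_steps = warmup_steps

    def __call__(self, step):
        step = tf.cast(step, tf.float32)
        arg1 = tf.math.rsqrt(step)                      # step^-0.5 after warm-up
        arg2 = step * (self.warmup_steps ** -1.5)       # linear warm-up
        return tf.math.rsqrt(self.d_model) * tf.math.minimum(arg1, arg2)

optimizer = tf.keras.optimizers.Adam(TransformerSchedule(d_model=512),
                                     beta_1=0.9, beta_2=0.98, epsilon=1e-9)
```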

Model Downsides

However, such a strong architecture still has some downsides:

Usage

Results

Comparison

Implement the feed-forward sublayer through fully connected layers.

Implement the feed-forward sublayer through one-dimensional convolution (both variants are sketched below).
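
  For reference, the two variants differ only in which layer realises the position-wise mapping; a minimal sketch of both follows, with illustrative function names.

```python
import tensorflow as tf

def ffn_dense(d_model, dff):
    # Fully connected variant: Dense acts on the last axis, position by position.
    return tf.keras.Sequential([
        tf.keras.layers.Dense(dff, activation='relu'),
        tf.keras.layers.Dense(d_model),
    ])

def ffn_conv1d(d_model, dff):
    # Conv1D with kernel size 1 computes the same position-wise mapping.
    return tf.keras.Sequential([
        tf.keras.layers.Conv1D(filters=dff, kernel_size=1, activation='relu'),
        tf.keras.layers.Conv1D(filters=d_model, kernel_size=1),
    ])
```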

Tips

  If you try to use AutoGraph to speed up your training process, please make sure the dataset is padded to a fixed length. Otherwise the graph-rebuilding (retracing) operation will be triggered during training, which may hurt performance. Our code only guarantees the behaviour on version 2.0; lower versions can refer to it as a starting point.
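
  A minimal sketch of how a fixed input signature keeps tf.function from retracing; MAX_LEN and the empty train_step are placeholders for illustration, not copied from main-v2.ipynb.

```python
import tensorflow as tf

MAX_LEN = 50  # hypothetical fixed length; keep it equal to the dataset padding

train_step_signature = [
    tf.TensorSpec(shape=(None, MAX_LEN), dtype=tf.int32),
    tf.TensorSpec(shape=(None, MAX_LEN), dtype=tf.int32),
]

# With a fixed input signature, tf.function traces the graph once instead of
# rebuilding it for every new sequence length seen during training.
@tf.function(input_signature=train_step_signature)
def train_step(inp, tar):
    ...  # forward pass, loss and gradient update go here
```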

Reference

Thanks to Transformer and TensorFlow