THUNLP-AIPoet / MixPoet

Source codes of MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space (AAAI 2020)
146 stars 30 forks source link
chinese poetry-generator style

MixPoet

The source code of MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space (AAAI 2020).

1. Rights

All rights reserved.

2. Requirements

3. Data Preparation

To train the model and generate poems, please

We provide a small sampled subset of the labelled CQCF in the preprocess directory to test these codes. We will consider releasing the full CQCF set in the future. You can also use your own data.

4. Preprocessing

In Mixpoet/preprocess/, just run:

python preprocess.py --n 150000

The parameter n indicates the number of unlabelled instances to be used for semi-supervised training. When one uses the provided sampled CQCF set, we recommend n is about 150,000. With a larger labelled dataset, one can set a larger n.

After running preprocess.py, please move the generated vocab.pickle, ivocab.pickle, semi_train.pickle and semi_valid.pickle into MixPoet/corpus/, and move test_inps.txt and training_lines.txt into MixPoet/data/.

5. Training

In MixPoet/codes/, run:

python train.py

The encoder and decoder will be pre-trained as a denoising autoencoder, and the classifiers will be pre-trained with the labelled poems. Then the MixPoet model is trained based on these pre-trained modules.

One can also edit MixPoet/codes/config.py to modify the configuration, such as the hidden size, embedding size, data path, training epoch, learning rate and so on.

During the training process, some training information is outputed, such as:

The training and validation information is saved in MixPoet/log/.

6. Generation

To generate a poem in an interactive interface, in MixPoet/codes/, run:

python generate.py -v 1

Then one can input the keyword, length and factor labels, and then get the generated pome:

By running:

python generate.py -v 1 -s 1

one can manually select each generated line from the beam candidates.

To generate poems with an input testing file, which contains a set of keywords, run:

python generate.py -m file -l 5 -i ../data/test_inps.txt -o outs_5char.txt

where l = 5 or 7, indicating the 5-char-line or 7-char-line quatrains.

7. Cite

If you use our source code, please kindly cite this paper:

Xiaoyuan Yi, Ruoyu Li, Cheng Yang, Wenhao Li and Maosong Sun. 2020. MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space. In Proceedings of The Thirty-Fourth AAAI Conference on Artificial Intelligence, New York, USA.

The bib format is as follows:

@inproceedings{Mixpoet:20,
    author  = {Xiaoyuan Yi and Ruoyu Li and Cheng Yang and Wenhao Li and Maosong Sun},
    title   = {MixPoet: Diverse Poetry Generation via Learning Controllable Mixed Latent Space},
    year    = "2020",
    booktitle = {Proceedings of The Thirty-Fourth AAAI Conference on Artificial Intelligence},
    address = {New York, USA}
}

8. System

This work is a part of the automatic Chinese poetry generation system, THUAIPoet (Jiuge, 九歌) developed by Research Center for Natural Language Processing, Computational Humanities and Social Sciences, Tsinghua University (清华大学人工智能研究院, 自然语言处理与社会人文计算研究中心). Please refer to THUNLP and THUNLP Lab for more information.

9. Contact

If you have any questions, suggestions or bug reports, please feel free to email yi-xy16@mails.tsinghua.edu.cn or mtmoonyi@gmail.com.