HongyuGong / TextStyleTransfer

Reinforcement Learning Based Text Style Transfer without Parallel Training Corpus

Setup Questions #2

Open tigsss opened 4 years ago

tigsss commented 4 years ago

Hello Hongyu,

Thanks for your great work! We just got the GYAFC dataset and we were wondering which pretrained embeddings you used. In particular, we are a little stumped on this step:

Put train/dev/test corpus in original/target style as corpus.(train/dev/test).(orig/tsf). Put pretrained embedding in text format to the path of embed_fn.

In our data/gyafc_family folder, we should have files named corpus.train.orig, corpus.train.tsf, etc. (the formal/informal files from GYAFC, just renamed), right?
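To make the question concrete, here is a minimal sketch of the renaming step as we understand it. The source layout (`<src_dir>/<split>/<style>`), the function name `stage_corpus`, and the style-to-suffix mapping are all assumptions made for illustration; they are not taken from the repository or from the GYAFC documentation.

```python
import shutil
from pathlib import Path


def stage_corpus(src_dir, dst_dir, style_to_suffix, splits=("train", "dev", "test")):
    """Copy <src_dir>/<split>/<style> to <dst_dir>/corpus.<split>.<suffix>.

    The source layout is hypothetical; adjust the `src` path below to match
    however the downloaded corpus is actually organized.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for split in splits:
        for style, suffix in style_to_suffix.items():
            src = src_dir / split / style              # assumed input location
            dst = dst_dir / f"corpus.{split}.{suffix}"  # naming from the README step
            shutil.copyfile(src, dst)                   # copy, keep the originals
            written.append(dst.name)
    return sorted(written)
```

A call such as `stage_corpus("raw/family", "data/gyafc_family", {"informal": "orig", "formal": "tsf"})` would then produce the six corpus.* files; which style maps to orig and which to tsf is an assumption here.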

What pretrained embeddings did you use for this? Were they from within the GYAFC dataset or did you have to train the embeddings somehow first?

Thanks again!

HongyuGong commented 4 years ago

Hi,

Thanks for your interest in our work.

  1. We rename the informal/formal files from the GYAFC corpus to corpus.(train/dev/test).orig and corpus.(train/dev/test).tsf, respectively.

  2. As for the pretrained embeddings, we use word2vec embeddings trained on English Wikicorpus. As you may have noticed in tuneEmbed() in src/corpus_helper.py, the embeddings are further tuned on the GYAFC training data.
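As a rough illustration of what "text format" usually means for word2vec embeddings: each line holds a token followed by its vector components, separated by spaces, and the file may start with a header line giving the vocabulary size and dimension. The sketch below reads such a file; the function name `load_text_embeddings` is hypothetical, and the exact layout the repository's loader expects at embed_fn is an assumption, so please check it against src/corpus_helper.py.

```python
def load_text_embeddings(path):
    """Read 'token v1 v2 ... vd' lines into a dict of float lists.

    Assumes space-separated values and skips an optional first line of the
    form '<vocab_size> <dim>'. A genuine 1-dimensional embedding on the first
    line would be mistaken for that header; this sketch ignores that case.
    """
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            parts = line.rstrip().split(" ")
            if i == 0 and len(parts) == 2:
                continue  # word2vec-style header line, not an embedding
            if len(parts) < 2:
                continue  # blank or malformed line
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors
```

Loading the file once with a helper like this is a quick way to confirm the vocabulary size and dimension before pointing embed_fn at it.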