niranjanaryan closed this issue 5 years ago
Sorry, the data cannot be made public at the moment. You may refer to the original paper and preprocess it yourself.
What is `train_mem`?
@Stallon-niranjan It is not relevant to this project. You can run the model without it.
From what I understand, 's' in si and sl means source, and 't' in ti and tl means target. Does source mean the list of keywords, and target mean the original text paragraphs? Also, could you please specify what each dimension of si, sl, ti, and tl means?
Any help would be much appreciated.
@KevinHuuu Sorry for writing such confusing code; I'm going to improve its readability in the near future. Your understanding is right. Below is a brief summary:
| variable | meaning | shape |
|---|---|---|
| si | source index, referring to the input topic words | [batch_size, topic_num]; the original paper uses topic_num=5 |
| sl | source length, the number of topic words | [batch_size, 1]; used to support a variable number of topic words |
| ti | target index, the word indices of the corresponding essays | [batch_size, max_len] |
| tl | target length, the length of each target essay | [batch_size, 1] |
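To make the table concrete, here is a minimal sketch of how arrays with these shapes could be built from already-tokenized topic words and essays. It is not the repository's preprocessing: the padding index 0, the unknown-word index 1, the simple vocabulary builder, and the helper names `build_vocab`, `encode`, and `make_arrays` are all hypothetical assumptions, and it does not add any start/end tokens the model might expect.

```python
import numpy as np

PAD_ID = 0  # hypothetical padding index (assumption, not from the repo)
UNK_ID = 1  # hypothetical index for out-of-vocabulary words (assumption)


def build_vocab(token_lists):
    """Map every distinct token to an integer index, reserving 0 and 1."""
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab)
    return vocab


def encode(tokens, vocab, length):
    """Convert tokens to indices, truncating or right-padding to `length`."""
    ids = [vocab.get(tok, UNK_ID) for tok in tokens[:length]]
    return ids + [PAD_ID] * (length - len(ids))


def make_arrays(topics, essays, vocab, topic_num=5, max_len=100):
    """Build si, sl, ti, tl with the shapes listed in the table above."""
    si = np.array([encode(t, vocab, topic_num) for t in topics], dtype=np.int32)
    sl = np.array([[min(len(t), topic_num)] for t in topics], dtype=np.int32)
    ti = np.array([encode(e, vocab, max_len) for e in essays], dtype=np.int32)
    tl = np.array([[min(len(e), max_len)] for e in essays], dtype=np.int32)
    return si, sl, ti, tl


if __name__ == "__main__":
    topics = [["spring", "rain"], ["school", "friend", "exam"]]
    essays = [["the", "spring", "rain", "falls"],
              ["my", "friend", "passed", "the", "exam"]]
    vocab = build_vocab(topics + essays)
    si, sl, ti, tl = make_arrays(topics, essays, vocab, topic_num=5, max_len=8)
    print(si.shape, sl.shape, ti.shape, tl.shape)
```

Here sl and tl keep a trailing dimension of 1 to match the [batch_size, 1] shapes in the table. For Chinese text, the essays would first need word segmentation; which tokenizer the original paper used is not stated here.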
I think you should write code that shows us how to process the data (the Zhihu data and the other datasets the source code uses). I don't understand how to convert the text files into .npy files. Thank you.
Can you share the method for preprocessing the data and converting it to .npy, and explain how to load the dataset and the vector file? In `si, sl, ti, tl, train_mem = load_npy(Config().train_data_path)`, what are si, sl, ti, and tl?
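As a rough illustration of the .npy step only, the sketch below saves four integer arrays to separate .npy files and loads them back with NumPy's `np.save` and `np.load`. The one-file-per-array layout, the `{prefix}_{name}.npy` naming, and the helpers `save_arrays` and `load_arrays` are hypothetical; the repository's own `load_npy` may expect a different file layout, so check its code before relying on this. `train_mem` is left out because the maintainer said the model can run without it.

```python
import os
import tempfile

import numpy as np

NAMES = ("si", "sl", "ti", "tl")  # hypothetical file suffixes (assumption)


def save_arrays(prefix, si, sl, ti, tl):
    """Write each array to its own .npy file, e.g. <prefix>_si.npy."""
    for name, arr in zip(NAMES, (si, sl, ti, tl)):
        np.save(f"{prefix}_{name}.npy", arr)


def load_arrays(prefix):
    """Read the four arrays back in the order si, sl, ti, tl."""
    return tuple(np.load(f"{prefix}_{name}.npy") for name in NAMES)


if __name__ == "__main__":
    # Tiny dummy arrays with the documented shapes (batch_size=2).
    si = np.zeros((2, 5), dtype=np.int32)
    sl = np.array([[2], [3]], dtype=np.int32)
    ti = np.zeros((2, 8), dtype=np.int32)
    tl = np.array([[4], [5]], dtype=np.int32)
    with tempfile.TemporaryDirectory() as tmp:
        prefix = os.path.join(tmp, "train")
        save_arrays(prefix, si, sl, ti, tl)
        loaded = load_arrays(prefix)
        print([a.shape for a in loaded])
```

Plain integer arrays like these can be loaded without `allow_pickle=True`, which keeps the files safe to share.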
niruniru21@gmail.com