Data Loading and Preprocessing, Is it through .npy

TobiasLee / MTA-LSTM-TensorFlow

TensorFlow reimplementation of Topic-to-Essay Generation with Neural Networks.

https://tobiaslee.top/2018/11/02/customized-RNN-cell/


Data Loading and Preprocessing, Is it through .npy #1

Closed (niranjanaryan closed this issue 5 years ago)

niranjanaryan commented 5 years ago

Can you share the method for preprocessing the data and converting it to .npy, and explain how to load the dataset and the vector file? In si, sl, ti, tl, train_mem = load_npy(Config().train_data_path), what are si, sl, ti and tl?

niruniru21@gmail.com

TobiasLee commented 5 years ago

Sorry, the data cannot be released at the moment. You may refer to the original paper and preprocess it yourself.

niranjanaryan commented 5 years ago

And train_mem?

TobiasLee commented 5 years ago

@Stallon-niranjan It is not relevant to this project. You can run the model without it.

KevinHuuu commented 5 years ago

From what I understand, 's' in si and sl means source, and 't' in ti and tl means target. Does source mean the list of keywords, and target the original text paragraphs? Also, could you please specify what each dimension of si, sl, ti and tl means?

Any help would be much appreciated.

TobiasLee commented 5 years ago

@KevinHuuu Sorry for writing such confusing code; I'm going to improve its readability in the near future. Your understanding is right. Below is a brief summary:

variable	meaning	shape
si	source index: word indexes of the input topic words	[batch_size, topic_num]; the original paper uses topic_num=5
sl	source length: number of topic words	[batch_size, 1]; allows a variable number of topic words
ti	target index: word indexes of the corresponding essays	[batch_size, max_len]
tl	target length: length of the target essays	[batch_size, 1]
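To make these shapes concrete, here is a minimal sketch of how the four arrays could be built and stored as .npy files. It is not the repository's actual preprocessing, which is not shown in this thread. The helper names (encode, build_arrays), the reserved indexes PAD_ID and UNK_ID, the toy samples, max_len=8, and the one-file-per-array layout are all assumptions made for illustration; the file layout that load_npy expects may differ.

```python
import os
import tempfile

import numpy as np

PAD_ID = 0  # assumption: index 0 is reserved for padding
UNK_ID = 1  # assumption: index 1 is used for out-of-vocabulary words


def encode(words, vocab, width):
    """Map words to indexes, truncate to `width`, and right-pad with PAD_ID.

    Returns the padded index list and the number of real (unpadded) items.
    """
    ids = [vocab.get(w, UNK_ID) for w in words[:width]]
    return ids + [PAD_ID] * (width - len(ids)), len(ids)


def build_arrays(samples, vocab, topic_num=5, max_len=8):
    """Build si, sl, ti, tl with the shapes listed in the table above.

    `samples` is a list of (topic_words, essay_words) pairs.
    """
    si, sl, ti, tl = [], [], [], []
    for topics, essay in samples:
        topic_ids, topic_len = encode(topics, vocab, topic_num)
        essay_ids, essay_len = encode(essay, vocab, max_len)
        si.append(topic_ids)    # row of width topic_num
        sl.append([topic_len])  # row of width 1
        ti.append(essay_ids)    # row of width max_len
        tl.append([essay_len])  # row of width 1
    return tuple(np.asarray(a, dtype=np.int32) for a in (si, sl, ti, tl))


# Toy data standing in for the unreleased corpus (hypothetical).
samples = [
    (["spring", "rain"], ["the", "spring", "rain", "falls"]),
    (["home"], ["i", "miss", "home"]),
]
words = sorted({w for topics, essay in samples for w in topics + essay})
vocab = {w: i + 2 for i, w in enumerate(words)}  # 0 and 1 are reserved

si, sl, ti, tl = build_arrays(samples, vocab)
print(si.shape, sl.shape, ti.shape, tl.shape)

# Round trip through .npy files, one file per array (an assumed layout).
out_dir = tempfile.mkdtemp()
for name, arr in zip(("si", "sl", "ti", "tl"), (si, sl, ti, tl)):
    np.save(os.path.join(out_dir, name + ".npy"), arr)
si_loaded = np.load(os.path.join(out_dir, "si.npy"))
```

With the two toy samples, batch_size is 2, so si has shape (2, 5) and ti has shape (2, 8); sl and tl record how many entries in each row are real words rather than padding.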

Haojin-Hu commented 4 years ago

I think you should add code that shows us how to process the data, both the zhihu data and the other data that the source code uses. I don't understand how to convert the text files to .npy files. Thank you.