Question about vocab of dataprocessor.py

Dear Butsugiri,

Thank you for sharing your code. I have a question about the data_processor.vocab variable. After running the following lines:

data_processor = DataProcessor(args.data, args.vocab, args.test, args.max_length)
data_processor.prepare_dataset()
data_processor.compute_max_length()
train_data = data_processor.train_data
dev_data = data_processor.dev_data
test_data = data_processor.test_data

the data_processor.vocab variable has only 2 entries, and this is what gets passed to the model creation:

cnn = ABCNN(n_vocab=len(vocab), embed_dim=embed_dim, input_channel=input_channel,
           output_channel=50, x1s_len=x1s_len, x2s_len=x2s_len, model_type=model_type, single_attention_mat=args.single_attention_mat)  # ABCNN apparently has output fixed at 50, though.
model = Classifier(cnn, lossfun=sigmoid_cross_entropy,
                     accfun=binary_accuracy)
if args.glove:
    cnn.load_glove_embeddings(args.glove_path, data_processor.vocab)
if args.word2vec:
    cnn.load_word2vec_embeddings(args.word2vec_path, data_processor.vocab)
if args.gpu >= 0:
    cuda.get_device(args.gpu).use()
    model.to_gpu()
cnn.set_pad_embedding_to_zero(data_processor.vocab)

Sorry, I haven't finished reading the whole code yet, but at this point I wonder: is it intended that this variable holds only 2 entries, or should it contain every word in the dataset?
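
To make clear what I expected, here is a minimal sketch of a vocabulary built from the whole dataset. It is not taken from your repository: build_vocab, the "<pad>" and "<unk>" tokens, and the toy sentences are all my own assumptions for illustration. My guess is that the 2 entries I see might be special tokens like these, but I have not confirmed that.

```python
from collections import Counter

# Hypothetical special tokens; the repository may use different ones.
PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences, min_count=1):
    """Map every token seen at least min_count times to an integer id."""
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = {PAD: 0, UNK: 1}  # special tokens take the first two ids
    for tok, n in sorted(counts.items()):
        if n >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


# Toy tokenized dataset, invented for this example.
sentences = [["how", "are", "you"], ["how", "old", "are", "you"]]
vocab = build_vocab(sentences)
print(len(vocab))
```

With a vocabulary like this, len(vocab) would grow with the dataset rather than stay at 2, which is what I assumed n_vocab was supposed to reflect.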

Cheers, Kurt

butsugiri / chainer-abcnn
