chop-dbhi / twitter-adr-blstm

A model for finding mentions of adverse drug reactions in Twitter posts
GNU General Public License v3.0

Input data when predicting labels for scores #3

Closed mvallet91 closed 6 years ago

mvallet91 commented 6 years ago

Hello! I'm running the code in an attempt to reproduce and learn from your experiment, following the process without changing anything. I'm using a Colaboratory environment with Python 2.7 and GPU acceleration, which is able to install the correct libraries for the setup.

However, when calculating the score (after training and selecting the best model, which works perfectly), I get the following error:

```
ValueErrorTraceback (most recent call last)
/content/drive/twitter-adr-blstm-master/adr_label_2.py in <module>()
    469 scores, history, best_model = run_model_fixedembed(dataset, opts.numhidden, opts.hiddendim, idx2word,
    470                                                     idx2label, w2v, opts.basedir, validate=validate,
--> 471                                                     num_epochs=opts.nbepochs)
    472
    473 ## Retrieve scores

/content/drive/twitter-adr-blstm-master/adr_label_2.py in run_model_fixedembed(dataset, numhidden, hiddendim, idx2word, idx2label, w2v, basedir, validate, num_epochs)
    346
    347     scores = predict_score(model, test_lex, test_toks, test_y, os.path.join(basedir, 'predictions'), idx2label,
--> 348                            maxlen, fileprefix=fileprefix)
    349
    350     scores['val_f1'] = val_f1

/content/drive/twitter-adr-blstm-master/adr_label_2.py in predict_score(model, x, toks, y, pred_dir, i2l, padlen, metafile, fileprefix)
    117
    118     ## GRAPH (BIDIRECTIONAL)
--> 119     pred_probs = model.predict({'input': x}, verbose=0)['output']
    120     test_loss = model.evaluate({'input': x, 'output': y}, batch_size=1, verbose=0)
    121     pred = np.argmax(pred_probs, axis=2)

/usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in predict(self, x, batch_size, verbose, steps)
   1145                              'argument.')
   1146         # Validate user data.
-> 1147         x, _, _ = self._standardize_user_data(x)
   1148         if self.stateful:
   1149             if x[0].shape[0] > batch_size and x[0].shape[0] % batch_size != 0:

/usr/local/lib/python2.7/dist-packages/keras/engine/training.pyc in _standardize_user_data(self, x, y, sample_weight, class_weight, check_array_lengths, batch_size)
    747                                     feed_input_shapes,
    748                                     check_batch_axis=False,  # Don't enforce the batch size.
--> 749                                     exception_prefix='input')
    750
    751         if y is not None:

/usr/local/lib/python2.7/dist-packages/keras/engine/training_utils.pyc in standardize_input_data(data, names, shapes, check_batch_axis, exception_prefix)
     75                 raise ValueError('No data provided for "' + e.args[0] +
     76                                  '". Need data '
---> 77                                  'for each key in: ' + str(names))
     78         elif isinstance(data, list):
     79             if isinstance(data[0], list):

ValueError: No data provided for "bidirectional_1_input". Need data for each key in: [u'bidirectional_1_input']
```

Something that caught my attention is that in the approximateMatch module, the input to that same model.predict call was updated to a new format 2 days ago, while the corresponding call in adr_label stayed the same.
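If I'm reading that change correctly, the new format drops the old Graph-style dicts and passes the arrays directly, roughly like this (my own sketch of the difference, not the exact code):

```python
# Old Graph-API style, as still used in adr_label_2.py
pred_probs = model.predict({'input': x}, verbose=0)['output']

# Newer Keras Model API, as in the updated approximateMatch (roughly)
pred_probs = model.predict(x, verbose=0)
```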

Do you have any pointers on what could be going (or what I am doing) wrong in this step?

acocos commented 6 years ago

Thanks for highlighting this issue. The problem was exactly as you suspected -- the calls to predict and evaluate in predict_score weren't updated for the new Keras version. The new release reflects this fix.
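For anyone still on an older checkout, the change to predict_score amounts to roughly the following (a sketch of the idea, not the exact diff):

```python
# Before: Graph-API style dict inputs/outputs, which newer Keras rejects
# pred_probs = model.predict({'input': x}, verbose=0)['output']
# test_loss = model.evaluate({'input': x, 'output': y}, batch_size=1, verbose=0)

# After: pass the arrays positionally, as the current Keras Model API expects
pred_probs = model.predict(x, verbose=0)
test_loss = model.evaluate(x, y, batch_size=1, verbose=0)
pred = np.argmax(pred_probs, axis=2)
```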