NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0
3.83k stars 899 forks source link

Bug during training DRRM v2.1? #703

Closed datistiquo closed 5 years ago

datistiquo commented 5 years ago

It seems that the below usage is not allowed anymore in datapack.py? I just followed the DRRM turoial: https://github.com/NTMC-Community/MatchZoo/blob/master/tutorials/wikiqa/drmm.ipynb



________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
text_left (InputLayer)          (None, 10)           0
__________________________________________________________________________________________________
match_histogram (InputLayer)    (None, 10, 30)       0
__________________________________________________________________________________________________
embedding (Embedding)           (None, 10, 300)      1083000     text_left[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 10, 10)       310         match_histogram[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 10, 1)        300         embedding[0][0]
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 10, 10)       110         dense_2[0][0]
__________________________________________________________________________________________________
attention_mask (Lambda)         (None, 10, 1)        0           dense_1[0][0]
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 10, 10)       110         dense_3[0][0]
__________________________________________________________________________________________________
attention_probs (Lambda)        (None, 10, 1)        0           attention_mask[0][0]
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 10, 1)        11          dense_4[0][0]
__________________________________________________________________________________________________
dot_1 (Dot)                     (None, 1, 1)         0           attention_probs[0][0]
                                                                 dense_5[0][0]
__________________________________________________________________________________________________
flatten_1 (Flatten)             (None, 1)            0           dot_1[0][0]
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 1)            2           flatten_1[0][0]
==================================================================================================
Total params: 1,083,843
Trainable params: 1,083,843
Non-trainable params: 0
__________________________________________________________________________________________________
2019-04-11 14:26:45.996754: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Epoch 1/30
C:\Users\\IB\MatchZoo-master\matchzoo\data_pack\data_pack.py:167: FutureWarning:
Passing list-likes to .loc or [] with any missing label will raise
KeyError in the future, you can use .reindex() as an alternative.

See the documentation here:
https://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
  relation = self._relation.loc[index].reset_index(drop=True)
  2/201 [..............................] - ETA: 6:42 - loss: 2.3240 Traceback (most recent call last):
  File "gen.py", line 86, in <module>
    history = model.fit_generator(train_generator, epochs=30, workers=4)
  File "C:\Users\\IB\MatchZoo-master\matchzoo\engine\base_model.py", line 276, in fit_generator
    verbose=verbose, **kwargs
  File "C:\Users\\Anaconda3\lib\site-packages\keras\legacy\interfaces.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\\Anaconda3\lib\site-packages\keras\engine\training.py", line 1418, in fit_generator
    initial_epoch=initial_epoch)
  File "C:\Users\\Anaconda3\lib\site-packages\keras\engine\training_generator.py", line 181, in fit_generator
    generator_output = next(output_generator)
  File "C:\Users\\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 601, in get
    six.reraise(*sys.exc_info())
  File "C:\Users\\Anaconda3\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Users\\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 595, in get
    inputs = self.queue.get(block=True).get()
  File "C:\Users\\Anaconda3\lib\multiprocessing\pool.py", line 644, in get
    raise self._value
  File "C:\Users\\Anaconda3\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\\Anaconda3\lib\site-packages\keras\utils\data_utils.py", line 401, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "C:\Users\\IB\MatchZoo-master\matchzoo\data_generator\data_generator.py", line 132, in __getitem__
    batch_data_pack = self._data_pack[indices]
  File "C:\Users\\IB\MatchZoo-master\matchzoo\data_pack\data_pack.py", line 168, in __getitem__
    left = self._left.loc[relation['id_left'].unique()]
  File "C:\Users\\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1478, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\\Anaconda3\lib\site-packages\pandas\core\indexing.py", line 1867, in _getitem_axis
    elif com.is_bool_indexer(key):
  File "C:\Users\\Anaconda3\lib\site-packages\pandas\core\common.py", line 107, in is_bool_indexer
    raise ValueError('cannot index with vector containing '
ValueError: cannot index with vector containing NA / NaN valu
```es
faneshion commented 5 years ago

Thanks for the feedback, I will check this at once.

datistiquo commented 5 years ago

Thanks.

In : train_generator = mz.DataGenerator(train_pack_processed, mode='pair', num_dup=1, num_neg=2, batch_size=20, callbacks=[hist_callback])

Must be num_neg equal to the number of negative examples per instance in my training data? In my data I have for each sample one positive and 2 negative examples, but some are duplicates so I might have per each instance 2 pos and 4 neagtive examples? How is this handled?

datistiquo commented 5 years ago

@faneshion I solved it. Need to learn the usage. num_neg parameters is important to match exactly your training data! Now it seems working...

faneshion commented 5 years ago

The number of neg_num is connected to the loss. You should set them as equal. More detailed information, please check the documentation of the function.

pluto-junzeng commented 5 years ago

@faneshion I solved it. Need to learn the usage. num_neg parameters is important to match exactly your training data! Now it seems working...

hei I have the same problem ,can you tell me .how to um_neg parameters is important to match exactly your training data

bwanglzu commented 5 years ago

@zengjunjun num_neg refers to train one positive example together with x negative examples.

In readme:

image

pluto-junzeng commented 5 years ago

Thanks