NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.
Apache License 2.0

Classification task raises bug when sorting #758

Closed (bwanglzu closed this issue 4 years ago)

bwanglzu commented 5 years ago


I'll upload a notebook later.

bwanglzu commented 5 years ago
import matchzoo as mz
import keras
train_pack = mz.datasets.quora_qp.load_data('train', task='classification')
predict_pack = mz.datasets.quora_qp.load_data('test', task='classification')
Downloading data from https://firebasestorage.googleapis.com/v0/b/mtl-sentence-representations.appspot.com/o/data%2FQQP.zip?alt=media&token=700c6acf-160d-4d89-81d1-de4191d02cb5
60538880/60534884 [==============================] - 102s 2us/step

b'Skipping line 83032: expected 6 fields, saw 7\n'
b'Skipping line 154657: expected 6 fields, saw 7\n'
b'Skipping line 323916: expected 6 fields, saw 7\n'
/home/bo/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py:3265: DtypeWarning: Columns (1) have mixed types. Specify dtype option on import or set low_memory=False.
  exec(code_obj, self.user_global_ns, self.user_ns)
preprocessor = mz.preprocessors.BasicPreprocessor()
train_processed = preprocessor.fit_transform(train_pack)
predict_processed = preprocessor.transform(predict_pack)
Processing text_left with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 293839/293839 [00:39<00:00, 7374.94it/s]
Processing text_right with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 273121/273121 [00:37<00:00, 7196.97it/s]
Processing text_right with append: 100%|██████████| 273121/273121 [00:00<00:00, 1137625.53it/s]
Building FrequencyFilter from a datapack.: 100%|██████████| 273121/273121 [00:01<00:00, 202091.90it/s]
Processing text_right with transform: 100%|██████████| 273121/273121 [00:01<00:00, 227279.27it/s]
Processing text_left with extend: 100%|██████████| 293839/293839 [00:00<00:00, 1122696.41it/s]
Processing text_right with extend: 100%|██████████| 273121/273121 [00:00<00:00, 1138594.55it/s]
Building Vocabulary from a datapack.: 100%|██████████| 6284781/6284781 [00:01<00:00, 3936116.45it/s]
Processing text_left with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 293839/293839 [00:40<00:00, 7183.37it/s]
Processing text_right with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 273121/273121 [00:37<00:00, 7326.03it/s]
Processing text_right with transform: 100%|██████████| 273121/273121 [00:01<00:00, 190440.27it/s]
Processing text_left with transform: 100%|██████████| 293839/293839 [00:01<00:00, 178447.08it/s]
Processing text_right with transform: 100%|██████████| 273121/273121 [00:01<00:00, 161849.92it/s]
Processing length_left with len: 100%|██████████| 293839/293839 [00:00<00:00, 1111204.36it/s]
Processing length_right with len: 100%|██████████| 273121/273121 [00:00<00:00, 1092705.96it/s]
Processing text_left with transform: 100%|██████████| 293839/293839 [00:02<00:00, 123868.05it/s]
Processing text_right with transform: 100%|██████████| 273121/273121 [00:02<00:00, 120905.06it/s]
Processing text_left with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 290826/290826 [00:41<00:00, 6942.42it/s]
Processing text_right with chain_transform of Tokenize => Lowercase => PuncRemoval: 100%|██████████| 307165/307165 [00:46<00:00, 6642.60it/s]
Processing text_right with transform: 100%|██████████| 307165/307165 [00:01<00:00, 165919.12it/s]
Processing text_left with transform: 100%|██████████| 290826/290826 [00:02<00:00, 130320.17it/s]
Processing text_right with transform: 100%|██████████| 307165/307165 [00:01<00:00, 259078.01it/s]
Processing length_left with len: 100%|██████████| 290826/290826 [00:00<00:00, 1173532.07it/s]
Processing length_right with len: 100%|██████████| 307165/307165 [00:00<00:00, 1211420.60it/s]
Processing text_left with transform: 100%|██████████| 290826/290826 [00:02<00:00, 114361.27it/s]
Processing text_right with transform: 100%|██████████| 307165/307165 [00:02<00:00, 103237.54it/s]
classification_task = mz.tasks.Classification(num_classes=2)
model = mz.models.DUET()
model.params.update(preprocessor.context)
model.params['task'] = classification_task
model.params['embedding_output_dim'] = 300
model.params['lm_filters'] = 32
model.params['lm_hidden_sizes'] = [32]
model.params['dm_filters'] = 32
model.params['dm_kernel_size'] = 3
model.params['dm_d_mpool'] = 4
model.params['dm_hidden_sizes'] = [32]
model.params['dropout_rate'] = 0.5
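# Note: this Adamax instance is never passed to the model; the next line selects 'adagrad' instead.
optimizer = keras.optimizers.Adamax(lr=0.0001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)
model.params['optimizer'] = 'adagrad'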
model.guess_and_fill_missing_params()
model.build()
model.compile()
model.backend.summary()
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
text_left (InputLayer)          (None, 30)           0                                            
__________________________________________________________________________________________________
text_right (InputLayer)         (None, 30)           0                                            
__________________________________________________________________________________________________
embedding (Embedding)           (None, 30, 300)      19189500    text_left[0][0]                  
                                                                 text_right[0][0]                 
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 30, 32)       28832       embedding[0][0]                  
__________________________________________________________________________________________________
dropout_3 (Dropout)             (None, 30, 32)       0           conv1d_2[0][0]                   
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 30, 32)       28832       embedding[1][0]                  
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 1, 32)        0           dropout_3[0][0]                  
__________________________________________________________________________________________________
dropout_4 (Dropout)             (None, 30, 32)       0           conv1d_3[0][0]                   
__________________________________________________________________________________________________
reshape_2 (Reshape)             (None, 32)           0           max_pooling1d_1[0][0]            
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)  (None, 7, 32)        0           dropout_4[0][0]                  
__________________________________________________________________________________________________
lambda_1 (Lambda)               (None, 30, 30)       0           text_left[0][0]                  
                                                                 text_right[0][0]                 
__________________________________________________________________________________________________
dense_3 (Dense)                 (None, 32)           1056        reshape_2[0][0]                  
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 7, 32)        1056        max_pooling1d_2[0][0]            
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 30, 32)       28832       lambda_1[0][0]                   
__________________________________________________________________________________________________
lambda_2 (Lambda)               (None, 1, 32)        0           dense_3[0][0]                    
__________________________________________________________________________________________________
dropout_5 (Dropout)             (None, 7, 32)        0           conv1d_4[0][0]                   
__________________________________________________________________________________________________
dropout_1 (Dropout)             (None, 30, 32)       0           conv1d_1[0][0]                   
__________________________________________________________________________________________________
lambda_3 (Lambda)               (None, 7, 32)        0           lambda_2[0][0]                   
                                                                 dropout_5[0][0]                  
__________________________________________________________________________________________________
reshape_1 (Reshape)             (None, 960)          0           dropout_1[0][0]                  
__________________________________________________________________________________________________
reshape_3 (Reshape)             (None, 224)          0           lambda_3[0][0]                   
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 32)           30752       reshape_1[0][0]                  
__________________________________________________________________________________________________
dense_4 (Dense)                 (None, 32)           7200        reshape_3[0][0]                  
__________________________________________________________________________________________________
dropout_2 (Dropout)             (None, 32)           0           dense_1[0][0]                    
__________________________________________________________________________________________________
dropout_6 (Dropout)             (None, 32)           0           dense_4[0][0]                    
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 1)            33          dropout_2[0][0]                  
__________________________________________________________________________________________________
dense_5 (Dense)                 (None, 1)            33          dropout_6[0][0]                  
__________________________________________________________________________________________________
add_1 (Add)                     (None, 1)            0           dense_2[0][0]                    
                                                                 dense_5[0][0]                    
__________________________________________________________________________________________________
dense_6 (Dense)                 (None, 2)            4           add_1[0][0]                      
==================================================================================================
Total params: 19,316,130
Trainable params: 19,316,130
Non-trainable params: 0
__________________________________________________________________________________________________
print("loading embedding ...")
glove_embedding = mz.datasets.embeddings.load_glove_embedding(dimension=300)
print("embedding loaded as `glove_embedding`")

embedding_matrix = glove_embedding.build_matrix(preprocessor.context['vocab_unit'].state['term_index'])
model.load_embedding_matrix(embedding_matrix)
loading embedding ...
embedding loaded as `glove_embedding`
pred_x, pred_y = predict_processed[:].unpack()
evaluate = mz.callbacks.EvaluateAllMetrics(model, x=pred_x, y=pred_y, batch_size=32)
train_generator = mz.DataGenerator(
    train_processed,
    mode='pair',
    num_dup=1,
    num_neg=1,
    batch_size=32
)
print('num batches:', len(train_generator))
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-31-2bc509c0414c> in <module>
      4     num_dup=1,
      5     num_neg=1,
----> 6     batch_size=32
      7 )
      8 print('num batches:', len(train_generator))

/usr/local/lib/python3.6/dist-packages/matchzoo/data_generator/data_generator.py in __init__(self, data_pack, mode, num_dup, num_neg, resample, batch_size, shuffle, callbacks)
    113                 data_pack.relation,
    114                 num_dup=num_dup,
--> 115                 num_neg=num_neg
    116             )
    117 

/usr/local/lib/python3.6/dist-packages/matchzoo/data_generator/data_generator.py in _reorganize_pair_wise(cls, relation, num_dup, num_neg)
    279         pairs = []
    280         groups = relation.sort_values(
--> 281             'label', ascending=False).groupby('id_left')
    282         for idx, group in groups:
    283             labels = group.label.unique()

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   4723 
   4724             indexer = nargsort(k, kind=kind, ascending=ascending,
-> 4725                                na_position=na_position)
   4726 
   4727         new_data = self._data.take(indexer,

~/.local/lib/python3.6/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position)
    271         non_nans = non_nans[::-1]
    272         non_nan_idx = non_nan_idx[::-1]
--> 273     indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    274     if not ascending:
    275         indexer = indexer[::-1]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
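
A likely cause, stated as an assumption rather than a confirmed diagnosis: if the classification loader one-hot encodes labels, each cell of the `label` column holds a NumPy array, and ordering such values needs `a < b` to collapse to a single boolean, which NumPy refuses for arrays with more than one element. The sketch below reproduces the same kind of `ValueError` with plain `sorted` and shows one possible workaround, collapsing each label to a scalar class index before sorting and grouping. The `relation` frame and its ids are hypothetical stand-ins, not MatchZoo's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a DataPack relation table whose labels
# were one-hot encoded (an assumption about the classification loader).
relation = pd.DataFrame({
    'id_left': ['Q1', 'Q1', 'Q2'],
    'id_right': ['D1', 'D2', 'D3'],
    'label': [np.array([1, 0]), np.array([0, 1]), np.array([1, 0])],
})

# Ordering arrays needs bool(a < b), which NumPy rejects for arrays
# with more than one element.
try:
    sorted(relation['label'].tolist())
    sort_error = None
except ValueError as err:
    sort_error = str(err)
print(sort_error)

# Possible workaround: collapse each one-hot vector to its class index
# so the column holds plain integers, which sort without ambiguity.
scalar_relation = relation.assign(
    label=relation['label'].apply(lambda v: int(np.argmax(v)))
)

# Same sort-then-group pattern as in the traceback, now on scalar labels.
groups = scalar_relation.sort_values('label', ascending=False).groupby('id_left')
grouped = {id_left: group['label'].tolist() for id_left, group in groups}
print(grouped)
```

Pair-wise reorganisation ranks candidates by label within each `id_left`, which presupposes scalar, orderable labels; with one-hot labels there is no natural order to sort by, so a classification task may be better served by a generator mode that does not sort at all.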
history = model.fit_generator(train_generator, epochs=30, callbacks=[evaluate], workers=30, use_multiprocessing=True)
Epoch 1/30
   15/11350 [..............................] - ETA: 1:05:34 - loss: 0.6261

[The 30 worker processes, ForkPoolWorker-31 through ForkPoolWorker-60, printed interleaved tracebacks after the interrupt. Deinterleaved, the workers' tracebacks read as follows, with N standing for the worker number:]

Process ForkPoolWorker-N:
Traceback (most recent call last):
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 108, in worker
    task = get()
  File "/usr/lib/python3.6/multiprocessing/queues.py", line 334, in get
    with self._rlock:
  File "/usr/lib/python3.6/multiprocessing/synchronize.py", line 95, in __enter__
    return self._semlock.__enter__()
KeyboardInterrupt

[One worker was instead interrupted while receiving from the task queue:]

  File "/usr/lib/python3.6/multiprocessing/queues.py", line 335, in get
    res = self._reader.recv_bytes()
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 216, in recv_bytes
    buf = self._recv_bytes(maxlength)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 407, in _recv_bytes
    buf = self._recv(4)
  File "/usr/lib/python3.6/multiprocessing/connection.py", line 379, in _recv
    chunk = read(handle, remaining)
KeyboardInterrupt

---------------------------------------------------------------------------

KeyboardInterrupt                         Traceback (most recent call last)

<ipython-input-32-fb49b023fbc1> in <module>
----> 1 history = model.fit_generator(train_generator, epochs=30, callbacks=[evaluate], workers=30, use_multiprocessing=True)

/usr/local/lib/python3.6/dist-packages/matchzoo/engine/base_model.py in fit_generator(self, generator, epochs, verbose, **kwargs)
    274             generator=generator,
    275             epochs=epochs,
--> 276             verbose=verbose, **kwargs
    277         )
    278 

~/.local/lib/python3.6/site-packages/keras/legacy/interfaces.py in wrapper(*args, **kwargs)
     89                 warnings.warn('Update your `' + object_name + '` call to the ' +
     90                               'Keras 2 API: ' + signature, stacklevel=2)
---> 91             return func(*args, **kwargs)
     92         wrapper._original_function = func
     93         return wrapper

~/.local/lib/python3.6/site-packages/keras/engine/training.py in fit_generator(self, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
   1416             use_multiprocessing=use_multiprocessing,
   1417             shuffle=shuffle,
-> 1418             initial_epoch=initial_epoch)
   1419 
   1420     @interfaces.legacy_generator_methods_support

~/.local/lib/python3.6/site-packages/keras/engine/training_generator.py in fit_generator(model, generator, steps_per_epoch, epochs, verbose, callbacks, validation_data, validation_steps, class_weight, max_queue_size, workers, use_multiprocessing, shuffle, initial_epoch)
    215                 outs = model.train_on_batch(x, y,
    216                                             sample_weight=sample_weight,
--> 217                                             class_weight=class_weight)
    218 
    219                 outs = to_list(outs)

~/.local/lib/python3.6/site-packages/keras/engine/training.py in train_on_batch(self, x, y, sample_weight, class_weight)
   1215             ins = x + y + sample_weights
   1216         self._make_train_function()
-> 1217         outputs = self.train_function(ins)
   1218         return unpack_singleton(outputs)
   1219 

~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in __call__(self, inputs)
   2713                 return self._legacy_call(inputs)
   2714 
-> 2715             return self._call(inputs)
   2716         else:
   2717             if py_any(is_tensor(x) for x in inputs):

~/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py in _call(self, inputs)
   2673             fetched = self._callable_fn(*array_vals, run_metadata=self.run_metadata)
   2674         else:
-> 2675             fetched = self._callable_fn(*array_vals)
   2676         return fetched[:len(self.outputs)]
   2677 

/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py in __call__(self, *args, **kwargs)
   1437           ret = tf_session.TF_SessionRunCallable(
   1438               self._session._session, self._handle, args, status,
-> 1439               run_metadata_ptr)
   1440         if run_metadata:
   1441           proto_data = tf_session.TF_GetBuffer(run_metadata_ptr)

KeyboardInterrupt: 

  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/multiprocessing/pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "/home/bo/.local/lib/python3.6/site-packages/keras/utils/data_utils.py", line 401, in get_index
    return _SHARED_SEQUENCES[uid][i]
  File "/usr/local/lib/python3.6/dist-packages/matchzoo/data_generator/data_generator.py", line 132, in __getitem__
    batch_data_pack = self._data_pack[indices]
  File "/usr/local/lib/python3.6/dist-packages/matchzoo/data_pack/data_pack.py", line 168, in __getitem__
    left = self._left.loc[relation['id_left'].unique()]
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1500, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1902, in _getitem_axis
    return self._getitem_iterable(key, axis=axis)
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1205, in _getitem_iterable
    raise_missing=False)
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexing.py", line 1155, in _get_listlike_indexer
    keyarr = ax.reindex(keyarr)[0]
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 3120, in reindex
    target = ensure_index(target)
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 5378, in ensure_index
    return Index(index_like)
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 292, in __new__
    elif (is_datetime64_any_dtype(data) or
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/dtypes/common.py", line 1114, in is_datetime64_any_dtype
    return (is_datetime64_dtype(arr_or_dtype) or
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/dtypes/common.py", line 431, in is_datetime64_dtype
    return _is_dtype_type(arr_or_dtype, classes(np.datetime64))
  File "/home/bo/.local/lib/python3.6/site-packages/pandas/core/dtypes/common.py", line 119, in classes
    return lambda tipo: issubclass(tipo, klasses)
KeyboardInterrupt
relation = train_processed.relation
relation.head()
   id_left  id_right   label
0   213221  213222.0  [1, 0]
1   536040  536041.0  [0, 1]
2   364011  490273.0  [1, 0]
3   155721    7256.0  [0, 1]
4   279958  279959.0  [1, 0]
relation.sort_values('label', ascending=False)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-30-e379a48ad6e6> in <module>
----> 1 relation.sort_values('label', ascending=False)

~/.local/lib/python3.6/site-packages/pandas/core/frame.py in sort_values(self, by, axis, ascending, inplace, kind, na_position)
   4723 
   4724             indexer = nargsort(k, kind=kind, ascending=ascending,
-> 4725                                na_position=na_position)
   4726 
   4727         new_data = self._data.take(indexer,

~/.local/lib/python3.6/site-packages/pandas/core/sorting.py in nargsort(items, kind, ascending, na_position)
    271         non_nans = non_nans[::-1]
    272         non_nan_idx = non_nan_idx[::-1]
--> 273     indexer = non_nan_idx[non_nans.argsort(kind=kind)]
    274     if not ascending:
    275         indexer = indexer[::-1]

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
uduse commented 5 years ago

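This is more of a pandas usage question than a MatchZoo bug. pandas's sort_values can't sort a column whose values are numpy arrays: comparing two arrays gives an element-wise array rather than a single True/False, which is what the ValueError is complaining about.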

To fix that, create a helper column:

df['label_as_tuple'] = df['label'].apply(tuple)
df.sort_values('label_as_tuple')
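
For anyone landing here later, a minimal sketch of that workaround on a toy frame. The ids and labels below are made up and only stand in for train_processed.relation; the column name label_as_tuple is just the one used above.

```python
import numpy as np
import pandas as pd

# Toy stand-in for `train_processed.relation` (hypothetical ids and labels).
relation = pd.DataFrame({
    'id_left': ['L-1', 'L-2', 'L-3', 'L-4'],
    'id_right': ['R-1', 'R-2', 'R-3', 'R-4'],
    'label': [np.array([1, 0]), np.array([0, 1]),
              np.array([1, 0]), np.array([0, 1])],
})

# Why the raw column cannot be sorted: `<` on two arrays is element-wise,
# so the comparison yields an array of booleans instead of one bool.
a, b = relation['label'].iloc[0], relation['label'].iloc[1]
elementwise = a < b
print(type(elementwise), elementwise.shape)

# Tuples compare lexicographically and give a single bool, so a helper
# column of tuples can be passed to sort_values.
relation['label_as_tuple'] = relation['label'].apply(tuple)
ordered = relation.sort_values('label_as_tuple', ascending=False)

print(ordered[['id_left', 'id_right', 'label_as_tuple']])
```

With ascending=False the rows labelled [1, 0] come first, since (1, 0) compares greater than (0, 1). The helper column can be dropped afterwards with ordered.drop(columns='label_as_tuple') if it should not stay in the frame.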
uduse commented 4 years ago

I hope things are working well for you now. I’ll go ahead and close this issue, but I’m happy to continue further discussion whenever needed.

aszhanghuali commented 4 years ago

Hi @bwanglzu @uduse, I have run into the same problem and hope you can help. Thanks!

Traceback (most recent call last):
  File "/home/zhl/MatchZoo-master/tutorials/untitled0.py", line 52, in <module>
    train_generator = mz.PairDataGenerator(train_pack_processed, num_dup=1, num_neg=1, batch_size=20)
  File "/home/zhl/MatchZoo-master/matchzoo/contrib/legacy_data_generator.py", line 141, in __init__
    shuffle=shuffle,
  File "/home/zhl/MatchZoo-master/matchzoo/data_generator/data_generator.py", line 113, in __init__
    num_neg=num_neg
  File "/home/zhl/MatchZoo-master/matchzoo/data_generator/data_generator.py", line 279, in _reorganize_pair_wise
    'label', ascending=False).groupby('id_left')
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/pandas/core/frame.py", line 5014, in sort_values
    k, kind=kind, ascending=ascending, na_position=na_position
  File "/root/anaconda3/envs/tensorflow/lib/python3.6/site-packages/pandas/core/sorting.py", line 260, in nargsort
    indexer = non_nan_idx[non_nans.argsort(kind=kind)]
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

uduse commented 4 years ago

@aszhanghuali Have you tried creating a helper column as I suggested? Please provide more information so I can investigate.