NTMC-Community / MatchZoo

Facilitating the design, comparison and sharing of deep text matching models.

loss: inf problem #737

Closed wang7liang closed 5 years ago

wang7liang commented 5 years ago

Hi, I've found that with the following task configuration:

```python
ranking_task = mz.tasks.Ranking(loss=mz.losses.RankCrossEntropyLoss(num_neg=2))
preprocessor = mz.preprocessors.BasicPreprocessor(fixed_length_left=100,
                                                  fixed_length_right=100)
```

once `fixed_length_left` and `fixed_length_right` in `BasicPreprocessor` exceed roughly 50, the training loss no longer converges. Which parameters can I tune to avoid this?

```
1/94 [..............................] - ETA: 4:12 - loss: inf
2/94 [..............................] - ETA: 2:18 - loss: nan
3/94 [..............................] - ETA: 1:40 - loss: nan
4/94 [>.............................] - ETA: 1:21 - loss: nan
5/94 [>.............................] - ETA: 1:09 - loss: nan
6/94 [>.............................] - ETA: 1:02 - loss: nan
7/94 [=>............................] - ETA: 56s - loss: nan
8/94 [=>............................] - ETA: 52s - loss: nan
9/94 [=>............................] - ETA: 48s - loss: nan
```
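For context on why the very first batch can already show `inf`: `RankCrossEntropyLoss` is a softmax cross entropy over one positive and `num_neg` negative scores, so it returns `inf` as soon as the softmax probability of the positive document underflows to zero. With longer fixed lengths, ConvKNRM pools kernel signals over more positions, so the raw scores can grow large enough for exactly that to happen. A minimal NumPy sketch of the standard formulation (the function name is mine, not MatchZoo's):

```python
import numpy as np

def rank_cross_entropy(pos_score, neg_scores):
    """Softmax cross entropy over one positive and num_neg negative scores."""
    logits = np.concatenate([[pos_score], neg_scores])
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # -log P(positive)

print(rank_cross_entropy(1.0, np.array([0.5, 0.2])))     # ~0.72, healthy
print(rank_cross_entropy(-800.0, np.array([0.0, 0.0])))  # inf: softmax underflows to 0
```

Once one batch produces `inf`, the gradient step poisons the weights and every later batch reports `nan`, which matches the log above.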

The full code is as follows:

```python
model = mz.models.ConvKNRM()
model.params['task'] = ranking_task
model.params['input_shapes'] = preprocessor.context['input_shapes']
model.params['embedding_input_dim'] = preprocessor.context['vocab_size'] + 1
model.params['embedding_output_dim'] = 300
model.params['embedding_trainable'] = True
model.params['filters'] = 128
model.params['conv_activation_func'] = 'relu'
model.params['max_ngram'] = 3
model.params['use_crossmatch'] = True
model.params['kernel_num'] = 18
model.params['sigma'] = 0.07
model.params['exact_sigma'] = 0.001
model.params['optimizer'] = 'adadelta'
model.guess_and_fill_missing_params()
model.build()
model.compile()
```
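A commonly suggested mitigation for exploding scores is gradient clipping. The sketch below assumes MatchZoo forwards `params['optimizer']` to `keras.Model.compile` unchanged, which accepts optimizer instances; every Keras optimizer takes a `clipnorm` keyword:

```python
from keras.optimizers import Adadelta

# Swap the 'adadelta' string for an instance with gradient-norm clipping,
# so a single inf/nan batch cannot blow up the weights.
model.params['optimizer'] = Adadelta(clipnorm=1.0)
model.build()
model.compile()
```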

```python
train_generator = mz.DataGenerator(train_dp_processed, mode='pair',
                                   num_neg=num_neg, num_dup=num_dup,
                                   batch_size=32, shuffle=False)
history = model.fit_generator(train_generator, epochs=2,
                              callbacks=[evaluate], workers=30,
                              use_multiprocessing=True)
```
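Not a fix, but a faster failure signal: Keras ships a `TerminateOnNaN` callback that aborts training on the first nan/inf loss instead of running the remaining batches, which makes experiments with different settings much cheaper:

```python
from keras.callbacks import TerminateOnNaN

# Abort the run as soon as the loss becomes nan or inf.
history = model.fit_generator(train_generator, epochs=2,
                              callbacks=[evaluate, TerminateOnNaN()],
                              workers=30, use_multiprocessing=True)
```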

uduse commented 5 years ago

Try increasing the number of epochs.

wang7liang commented 5 years ago

> Try increasing the number of epochs.

The loss is already nan at the first step of the first epoch, so increasing the number of epochs won't help:

```
1/94 [..............................] - ETA: 4:12 - loss: inf
2/94 [..............................] - ETA: 2:18 - loss: nan
3/94 [..............................] - ETA: 1:40 - loss: nan
4/94 [>.............................] - ETA: 1:21 - loss: nan
```
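A quick way to confirm that the scores blow up before the loss is even applied, assuming `DataGenerator` is a Keras `Sequence` (indexable) and the model exposes a Keras-style `predict`:

```python
import numpy as np

x, y = train_generator[0]   # one batch of pair-wise samples
scores = model.predict(x)   # raw matching scores, pre-loss
# If these are already huge or non-finite, the softmax inside
# RankCrossEntropyLoss saturates on the very first step.
print(np.abs(scores).max(), np.isfinite(scores).all())
```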

bwanglzu commented 5 years ago

See the Coursera course *Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization*.
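In that spirit, a hypothetical coarse learning-rate sweep (reusing the names from the snippets above; with a trainable, randomly initialized 300-d embedding, a smaller step plus clipping often keeps the first epoch finite):

```python
from keras.optimizers import Adam

for lr in [1e-2, 1e-3, 1e-4]:
    model.params['optimizer'] = Adam(lr=lr, clipnorm=1.0)
    model.build()    # rebuild so compile picks up the new optimizer
    model.compile()
    model.fit_generator(train_generator, epochs=1, callbacks=[evaluate])
```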