PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the 『飞桨』/PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning & machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

How to fix SIGFPE errors #3960

Closed · youan1 closed this issue 6 years ago

youan1 commented 7 years ago

As the title says: training aborts with the error below. The past fix was to shrink batch_size, but we have requirements on training speed and cannot make batch_size too small. Is there any other way?

We also tried switching the activation to BRelu, but the same problem still occurs.

Thu Sep 7 18:55:22 2017[1,36]: Aborted at 1504781722 (unix time) try "date -d @1504781722" if you are using GNU date
Thu Sep 7 18:55:22 2017[1,36]:PC: @ 0x0 (unknown)
Thu Sep 7 18:55:22 2017[1,36]: SIGFPE (@0x7f77fd251a41) received by PID 51092 (TID 0x7f78034a5700) from PID 18446744073661651521; stack trace:
Thu Sep 7 18:55:22 2017[1,36]: @ 0x7f780307c160 (unknown)
Thu Sep 7 18:55:22 2017[1,36]: @ 0x7f77fd251a41 mkl_blas_avx_sgemm_kernel_0
Thu Sep 7 18:55:24 2017[1,36]:./train.sh: line 239: 51092 Floating point exception python27-gcc482/bin/python conf/trainer_config.conf
Thu Sep 7 18:55:24 2017[1,36]:+ '[' 136 -ne 0 ']'
Thu Sep 7 18:55:24 2017[1,36]:+ kill_pserver2_exit
Thu Sep 7 18:55:24 2017[1,36]:+ ps aux
Thu Sep 7 18:55:24 2017[1,36]:+ grep paddle_pserver2
Thu Sep 7 18:55:24 2017[1,36]:+ grep paddle_cluster_job
Thu Sep 7 18:55:24 2017[1,36]:+ grep -v grep
Thu Sep 7 18:55:24 2017[1,36]:+ cut -c10-14
Thu Sep 7 18:55:24 2017[1,36]:+ xargs kill -9
Thu Sep 7 18:55:24 2017[1,36]:+ log_fatal 'paddle_trainer failed kill paddle_pserver2 and exit'
Thu Sep 7 18:55:24 2017[1,36]:+ echo '[./common.sh : 399] [kill_pserver2_exit]'
Thu Sep 7 18:55:24 2017[1,36]:[./common.sh : 399] [kill_pserver2_exit]
Thu Sep 7 18:55:24 2017[1,36]:+ echo '[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit'
Thu Sep 7 18:55:24 2017[1,36]:[FATAL]: paddle_trainer failed kill paddle_pserver2 and exit
Thu Sep 7 18:55:24 2017[1,36]:+ get_stack
Thu Sep 7 18:55:24 2017[1,36]:+ set +x

typhoonzero commented 7 years ago

This looks like a v2 job. You can try printing the gradients and parameters in the event_handler and watching how they change in the log; that gives some evidence for tuning the hyperparameters and the model.
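
A minimal sketch of such instrumentation, assuming the v2 Parameters accessors parameters.names(), parameters.get() and parameters.get_grad() (availability of get_grad depends on the Paddle version); make_monitor and log_every are illustrative names:

import numpy as np
import paddle.v2 as paddle

def make_monitor(parameters, log_every=100):
    # Log weight/gradient magnitudes so that divergence shows up in the
    # log well before it turns into a SIGFPE.
    def event_handler(event):
        if not isinstance(event, paddle.event.EndIteration):
            return
        if event.batch_id % log_every != 0:
            return
        for name in parameters.names():
            w = parameters.get(name)       # current parameter values
            g = parameters.get_grad(name)  # gradients from the last batch
            print "batch %d %s |w|max=%.3e |g|max=%.3e" % (
                event.batch_id, name, np.abs(w).max(), np.abs(g).max())
    return event_handler

Pass the result to trainer.train(..., event_handler=make_monitor(parameters)); a parameter whose |g|max jumps by orders of magnitude right before the crash is the one to clip or re-initialize.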

youan1 commented 7 years ago

It is a v2 job. So if the gradients get too large, the current fix is still to tune hyperparameters, right? Which hyperparameters can be adjusted without hurting training speed or accuracy? Shrinking batch_size will slow things down, won't it?

youan1 commented 7 years ago

It is this problem again. Could the Paddle team update the code so that gradients exceeding a threshold get clipped?

typhoonzero commented 7 years ago

See: https://github.com/PaddlePaddle/Paddle/issues/3944 and https://github.com/PaddlePaddle/Paddle/issues/2262
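
For context, the second issue is about gradient clipping. A hedged sketch of how it can be switched on from the optimizer side in the v2 API, assuming the gradient_clipping_threshold keyword that v2 optimizers forward to the underlying settings() (the value is only illustrative):

import paddle.v2 as paddle

# Momentum optimizer with a global clipping threshold: gradients larger
# than the threshold are truncated before the parameter update.
optimizer = paddle.optimizer.Momentum(
    momentum=0,
    learning_rate=2e-2,
    gradient_clipping_threshold=25.0,  # illustrative value; tune per model
    regularization=paddle.optimizer.L2Regularization(rate=8e-4))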

youan1 commented 7 years ago

I followed the fixes above and added error_clipping_threshold, but training is still unstable: with the same network structure it sometimes succeeds and sometimes fails, and the failures are still floating-point overflow. The only thing that varies is the number of machine nodes. Is there any other way to guarantee the model trains to completion?

kuke commented 7 years ago

When adding error_clipping_threshold, the placement and the threshold value both matter a lot; try adjusting them and see.
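
Concretely, "placement" means which layers carry the clipping attribute. ExtraLayerAttribute(error_clipping_threshold=...) clips the error (gradient) signal flowing backward out of that one layer, so a sketch is to attach it only to the layer that overflows (the threshold value is illustrative):

from paddle.trainer_config_helpers import ExtraLayerAttribute

# Clip only the backward error signal of the layer that overflows,
# instead of clipping every layer; magnitudes above the threshold
# are truncated during backprop.
clip = ExtraLayerAttribute(error_clipping_threshold=10.0)  # illustrative

# ... then attach it to the offending layer, e.g. the LSTM:
# lstm_0 = paddle.layer.lstmemory(input=hidden_0, ..., layer_attr=clip)

Starting from the layer named in the stack trace and widening the set of clipped layers gradually keeps the impact on AUC small.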

youan1 commented 7 years ago

I added it to every layer, and the threshold is already fairly small (30), which has already cost us AUC. And it is still not stable: sometimes training gets through, sometimes it does not.

kuke commented 7 years ago

The instability may come from the hyperparameter settings; for example, try reducing the learning rate.
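
For example (a minimal sketch; the Momentum settings mirror the config posted later in this thread, and the factor of 10 is only a starting point):

import paddle.v2 as paddle

# Same Momentum setup with the learning rate cut by 10x (illustrative);
# smaller steps make the parameters less likely to blow up.
optimizer = paddle.optimizer.Momentum(
    momentum=0,
    learning_rate=2e-3,
    regularization=paddle.optimizer.L2Regularization(rate=8e-4))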

youan1 commented 7 years ago

Reducing the learning rate lowers AUC. Is there anything else?

zhuantouer commented 6 years ago

python train.py
I1107 18:23:17.444944 11700 Util.cpp:166] commandline:  --use_gpu=False --trainer_count=12
W1107 18:23:17.444993 11700 CpuId.h:112] PaddlePaddle wasn't compiled to use avx instructions, but these are available on your machine and could speed up CPU computations via CMAKE .. -DWITH_AVX=ON
I1107 18:23:17.540668 11700 GradientMachine.cpp:85] Initing parameters..
I1107 18:23:17.684885 11700 GradientMachine.cpp:92] Init parameters done.
Pass 0, Batch 0, Cost 19.568821, {'__sum_evaluator_0__': 0.5740799903869629}

Test with Pass 0, Batch 0, {'__sum_evaluator_0__': 0.5616281628608704}
Pass 0, Batch 2, Cost 60.015236, {'__sum_evaluator_0__': 0.1767834722995758}
Thread [140145163896576] Forwarding __lstmemory_1__,
*** Aborted at 1510050215 (unix time) try "date -d @1510050215" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGFPE (@0x7f7641980eae) received by PID 11700 (TID 0x7f7616b60700) from PID 1100484270; stack trace: ***
    @     0x7f767a079160 (unknown)
    @     0x7f7641980eae paddle::LstmCompute::backwardOneSequence<>()
    @     0x7f76419811fd paddle::LstmCompute::backwardBatch<>()
    @     0x7f764197dd06 paddle::LstmLayer::backwardBatch()
    @     0x7f764197e39e paddle::LstmLayer::backward()
    @     0x7f7641a088d1 paddle::NeuralNetwork::backward()
    @     0x7f7641a12fd2 paddle::TrainerThread::backward()
    @     0x7f7641a1316d paddle::TrainerThread::computeThread()
    @     0x7f766a59f8a0 execute_native_thread_routine
    @     0x7f767a0711c3 start_thread
    @     0x7f767969912d __clone
    @                0x0 (unknown)
Floating point exception (core dumped)

Why does paddle keep hitting floating-point exceptions...

lcy-seso commented 6 years ago

> Why does paddle keep hitting floating-point exceptions

Overflow during computation is itself fairly normal, especially for sequence-level computation such as an LSTM. From the log above, the cost appears to be increasing?? Without knowing the LSTM layer's exact configuration (activations, initialization, etc.), the usual first step is simply to tune the hyperparameters and see whether things improve.
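
One concrete knob of that kind, as a hedged sketch (not a fix confirmed in this thread): an unbounded Relu output activation on lstmemory makes overflow easier, while the bounded Tanh keeps cell outputs in [-1, 1]. The toy input layers below are hypothetical, only there to make the snippet self-contained:

import paddle.v2 as paddle

paddle.init(use_gpu=False, trainer_count=1)

# Hypothetical toy input pipeline.
word = paddle.layer.data(
    name='word_data', type=paddle.data_type.integer_value_sequence(1000))
emb = paddle.layer.embedding(size=32, input=word)
hidden = paddle.layer.fc(size=512, input=emb)  # lstmemory expects 4x its size

# A more conservative lstmemory: bounded Tanh activations keep the cell
# output in [-1, 1], making overflow in the backward pass less likely.
lstm = paddle.layer.lstmemory(
    input=hidden,
    act=paddle.activation.Tanh(),
    gate_act=paddle.activation.Sigmoid(),
    state_act=paddle.activation.Tanh(),
    bias_attr=paddle.attr.Param(initial_std=0.),
    param_attr=paddle.attr.Param(initial_std=0.01))  # small init; illustrative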

zhuantouer commented 6 years ago

@lcy-seso I adjusted the batch size; after about 8k batches training crashed again. Below is the full code, adapted from the official Semantic Role Labeling demo. The changes are:

  1. switched to char embeddings;
  2. dropped the other features, keeping only word as input and label as output;
  3. changed depth to 2;
  4. my own reader, which yields char_id and tag_id;
  5. added drop_rate;
  6. changed the batch size to 32.

Could someone from the Paddle team take a look?

# -*- coding: utf-8 -*-
import math, os
import numpy as np
from paddle.trainer_config_helpers import *
import paddle.v2 as paddle
import paddle.v2.evaluator as evaluator
from data_utils import load_vocab, get_char_ids, get_tag_ids

# dict
word_dict = load_vocab('./dict/char.dict', 'gb18030', True, True)
label_dict = load_vocab('./dict/tag.dict', 'gb18030', False, False)
black_dict = load_vocab('./dict/black.dict', 'gb18030', False, False)

def my_data_reader(file_path):
    def reader():
        with open(file_path, 'r') as fdata:
            char_ids, tag_ids = [], []
            for line in fdata:
                line = line.decode('gb18030', 'ignore').strip()
                # ....
                yield char_ids, tag_ids

    return reader

word_dict_len = len(word_dict)
label_dict_len = len(label_dict)

word_dim = 32
hidden_dim = 512
default_std = 1 / math.sqrt(hidden_dim) / 3.0
mix_hidden_lr = 1e-3

def d_type(size):
    return paddle.data_type.integer_value_sequence(size)

def db_lstm():
    # single input feature: the char sequence (the SRL demo's other 7 features were removed)
    word = paddle.layer.data(name='word_data', type=d_type(word_dict_len))

    emb_para = paddle.attr.Param(name='emb', initial_std=0)
    std_0 = paddle.attr.Param(initial_std=0.)
    std_default = paddle.attr.Param(initial_std=default_std)

    word_input = [word]
    emb_layers = [
        paddle.layer.embedding(size=word_dim, input=x, param_attr=emb_para)
        for x in word_input
    ]
    hidden_0 = paddle.layer.mixed(
        size=hidden_dim,
        bias_attr=std_default,
        input=[
            paddle.layer.full_matrix_projection(
                input=emb, param_attr=std_default) for emb in emb_layers
        ])

    lstm_para_attr = paddle.attr.Param(initial_std=0.0, learning_rate=1.0)
    layer_attr = ExtraLayerAttribute(drop_rate=0.5)
    hidden_para_attr = paddle.attr.Param(
        initial_std=default_std, learning_rate=mix_hidden_lr)

    lstm_0 = paddle.layer.lstmemory(
        input=hidden_0,
        act=paddle.activation.Relu(),
        gate_act=paddle.activation.Sigmoid(),
        state_act=paddle.activation.Sigmoid(),
        bias_attr=std_0,
        param_attr=lstm_para_attr,
        layer_attr=layer_attr)

    #stack L-LSTM and R-LSTM with direct edges
    input_tmp = [hidden_0, lstm_0]
    depth = 2
    for i in range(1, depth):
        mix_hidden = paddle.layer.mixed(
            size=hidden_dim,
            bias_attr=std_default,
            input=[
                paddle.layer.full_matrix_projection(
                    input=input_tmp[0], param_attr=hidden_para_attr),
                paddle.layer.full_matrix_projection(
                    input=input_tmp[1], param_attr=lstm_para_attr)
            ])

        lstm = paddle.layer.lstmemory(
            input=mix_hidden,
            act=paddle.activation.Relu(),
            gate_act=paddle.activation.Sigmoid(),
            state_act=paddle.activation.Sigmoid(),
            reverse=((i % 2) == 1),
            bias_attr=std_0,
            param_attr=lstm_para_attr,
            layer_attr=layer_attr)

        input_tmp = [mix_hidden, lstm]

    feature_out = paddle.layer.mixed(
        size=label_dict_len,
        bias_attr=std_default,
        input=[
            paddle.layer.full_matrix_projection(
                input=input_tmp[0], param_attr=hidden_para_attr),
            paddle.layer.full_matrix_projection(
                input=input_tmp[1], param_attr=lstm_para_attr)
        ], )

    return feature_out

def main():
    paddle.init(use_gpu=False, trainer_count=48)

    # define network topology
    feature_out = db_lstm()
    target = paddle.layer.data(name='target', type=d_type(label_dict_len))
    crf_cost = paddle.layer.crf(
        size=label_dict_len,
        input=feature_out,
        label=target,
        param_attr=paddle.attr.Param(
            name='crfw', initial_std=default_std, learning_rate=mix_hidden_lr))

    crf_dec = paddle.layer.crf_decoding(
        size=label_dict_len,
        input=feature_out,
        label=target,
        param_attr=paddle.attr.Param(name='crfw'))
    evaluator.sum(input=crf_dec)

    #inference_topology = paddle.topology.Topology(layers=crf_dec)
    #with open("inference_topology.pkl", 'wb') as f:
    #    inference_topology.serialize_for_inference(f)

    # create parameters
    parameters = paddle.parameters.create(crf_cost)

    # create optimizer
    optimizer = paddle.optimizer.Momentum(
        momentum=0,
        learning_rate=2e-2,
        regularization=paddle.optimizer.L2Regularization(rate=8e-4),
        model_average=paddle.optimizer.ModelAverage(
            average_window=0.5, max_average_window=10000), )

    trainer = paddle.trainer.SGD(
        cost=crf_cost,
        parameters=parameters,
        update_equation=optimizer,
        extra_layers=crf_dec)

    reader = paddle.batch(
        paddle.reader.shuffle(my_data_reader('./data/data.train'), buf_size=8192), batch_size=32)

    test_reader = paddle.batch(
        paddle.reader.shuffle(my_data_reader('./data/data.dev'), buf_size=8192), batch_size=32)

    feeding = {
        'word_data': 0,
        'target': 1
    }

    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 2 == 0:
                print "Pass %d, Batch %d, Cost %f, %s" % (
                    event.pass_id, event.batch_id, event.cost, event.metrics)
            if event.batch_id % 1000 == 0:
                result = trainer.test(reader=test_reader, feeding=feeding)
                print "\nTest with Pass %d, Batch %d, %s" % (
                    event.pass_id, event.batch_id, result.metrics)

        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'w') as f:
                trainer.save_parameter_to_tar(f)

            result = trainer.test(reader=test_reader, feeding=feeding)
            print "\nTest with Pass %d, %s" % (event.pass_id, result.metrics)

    trainer.train(
        reader=reader,
        event_handler=event_handler,
        num_passes=80,
        feeding=feeding)

if __name__ == '__main__':
    main()