PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core framework of PaddlePaddle, providing high-performance single-machine and distributed training for deep learning and machine learning, plus cross-platform deployment)
http://www.paddlepaddle.org/
Apache License 2.0

Check failed: in_->ids #1847

Closed. LoganZhou closed this issue 7 years ago.

LoganZhou commented 7 years ago

I am trying to combine several features in my training data, but an error occurs while training.

Paddle release a new version 0.9.0, you can get the install package in http://www.paddlepaddle.org
I0421 15:35:08.487975  1591 Util.cpp:154] commandline: /usr/local/Paddle-GPU/bin/../opt/paddle/bin/paddle_trainer --config=my_trainer_config.py --save_dir=./output/combined_feature --trainer_count=1 --log_period=1000 --dot_period=10 --num_passes=10 --use_gpu=true --show_parameter_stats_period=3000 
[WARNING 2017-04-21 15:35:11,494 networks.py:1438] `outputs` routine try to calculate network's inputs and outputs order. It might not work well.Please see follow log carefully.
[INFO 2017-04-21 15:35:11,496 networks.py:1466] The input order is [id, adjacent_id, train_spd, label_5min, label_10min, label_15min, label_20min, label_25min, label_30min, label_35min, label_40min, label_45min, label_50min, label_55min, label_60min, label_65min, label_70min, label_75min, label_80min, label_85min, label_90min, label_95min, label_100min, label_105min, label_110min, label_115min, label_120min]
[INFO 2017-04-21 15:35:11,497 networks.py:1472] The output order is [cost_5min, cost_10min, cost_15min, cost_20min, cost_25min, cost_30min, cost_35min, cost_40min, cost_45min, cost_50min, cost_55min, cost_60min, cost_65min, cost_70min, cost_75min, cost_80min, cost_85min, cost_90min, cost_95min, cost_100min, cost_105min, cost_110min, cost_115min, cost_120min]
I0421 15:35:11.517391  1591 Trainer.cpp:175] trainer mode: Normal
I0421 15:35:11.593731  1591 PyDataProvider2.cpp:243] loading dataprovider my_dataprovider::process
I0421 15:35:11.603772  1591 PyDataProvider2.cpp:243] loading dataprovider my_dataprovider::process
I0421 15:35:11.605520  1591 GradientMachine.cpp:135] Initing parameters..
I0421 15:35:11.875561  1591 GradientMachine.cpp:142] Init parameters done.
F0421 15:36:29.555508  1591 TableProjection.cpp:39] Check failed: in_->ids 
*** Check failure stack trace: ***
    @           0x93ca56  google::LogMessage::Fail()
    @           0x93c9a2  google::LogMessage::SendToLog()
    @           0x93c326  google::LogMessage::Flush()
    @           0x93f3c5  google::LogMessageFatal::~LogMessageFatal()
    @           0x5ebc45  paddle::TableProjection::forward()
    @           0x62dd09  paddle::MixedLayer::forward()
    @           0x6c9e60  paddle::NeuralNetwork::forward()
    @           0x6bc453  paddle::GradientMachine::forwardBackward()
    @           0x75786d  paddle::TrainerInternal::forwardBackwardBatch()
    @           0x757dec  paddle::TrainerInternal::trainOneBatch()
    @           0x752cf0  paddle::Trainer::trainOneDataBatch()
    @           0x7554ef  paddle::Trainer::trainOnePass()
    @           0x756900  paddle::Trainer::train()
    @           0x5c6913  main
    @     0x2b4237a83d1d  __libc_start_main
    @           0x5dcb41  (unknown)
/usr/local/Paddle-GPU/bin/paddle: line 109:  1591 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}

This is my network configuration:

################################### Parameter Configuration #######################################
TERM_NUM = 24
FORECASTING_NUM = 24
NODE_NUM = 329
emb_size = 32
batch_size = 128 if not is_predict else 1
settings(
    batch_size=128,
    learning_rate=2e-3,
    learning_method=AdamOptimizer(),
    regularization=L2Regularization(8e-4),
    gradient_clipping_threshold=25)
################################### Algorithm Configuration ########################################

output_label = []

# node_id input(integer_value(NODE_NUM))
node_id = data_layer(name='id', size=NODE_NUM)
id_emb = embedding_layer(input=node_id, size=emb_size)
id_fc = fc_layer(input=id_emb, size=emb_size)

# adjacent_id input(sparse_binary_vector(NODE_NUM))
adjacent_id = data_layer(name='adjacent_id', size=NODE_NUM)
adjacent_id_emb = embedding_layer(input=adjacent_id, size=emb_size)
adjacent_id_fc = fc_layer(input=adjacent_id, size=emb_size)

# time_id input(integer_value(24))
time_id = data_layer(name='time_id', size=24)
time_id_emb = embedding_layer(input=adjacent_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)

# train_data input(integer_value_sequence(TERM_NUM))
train_spd = data_layer(name='train_spd', size=TERM_NUM)
train_spd_emb = embedding_layer(input=train_spd, size=24)
train_spd_fc = fc_layer(input=train_spd_emb, size=24)

# combine feature
node_combined_feature = fc_layer(
    input=[id_fc, adjacent_id_fc, time_id_fc, train_spd_fc],
    size=128,
    act=TanhActivation()
)

for i in xrange(FORECASTING_NUM):
    # lstm network

    lstm = simple_lstm(
        input=node_combined_feature, size=128, lstm_cell_attr=ExtraAttr(drop_rate=0.25))

    lstm_max = pooling_layer(input=lstm, pooling_type=MaxPooling())

    score = fc_layer(input=lstm_max, size=4, act=SoftmaxActivation())
    if is_predict:
        maxid = maxid_layer(score)
        output_label.append(maxid)
    else:
        # Multi-task training.
        label = data_layer(name='label_%dmin' % ((i + 1) * 5), size=4)
        cls = classification_cost(
            input=score, name="cost_%dmin" % ((i + 1) * 5), label=label)
        output_label.append(cls)
outputs(output_label)
reyoung commented 7 years ago

The order of the data fields yielded by the provider is not the same as the definition order of the data layers.

See log:

The input order is [id, adjacent_id, train_spd, label_5min, label_10min, label_15min, label_20min, label_25min, label_30min, label_35min, label_40min, label_45min, label_50min, label_55min, label_60min, label_65min, label_70min, label_75min, label_80min, label_85min, label_90min, label_95min, label_100min, label_105min, label_110min, label_115min, label_120min]

Currently, it is recommended to yield a dictionary instead; see this.
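For reference, a minimal sketch of the dictionary style (the slot names and types are taken from the config above and are illustrative only; input_types is assumed to accept a dict keyed by data_layer name):

from paddle.trainer.PyDataProvider2 import *

@provider(input_types={'id': integer_value(329),
                       'label_5min': integer_value(4)})
def process(settings, file_name):
    # Each yielded sample is a dict keyed by data_layer name,
    # so the field order in the config no longer matters.
    yield {'id': 0, 'label_5min': 1}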

LoganZhou commented 7 years ago

Thanks for the reply. In my dataprovider, I yield a dictionary like this:

settings.slots = {
    'id': integer_value(329),
    'adjacent_id': sparse_binary_vector(329),
    'time_id': integer_value(24),
    'train_spd': integer_value_sequence(TERM_NUM),
}
for i in range(FORECASTING_NUM):
    settings.slots.update({'label_%dmin' % ((i + 1) * 5):integer_value(LABEL_VALUE_NUM)})
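This slot declaration presumably lives inside the initHook referenced by the @provider decorator below; a minimal sketch of such a hook, with the hook signature assumed:

def initHook(settings, **kwargs):
    # Declare one slot per data_layer name; yielding dicts keyed by
    # these names lets Paddle match fields by name instead of by order.
    settings.slots = {
        'id': integer_value(329),
        'adjacent_id': sparse_binary_vector(329),
        'time_id': integer_value(24),
        'train_spd': integer_value_sequence(TERM_NUM),
    }
    for i in range(FORECASTING_NUM):
        settings.slots['label_%dmin' % ((i + 1) * 5)] = integer_value(LABEL_VALUE_NUM)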

The dataprovider is as follows:

@provider(
    init_hook=initHook, cache=CacheType.CACHE_PASS_IN_MEM, should_shuffle=True)
def process(settings, file_name):
    # adjacent dict
    node_dict = get_adjacent_node_dict()

    speeds_file = open("speeds.csv", "r")
    # output_file = open("new_train_data.csv","w")

    # adjacent vector dict
    adjacent_node_vec = get_node_vec()

    for line in islice(speeds_file, 1, None):
        current_node = int(line.strip('\r\n').split(",")[0])
        print current_node
        speeds = map(int, line.rstrip('\r\n').split(",")[1:])
        # Get the max index.
        end_time = len(speeds)
        # time counter
        time_count = 0
        time_id = 0
            # current node ID
        for i in range(TERM_NUM, end_time - FORECASTING_NUM):
            # train data
            pre_spd = map(int, speeds[i - TERM_NUM:i])

            # Integer value need predicting, values start from 0, so every one minus 1.
            fol_spd = [j - 1 for j in speeds[i:i + FORECASTING_NUM]]

            # Predicting label is missing, abandon the sample.
            if -1 in fol_spd:
                # counter ++
                time_count += 1
                # time id
                if (time_count > 12):
                    time_count = 0
                    time_id += 1
                if (time_id >= 24):
                    time_id = 0
                continue

            yield_dict = {
                    'id': current_node,
                    'adjacent_id': adjacent_node_vec.get(current_node),
                    'time_id': time_id,
                    'train_spd': pre_spd
            }
            for j in range(FORECASTING_NUM):
                yield_dict.update({'label_%dmin' % ((j + 1) * 5):fol_spd[j]})

            yield yield_dict

            time_count += 1
            # time id
            if (time_count > 12):
                time_count = 0
                time_id += 1
            if (time_id >= 24):
                time_id = 0
reyoung commented 7 years ago

The embedding layer only accepts integer value input, but adjacent_id is a sparse binary vector.
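To illustrate with the layers from the config above (a sketch; only an integer-valued layer can feed an embedding, whose table projection is what the in_->ids check enforces):

# works: the 'id' layer carries integer ids
node_id = data_layer(name='id', size=NODE_NUM)               # integer_value(NODE_NUM)
id_emb = embedding_layer(input=node_id, size=emb_size)

# trips "Check failed: in_->ids": a sparse binary vector has no
# integer ids for the table projection to look up
adjacent_id = data_layer(name='adjacent_id', size=NODE_NUM)  # sparse_binary_vector(NODE_NUM)
adjacent_id_emb = embedding_layer(input=adjacent_id, size=emb_size)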

LoganZhou commented 7 years ago

Thanks, but the error still occurred after I deleted that embedding layer. I changed the network like this:

adjacent_id = data_layer(name='adjacent_id', size=NODE_NUM)
adjacent_id_fc = fc_layer(input=adjacent_id, size=emb_size)

I am still puzzled by the log: time_id is missing from the input order.

reyoung commented 7 years ago

time_id = data_layer(name='time_id', size=24)
time_id_emb = embedding_layer(input=adjacent_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)

It seems that time_id is not used by any layer?

LoganZhou commented 7 years ago

node_combined_feature = fc_layer(
    input=[id_fc, adjacent_id_fc, time_id_fc, train_spd_fc],
    size=128,
    act=TanhActivation()
)

I combined the features here.

LoganZhou commented 7 years ago

time_id = data_layer(name='time_id', size=24)
time_id_emb = embedding_layer(input=adjacent_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)

I found the mistake: I forgot to change the input of the time_id embedding when copying code, so it still reads adjacent_id instead of time_id.
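For completeness, the corrected block would presumably read:

time_id = data_layer(name='time_id', size=24)
# the embedding must read the integer time_id layer, not adjacent_id
time_id_emb = embedding_layer(input=time_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)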