Closed LoganZhou closed 7 years ago
The order of the data layers is not the same as the order in which they are defined.
See log:
The input order is [id, adjacent_id, train_spd, label_5min, label_10min, label_15min, label_20min, label_25min, label_30min, label_35min, label_40min, label_45min, label_50min, label_55min, label_60min, label_65min, label_70min, label_75min, label_80min, label_85min, label_90min, label_95min, label_100min, label_105min, label_110min, label_115min, label_120min]
Currently, it is recommended to yield a dictionary; see this
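A minimal sketch (plain Python, with hypothetical field names) of why yielding a dictionary helps: each sample field is matched to its data layer by name, so the definition order no longer matters.

```python
# Sketch: a provider-style generator that yields dicts. The framework
# can then match 'id' and 'train_spd' to the data layers of the same
# name, independent of the order the layers were declared in.
def process(samples):
    for node_id, speeds in samples:
        yield {
            'id': node_id,        # matched to data_layer(name='id')
            'train_spd': speeds,  # matched to data_layer(name='train_spd')
        }

rows = list(process([(7, [1, 2, 3])]))
print(rows[0]['id'])    # -> 7
print(sorted(rows[0]))  # -> ['id', 'train_spd']
```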
Thanks for the reply. In my dataprovider, I yield the dictionary like this:
settings.slots = {
    'id': integer_value(329),
    'adjacent_id': sparse_binary_vector(329),
    'time_id': integer_value(24),
    'train_spd': integer_value_sequence(TERM_NUM),
}
for i in range(FORECASTING_NUM):
    settings.slots.update(
        {'label_%dmin' % ((i + 1) * 5): integer_value(LABEL_VALUE_NUM)})
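For reference, the loop above generates one label slot per 5-minute horizon. Assuming FORECASTING_NUM = 24 (inferred from the log, which lists label_5min through label_120min), the generated names are:

```python
# Sketch: reproduce just the slot-name generation from the loop above.
# FORECASTING_NUM = 24 is an assumption based on the logged input order.
FORECASTING_NUM = 24
names = ['label_%dmin' % ((i + 1) * 5) for i in range(FORECASTING_NUM)]
print(names[0], names[-1])  # -> label_5min label_120min
print(len(names))           # -> 24
```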
The dataprovider is as follows:
@provider(
    init_hook=initHook, cache=CacheType.CACHE_PASS_IN_MEM, should_shuffle=True)
def process(settings, file_name):
    # adjacent dict
    node_dict = get_adjacent_node_dict()
    speeds_file = open("speeds.csv", "r")
    # output_file = open("new_train_data.csv","w")
    # adjacent vector dict
    adjacent_node_vec = get_node_vec()
    for line in islice(speeds_file, 1, None):
        # current node ID
        current_node = int(line.strip('\r\n').split(",")[0])
        print current_node
        speeds = map(int, line.rstrip('\r\n').split(",")[1:])
        # Get the max index.
        end_time = len(speeds)
        # time counter
        time_count = 0
        time_id = 0
        for i in range(TERM_NUM, end_time - FORECASTING_NUM):
            # train data
            pre_spd = map(int, speeds[i - TERM_NUM:i])
            # Integer values to predict start from 0, so subtract 1 from each.
            fol_spd = [j - 1 for j in speeds[i:i + FORECASTING_NUM]]
            # A prediction label is missing: abandon the sample.
            if -1 in fol_spd:
                # counter ++
                time_count += 1
                # time id
                if time_count > 12:
                    time_count = 0
                    time_id += 1
                    if time_id >= 24:
                        time_id = 0
                continue
            yield_dict = {
                'id': current_node,
                'adjacent_id': adjacent_node_vec.get(current_node),
                'time_id': time_id,
                'train_spd': pre_spd
            }
            for j in range(FORECASTING_NUM):
                yield_dict.update({'label_%dmin' % ((j + 1) * 5): fol_spd[j]})
            yield yield_dict
            time_count += 1
            # time id
            if time_count > 12:
                time_count = 0
                time_id += 1
                if time_id >= 24:
                    time_id = 0
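The time bookkeeping in the provider can be sketched as a standalone helper (plain Python, my own restatement of the logic above): every 12 samples advance one time_id hour slot, wrapping from 23 back to 0.

```python
# Sketch of the time_count / time_id rollover used by the provider:
# time_count counts samples; after 12 of them the hour slot advances,
# and the hour slot wraps around after 24.
def advance(time_count, time_id):
    time_count += 1
    if time_count > 12:
        time_count = 0
        time_id += 1
        if time_id >= 24:
            time_id = 0
    return time_count, time_id

tc, tid = 0, 23
for _ in range(13):  # 13 increments trigger exactly one rollover
    tc, tid = advance(tc, tid)
print(tc, tid)       # -> 0 0  (hour 23 wrapped back to 0)
```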
The embedding layer only accepts integer-valued input, but adjacent_id is a sparse binary vector.
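A hedged sketch (NumPy, not the PaddlePaddle API) of why this matters: an embedding is a table lookup, so it needs an integer row index, while a sparse binary vector selects rows through a matrix product, which is what a fully connected projection does.

```python
# Sketch: integer-id embedding lookup vs. one-hot projection.
import numpy as np

table = np.arange(12.0).reshape(4, 3)  # 4 ids, 3-dim embeddings

emb = table[2]                          # integer id -> direct row lookup
onehot = np.array([0., 0., 1., 0.])     # sparse binary vector for id 2
fc = onehot @ table                     # same row, via a projection

print(np.allclose(emb, fc))             # -> True
```

This is why replacing the embedding_layer with an fc_layer, as below, is the usual fix for a sparse binary input.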
Thanks, but the error still occurred after I deleted that embedding layer. I changed the network to this:
adjacent_id = data_layer(name='adjacent_id', size=NODE_NUM)
adjacent_id_fc = fc_layer(input=adjacent_id, size=emb_size)
I am still puzzled by the log: the time_id input was missing.
time_id = data_layer(name='time_id', size=24)
time_id_emb = embedding_layer(input=adjacent_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)
It seems time_id is not used by any layer?
node_combined_feature = fc_layer(
    input=[id_fc, adjacent_id_fc, time_id_fc, train_spd_fc],
    size=128,
    act=TanhActivation())
I combined the features here.
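A rough NumPy stand-in (shapes are my assumptions, not from the config) for what combining the four projected features into one 128-dim vector amounts to: concatenate, project, and apply the activation.

```python
# Sketch: combining several projected features. Feature sizes (8 each)
# and the zero weight matrix are placeholders; real weights are learned.
import numpy as np

id_fc = np.ones(8)
adjacent_id_fc = np.ones(8)
time_id_fc = np.ones(8)
train_spd_fc = np.ones(8)

x = np.concatenate([id_fc, adjacent_id_fc, time_id_fc, train_spd_fc])
W = np.zeros((x.size, 128))            # placeholder for learned weights
node_combined_feature = np.tanh(x @ W)

print(node_combined_feature.shape)     # -> (128,)
```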
time_id = data_layer(name='time_id', size=24)
time_id_emb = embedding_layer(input=adjacent_id, size=24)
time_id_fc = fc_layer(input=time_id_emb, size=24)
I found the mistake: I forgot to change the input of time_id when copying the code.
I am trying to combine several features in the training data, but an error occurred during training.
This is my network configuration: