This issue summarizes an idea from @helinwang and @emailweixu that changes the concepts listed in https://github.com/PaddlePaddle/Paddle/pull/1297 into the following:
For how to describe networks and how to use them for convenient training, testing, and inference/serving, please see the following comments.
We use the 3-branch ranking model in this example. For your convenience, I copy and paste the model's topology as follows:
A -> f -\
Q -> f --> cost
B -> f -/
The following program trains the topology, including the cost, and then uses a sub-network of the trained topology for inference:
def f(x):
    e = paddle.layer.embedding(x, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o
# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))
topology = paddle.layer.less_than(
    paddle.layer.cross_entropy(fA, fQ),
    paddle.layer.cross_entropy(fB, fQ))
# Derive the parameters required by the topology and create them.
parameters = paddle.parameters.create(topology)
# Estimate parameters used in topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)
# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The sematic-vector of testA: ", paddle.infer(fA, parameters, testA)
We use GAN in this example. In the following example program, d0 and d1 correspond to the two networks in the following figure:
def G(x):
    # Over-simplified example, as G has only one layer:
    return paddle.layer.fc(x, parameter_name="G")

def D(x, parameters_mutable=True):
    # Again, over-simplified:
    return paddle.layer.fc(x, parameter_name="D", parameters_mutable=parameters_mutable)
# Construct the first topology, which contains both D and G.
# By training this topology, we update the parameters of G.
d0 = paddle.layer.should_be_false(
    D(G(paddle.layer.data()),
      False))  # Don't update the parameters of D here.
# Construct a second topology d1, which contains only D.  By
# training this topology, we update the parameters of D.  Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))
# Create parameters from a list of multiple topologies (models) so that
# parameters can be shared between these topologies.
parameters = paddle.parameters.create([d0, d1])
# Iterative training of GAN.
for ...:
    train(d0, parameters, reader=read_from_rng)
    train(d1, parameters, reader=read_from_realistic_images)
# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
Maybe a parameter pool (parameters in the above code) plus network topologies is a good abstraction?
To-be-trained neural network = a parameter pool + a training network topology. Inference neural network = the same parameter pool + an inference network topology.
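A minimal sketch of this abstraction, with hypothetical class and method names (none of this is an existing Paddle API):
class ParameterPool(object):
    def __init__(self):
        # Maps parameter_name -> parameter value; layers in different
        # topologies that use the same parameter_name share one entry.
        self.params = {}

    def create_from(self, topologies):
        for topology in topologies:
            for layer in topology.layers():            # hypothetical traversal API
                if layer.parameter_name not in self.params:
                    self.params[layer.parameter_name] = layer.initial_value()  # hypothetical

# To-be-trained NN = ParameterPool + training topology (e.g., the one ending in a cost layer).
# Inference NN     = the same ParameterPool + an inference topology (e.g., fA).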
Is the Model or NeuralNetwork an important concept?
Maybe instead of specifying which parameter not to update here:
d0 = paddle.layer.should_be_false(
    D(G(paddle.layer.data()),
      False))  # Don't update the parameters of D here.
we can specify in train which parameters to update, or by default update all.
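For example, the GAN snippet above could pass the list of parameters to update directly to train; the update_parameters argument below is hypothetical, not an existing API:
# Hypothetical: only parameters named here are updated; default is all.
train(d0, parameters, reader=read_from_rng,
      update_parameters=["G"])   # leave D's parameters fixed
train(d1, parameters, reader=read_from_realistic_images,
      update_parameters=["D"])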
Inside the train function, add an event_handler callback.
Attached is the code from our earlier discussion:
def train_reader():
    yield {'pixel': pixels, 'label': labels}  # Return a data batch.

# The observe callback is used for plotting or logging the training process.
# The type of the event parameter can vary; intermediate training results
# are carried in the event instance.
def callback(event):
    if isinstance(event, FinishTrainOneBatch):
        print event.pass_id, event.batch_id, "Cost = ", event.cost, "Error Rate = ", event.metric[0]
        print "output layer's output is ", event.activation['output']
        if event.batch_id % 1000 == 0:  # We could even save a checkpoint during the callback.
            with open('check_point_%d' % event.batch_id, 'w') as stream:
                optimizer.check_point(stream)
    else:
        pass

optimizer.train(
    train_reader=train_reader,
    test_reader=None,  # The test reader shares the same format as the train
                       # reader.  It can be None if there is no test data.
    cost=CrossEntropy(input=model.topology.output_layer,  # The network's output layer.
                      label=DataReader("label")),  # The label comes from the data reader's 'label' field.
    metric=[ErrorRateMetric(input=model.topology.output_layer,
                            label=DataReader("label"))],  # Same logic as above.
    observe_callback=callback)
Added an issue for separating updater and trainer: https://github.com/PaddlePaddle/Paddle/issues/1319
What if we need to put cost-related layers in a special namespace, like:
paddle.layer.cost.cross_entropy
paddle.layer.cost.less_than
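For example, the ranking topology above would then be written as:
topology = paddle.layer.cost.less_than(
    paddle.layer.cost.cross_entropy(fA, fQ),
    paddle.layer.cost.cross_entropy(fB, fQ))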
We had thought that a DL framework should implement concepts like model and cost. But we realized that these are not flexible enough to describe deep learning problems. Instead, we need the concept of a network. For more on how we reached this conclusion, please refer to https://github.com/PaddlePaddle/Paddle/issues/1311.
In this issue, we are going to figure out how we should build a network and its parameters, and how we can train a network and use part of it (the model) for inference/serving.