PaddlePaddle / Paddle

PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (the core PaddlePaddle framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
http://www.paddlepaddle.org/
Apache License 2.0

How to describe and use Network #1315

Closed: wangkuiyi closed this issue 7 years ago

wangkuiyi commented 7 years ago

We had thought that a DL framework should implement concepts like model and cost, but we realized that these are not flexible enough to describe deep learning problems. Instead, we need the concept of a network. For more about this derivation, please refer to https://github.com/PaddlePaddle/Paddle/issues/1311.

In this issue, we are going to figure out how we should build a network and its parameters, and how we can train a network and use part of it (the model) for inference/serving.

wangkuiyi commented 7 years ago

Here is a summary of an idea from @helinwang and @emailweixu that changes the concepts listed in https://github.com/PaddlePaddle/Paddle/pull/1297 into the following:

  1. No concept of Model; instead, we introduce Network. The reason is listed here.
  2. A Network consists of topology and parameters. But Network is not the essence; topology and parameters are.
  3. Layers in the same network might share parameters; an example is shown here, and
  4. Layers of different networks might share parameters too, as in the GAN example presented later.

For how to describe networks and how to use them for convenient training, testing, and inference/serving, please see the following comments.

wangkuiyi commented 7 years ago

Example 1. Sharing Parameters between Layers

We use the 3-branch ranking model in this example. For your convenience, I copy-and-paste the model's topology as follows:

A -> f -\
Q -> f --> cost
B -> f -/

The following program trains the topology, including the cost, and then uses a sub-network of the trained topology for inference:

import paddle

def f(x):
    e = paddle.layer.embedding(x, parameter_name="embedding")
    o = paddle.layer.softmax(e, parameter_name="semantic")
    return o

# Create 3 topologies (subnets); they share parameters because all
# corresponding layers have the same parameter names.
fA = f(paddle.layer.data(input_name="A"))
fB = f(paddle.layer.data(input_name="B"))
fQ = f(paddle.layer.data(input_name="Q"))

topology = paddle.layer.less_than(
               paddle.layer.cross_entropy(fA, fQ),
               paddle.layer.cross_entropy(fB, fQ))

# Derive the parameters required by the topology and create them.
parameters = paddle.parameters.create(topology)

# Estimate the parameters used in the topology from data.
paddle.train(topology, parameters, reader=read_ranking_model_data)

# Inference using fA (or fB or fQ, as they share their parameters).
[testA, testB, testQ] = read_ranking_model_data()
print "The semantic vector of testA: ", paddle.infer(fA, parameters, testA)
wangkuiyi commented 7 years ago

Example 2. Sharing Parameters between "Models"

We use a GAN in this example. In the following example program, d0 and d1 correspond to the two networks in the following figure:

def G(x):
    # over-simplified example as G has only one layer:
    return paddle.layer.fc(x, parameter_name="G")

def D(x, parameters_mutable=True):
    # again, over-simplified:
    return paddle.layer.fc(x, parameter_name="D", parameters_mutable=parameters_mutable)

# Construct the first topology, which contains both D and G.
# By learning this topology, we update parameters of G.
d0 = paddle.layer.should_be_false(
         D(G(paddle.layer.data()),
           False)) # Don't update the parameter of D here.

# Construct a second topology d1, which contains only D. By
# training this topology, we update the parameters of D.  Note
# that d1 shares parameters with d0.
d1 = paddle.layer.should_be_true(D(paddle.layer.data()))

# Create parameters from a list of multiple topologies (models) so
# that parameters can be shared among these topologies.
parameters = paddle.parameters.create([d0, d1])

# Iterative training of the GAN.
for ...:
    paddle.train(d0, parameters, reader=read_from_rng)
    paddle.train(d1, parameters, reader=read_from_realistic_images)

# Use d1 for inference:
print "D thinks a batch of images are realistic ", infer(d1, parameters, read_mnist_images)
reyoung commented 7 years ago

Maybe a parameter pool (parameters in the above code) plus network topologies is a good abstraction?

A neural network to be trained = a parameter pool + a training network topology. A neural network for inference = the same parameter pool + an inference network topology.

Is the Model or NeuralNetwork an important concept?
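
As a rough sketch of this abstraction (the class and attribute names below are only illustrative, not a proposed API), a network would just pair a shared parameter pool with a topology:

class ParameterPool(object):
    def __init__(self):
        self.values = {}  # parameter name -> parameter value

class NeuralNetwork(object):
    def __init__(self, pool, topology):
        self.pool = pool          # shared between training and inference networks
        self.topology = topology  # differs between training and inference

# train_topology and inference_topology are assumed to be built elsewhere.
pool = ParameterPool()
train_net = NeuralNetwork(pool, train_topology)      # parameter pool + train topology
infer_net = NeuralNetwork(pool, inference_topology)  # same pool + inference topology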

helinwang commented 7 years ago

Maybe, instead of specifying here which parameters not to update:

d0 = paddle.layer.should_be_false(
         D(G(paddle.layer.data()),
           False)) # Don't update the parameter of D here.

we could specify in train which parameters to update, with the default being to update all of them.
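
For example, train could accept an optional list of parameter names to update, defaulting to updating all of them. A minimal sketch of the idea (the signature and the compute_gradients helper are hypothetical, not a settled API):

def train(topology, parameters, reader, update_parameter_names=None, learning_rate=0.01):
    names = update_parameter_names or list(parameters.keys())
    for batch in reader():
        grads = compute_gradients(topology, parameters, batch)  # hypothetical helper
        for name in names:  # parameters not listed here stay frozen
            parameters[name] -= learning_rate * grads[name]

# GAN case: update only G's parameters when training d0.
# train(d0, parameters, reader=read_from_rng, update_parameter_names=["G"])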

reyoung commented 7 years ago

Inside the train function, add an event_handler callback.

Attached is the code from our earlier discussion:

def train_reader():
    yield {'pixel': pixels, 'label': labels}  # yield a data batch.

# The observe callback is used for plotting or logging the training process.
# The event parameter can be of various types.  The intermediate results of
# training are carried in the event instance.
def callback(event):
    if isinstance(event, FinishTrainOneBatch):
        print event.pass_id, event.batch_id, "Cost = ", event.cost, "Error Rate = ", event.metric[0]
        print "output layer's output is ", event.activation['output']
        if event.batch_id % 1000 == 0:  # We could also save a checkpoint inside the callback.
            with open('check_point_%d' % event.batch_id, 'w') as stream:
                optimizer.check_point(stream)
    else:
        pass

optimizer.train(train_reader=train_reader,
                test_reader=None,  # The test reader shares the format of the train
                                   # reader; it can be None if there is no test data.
                cost=CrossEntropy(input=model.topology.output_layer,  # the network's output layer
                                  label=DataReader("label")),  # the label comes from the data reader's 'label' field
                metric=[ErrorRateMetric(input=model.topology.output_layer, label=DataReader("label"))],  # same logic as above
                observe_callback=callback)
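
For completeness, the FinishTrainOneBatch event above would only need to carry the fields the callback reads. A minimal sketch, not a settled definition:

class FinishTrainOneBatch(object):
    def __init__(self, pass_id, batch_id, cost, metric, activation):
        self.pass_id = pass_id        # index of the current pass over the data
        self.batch_id = batch_id      # index of the current batch
        self.cost = cost              # cost value of this batch
        self.metric = metric          # list of metric values, e.g. error rate
        self.activation = activation  # dict: layer name -> layer output
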
helinwang commented 7 years ago

Added issue for separating updater and trainer: https://github.com/PaddlePaddle/Paddle/issues/1319

jacquesqiao commented 7 years ago

What if we need to put cost-related things in a special namespace, like:

paddle.layer.cost.cross_entropy
paddle.layer.cost.less_than
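
In Python this would simply mean grouping the cost layers into a submodule; an illustrative layout only, not an agreed structure:

# paddle/layer/__init__.py   exposes data, fc, embedding, softmax, ...
# paddle/layer/cost.py       defines cross_entropy, less_than, ...
#
# User code would then read:
#   cost = paddle.layer.cost.cross_entropy(fA, fQ)
#   rank = paddle.layer.cost.less_than(costA, costB)
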
wangkuiyi commented 7 years ago

@reyoung Regarding your comment: it seems that, given the event_handler mechanism, we don't need to pass metrics to the train function; instead, we can calculate those metrics in the event_handler and plot them as necessary?
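
For instance, the error rate could be computed inside the handler from the batch outputs rather than passed to train. A sketch assuming event.activation['output'] holds per-class scores and that the event also carries the ground-truth labels (the labels field is an assumption):

import numpy as np

def event_handler(event):
    if isinstance(event, FinishTrainOneBatch):
        predictions = np.argmax(event.activation['output'], axis=1)
        error_rate = np.mean(predictions != event.labels)  # event.labels is assumed
        print "batch %d error rate: %f" % (event.batch_id, error_rate)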