Closed wangkuiyi closed 6 years ago
Executor
should be hidden.train
function. For more details, let us take a look at the current example program fluid/test_fit_a_line.py, which has the following structure:
Define the forward pass
Generate the backward pass
Create the reader
Run the startup program
Run the Python train-loop, which calls the main program
This idea came from @emailweixu. Here is a brief description with example code.
Let us encapsulate the forward pass into a Python function:
def F():
    x = fluid.layers.data(...)
    ...
    avg_cost = fluid.layers.mean(...)
Let us invent a standard Fluid function fluid.train, which encapsulates the creation of the reader, the train-loop, and the generation of the backward pass:
def train(F, ...):
    F()  # fills in startup_program and main_program
    exe = fluid.Executor(...)
    exe.run(startup_program)
    for i in range(1000):
        exe.run(main_program)
So, users could rewrite test_fit_a_line.py as
def F():
    x = ...
    ...
    avg_cost = ...

train(F, ...)
Following the proposal, an example train script could be
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    trainer = fluid.Trainer(conv_network, optimizer=fluid.optimizer.SGD())

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print(event.metrics)
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print(test_metrics)

    # Pass the handler so the trainer can report the events handled above
    trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)

if __name__ == '__main__':
    main()
For common models, this skeleton looks good. (It still needs more polish and thinking.)
Overall, I think we need two levels of APIs: high level and low level.
The high-level API simplifies network construction for normal models, such as ResNet and LSTM. It is built on top of the low-level APIs.
However, we need to be sure that users still have the flexibility to build complex models with our low-level (more fine-grained) APIs.
One last point: when our design is more stable, we need to ask our modeling team members (qingqing, yaming, yibing, etc.) for advice. We need to make sure our API has good coverage of current and future models.
I think the key problem that makes current Fluid hard to use is that users can hardly understand our 'program'. Furthermore, in Fluid most features require more than one program. For example, if a user needs to do inference on test data every 10 training batches, they have to build and maintain two programs: one for training and another for testing. Most users know neither why there should be two programs nor how to build them correctly.
In my view, the most exciting point of this issue's proposal is to wrap the user's net config in a function and then pass the function to other objects. Based on this idea, maybe we can introduce the concept of a ProgramBuilder. A ProgramBuilder takes a forward net config function defined by users (F() in the demo) and adds complementary ops (optimizers, gradient ops, ...) to generate specific programs (a training program, a testing program, and so on). Programs are built and maintained by the ProgramBuilder automatically. A trainer can take a ProgramBuilder and execute the corresponding program.
In this method, users no longer need to understand programs, for they will not directly use them anymore.
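To make the ProgramBuilder idea concrete, here is a minimal pure-Python sketch. It is only an illustration under loud assumptions: a "program" is modelled as a plain list of op names rather than a real Fluid program, and the method names train_program and test_program are hypothetical, not part of any proposed API.

```python
# Hypothetical sketch: a "program" is a plain list of op names,
# not a real Fluid ProgramDesc.

class ProgramBuilder(object):
    """Builds a training and a testing program from one forward-net function."""

    def __init__(self, net_func):
        # net_func stands in for the user's F(); here it returns forward op names.
        self._forward_ops = list(net_func())

    def train_program(self):
        # Forward ops, then one gradient op per forward op in reverse order,
        # then a single optimizer op.
        grad_ops = [op + "_grad" for op in reversed(self._forward_ops)]
        return self._forward_ops + grad_ops + ["sgd"]

    def test_program(self):
        # Testing needs only the forward pass.
        return list(self._forward_ops)


def F():
    # Stand-in for a user-defined forward net config.
    return ["fc", "cross_entropy", "mean"]


builder = ProgramBuilder(F)
train_ops = builder.train_program()
test_ops = builder.test_program()
print(train_ops)
print(test_ops)
```

The user only writes F(); both programs are derived from it, which is the point of the proposal.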
By the way, in the proposed design, how to support GAN?
@JiayiFeng It seems that we need to allow users to write the train-loop. (I was taking the PyTorch version as a reference.) I am afraid that this simplified API cannot make it, and we might want it in the next milestone. What do you think?
Clearly, this high level API cannot satisfy all needs (e.g. reinforcement learning, GAN). The current V2 API cannot either. It might be possible to tweak a little bit (say, combining model and optimizer as one to pass to trainer) to make GAN possible. We need to think about to what level we can clean up the low-level API to support user train-loop in python.
@reyoung Do you have any suggestions on how inference will work with the paradigm that you have shared? I am not sure if this API style will be compatible with the inference engine work done in Q1.
How about this? I think it can support GAN:
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def conv_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def train_conv_network():
    loss = conv_network()
    sgd_optimizer = fluid.optimizer.SGD(learning_rate=0.001)
    sgd_optimizer.minimize(loss)
    return loss

def main():
    # `fluid.Compile` creates a program,
    # the program owns the program desc, and a single scope.
    # Because the scope is shared by different methods (`conv_network`, `train_conv_network`),
    # GAN should be supported.
    program = fluid.Compile(conv_network, train_conv_network)
    for i in range(0, 100):
        for train_data in dataset.mnist.train():
            loss = program.run("train_conv_network", {"image": train_data[0], "label": train_data[1]})
            print("train loss", loss)
        for test_data in dataset.mnist.test():
            loss = program.run("conv_network", {"image": test_data[0], "label": test_data[1]})
            print("test loss", loss)

if __name__ == '__main__':
    main()
@helinwang how does your proposal handle distributed training?
@emailweixu for the trainer, fluid.Compile can check the environment variable to determine whether it is distributed training and produce the correct compiled program:
TRAINING_ROLE=TRAINER PSERVERS=127.0.0.1:8000 python train.py
For pserver, the user can do something like:
TRAINING_ROLE=PSERVER paddle run --file train.py --main train_conv_network
The key is that the entry point is no longer Python. Instead it is the paddle binary, which parses the train_conv_network function into a pserver program and runs it.
@emailweixu maybe a simpler way to start pserver is:
TRAINING_ROLE=PSERVER python train.py
Now fluid.Compile detects the PSERVER role from the environment variable and produces a program for which program.run runs the pserver operators.
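As a rough illustration of the environment-variable check described here, the following sketch reads TRAINING_ROLE and PSERVERS. The function name detect_role, the default role of TRAINER when the variable is unset, and the comma-separated format of PSERVERS are all assumptions made for this example; the thread does not specify them.

```python
import os


def detect_role(environ=None):
    """Return (role, pserver_endpoints) read from environment-style variables."""
    env = os.environ if environ is None else environ
    # Assumption: an unset TRAINING_ROLE means a plain trainer.
    role = env.get("TRAINING_ROLE", "TRAINER")
    if role not in ("TRAINER", "PSERVER"):
        raise ValueError("unknown TRAINING_ROLE: %r" % role)
    # Assumption: PSERVERS is a comma-separated list of host:port endpoints.
    raw = env.get("PSERVERS", "")
    endpoints = [e for e in raw.split(",") if e]
    return role, endpoints


# A dict stands in for os.environ so the example does not depend on the shell.
role, endpoints = detect_role({"TRAINING_ROLE": "TRAINER",
                               "PSERVERS": "127.0.0.1:8000"})
print(role, endpoints)
```

A compile step could branch on the returned role to decide whether to emit trainer or pserver operators.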
@helinwang The problem is that the pserver does not have a loop over the data generator. With your design, user code has to do different things depending on whether it is a pserver or a trainer.
Awesome discussion. I have some naive thoughts: for some complicated networks, we should handle naming well. For example, fc layers may appear anywhere. I think the auto-naming mechanism is not enough, even though we can pass a specified parameter name, and I think we can design something better.
with network.module('generator') as generator:
    data1
    ...
    with network.name_scope('scope') as sub_scope:
        fc1 = fluid.layers.fc(...)
        ...
with network.module('discriminator') as discriminator:
    data1
    data2
    ...
    fc2 = fluid.layers.fc(input=generator.sub_scope.fc1)
    ...
network.module holds a complete logic block. We may analyze the dependencies to decide whether to compile one ProgramDesc or more than one. We can require that all computation logic within a module or name_scope shares a naming space.
@pkuyym
Actually, Fluid supports name scopes right now: fluid.unique_name.guard(). It is basically the same API as the one you proposed.
@reyoung Thanks for the reminder. I paste a snippet here:
with fluid.unique_name.guard():
    train_file_obj = fluid.layers.open_files(
        filenames=TRAIN_FILES,
        pass_num=pass_num,
        shapes=[[-1, 1], [-1, 1]],
        lod_levels=[1, 0],
        dtypes=['int64', 'int64'],
        thread_num=1)
I think extending the current unique_name.guard as follows may make the API friendlier:
# add a prefix to make debugging easier
with fluid.unique_name.guard('prefix_1') as scope_1:
    fc = fluid.layers.fc(...)
with fluid.unique_name.guard('prefix_2') as scope_2:
    fc = fluid.layers.fc(input=scope_1.fc)  # very convenient to refer to fc in scope_1
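A prefix-aware name guard of this kind can be sketched in a few lines of plain Python. This is a toy and not the Fluid implementation: the module-level state, the "/" separator, and the "_N" suffix format are assumptions chosen for illustration, and attribute access such as scope_1.fc is not modelled.

```python
import collections
import contextlib

_prefixes = []                               # stack of active prefixes
_counters = collections.defaultdict(int)     # next index per fully prefixed key


@contextlib.contextmanager
def guard(prefix):
    """Prepend `prefix` to every name generated inside the with-block."""
    _prefixes.append(prefix)
    try:
        yield
    finally:
        _prefixes.pop()


def generate(key):
    """Return a unique name such as 'prefix_1/fc_0' for the given key."""
    full = "/".join(_prefixes + [key])
    index = _counters[full]
    _counters[full] += 1
    return "%s_%d" % (full, index)


with guard("prefix_1"):
    a = generate("fc")
    b = generate("fc")
with guard("prefix_2"):
    c = generate("fc")
print(a, b, c)
```

Because the counter is keyed by the prefixed name, two fc layers in different scopes never collide, and the prefix shows up in debug output.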
@wangkuiyi In my opinion, even in GANs, multiple nets have a certain running order. So maybe we can allow the trainer to take more than one net config (in the form of a list), generate multiple sets of programs, and use a for-loop inside the trainer to execute them in turn.
This idea is similar to @helinwang's proposal. However, @helinwang proposes to compile all nets into a single program. I tend to assign every net an independent program.
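The "list of nets executed in turn" idea might look like the loop below. It is a sketch only: run_in_turn is a hypothetical helper, and each net is represented by a plain Python callable that stands in for running that net's own program.

```python
def run_in_turn(net_steps, num_iters):
    """Call each net's step function once per iteration, in list order."""
    log = []
    for it in range(num_iters):
        for name, step in net_steps:
            # A real trainer would execute the program built for this net here.
            log.append((it, name, step()))
    return log


def discriminator_step():
    return "d_loss"   # placeholder for the discriminator's training result


def generator_step():
    return "g_loss"   # placeholder for the generator's training result


log = run_in_turn([("discriminator", discriminator_step),
                   ("generator", generator_step)], 2)
for entry in log:
    print(entry)
```

The order of the list fixes the running order, so a GAN would list the discriminator and generator in whichever order the training scheme requires.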
"The problem is that pserver does not have a loop over data generator. With your design, user code has to do different things depends on whether it is pserver or trainer."
@emailweixu thanks for pointing that out, that is correct. Another possibility is to do it in fluid.Compile: when running as a pserver, fluid.Compile will compile the pserver program and run it immediately.
Still, this is not fully satisfactory, because users may have done something in the Python code before fluid.Compile on the assumption that it is used for training, not for running the pserver. I think reusing fluid.train as the entry point for running pserver operators arguably has the same issue.
I think the cleanest way is to "extract" the Fluid program definition code from the Python glue code, and run only the Fluid program definition code. Following this logic, one way would be:
# assuming train.py is in the same folder
paddle run_pserver --main train.train_conv_network
Internally, paddle run_pserver does something like:
import os
import paddle.fluid as fluid
import train
os.environ['TRAINING_ROLE'] = "PSERVER"
program = fluid.Compile(train.train_conv_network) # transpile happens inside
program.run()
All, we did some thinking about how inference can be done. Please review our proposal:
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_network():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_network():
    # label is declared here so that it is in scope for the loss
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_network()
    loss = fluid.layers.cross_entropy(prediction, label)
    return loss

def main():
    params = fluid.Params('./params')
    # If params is not None it will be loaded to Trainer
    trainer = fluid.Trainer(train_network, optimizer=fluid.optimizer.SGD(), params=params)

    def event_handler(event):
        if isinstance(event, fluid.EndIteration):
            print(event.metrics)
        elif isinstance(event, fluid.EndPass):
            test_metrics = trainer.test(reader=dataset.mnist.test())
            print(test_metrics)

    # Train over 100 epochs
    trainer.train(reader=dataset.mnist.train(), num_pass=100, event_handler=event_handler)

    inferencer = fluid.Inferencer(inference_network, trainer.params)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
When we were trying to implement the Params class, we realized it was pretty ugly to implement with a shared scope. Therefore we updated the syntax to the following.
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    # use_cuda, train_reader and test_reader are assumed to be defined elsewhere
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(inference_program, param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
I also noticed there is another design. The change is to have the Trainer handle the infer_program. Is this version later than the one above?
def main():
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            infer_func=inference_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            pass
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})
Latest Syntax
import paddle.fluid as fluid
import paddle.v2.dataset as dataset

def inference_program():
    image = fluid.layers.data(name='image', shape=[1, 28, 28])
    hidden = fluid.layers.simple_img_conv_pool(image,
                                               num_filters=32, filter_size=3,
                                               pool_size=3, pool_stride=1, act='relu')
    hidden = fluid.layers.dropout(hidden, 0.1)
    hidden = fluid.layers.batch_norm(hidden)
    prediction = fluid.layers.fc(hidden, size=10, act='softmax')
    return prediction

def train_program():
    label = fluid.layers.data(name='label', shape=[1], dtype='int64')
    prediction = inference_program()
    cost = fluid.layers.cross_entropy(prediction, label)
    avg_cost = fluid.layers.mean(cost)
    return avg_cost

def main():
    # use_cuda, train_reader and test_reader are assumed to be defined elsewhere
    place = fluid.CUDAPlace(0) if use_cuda else fluid.CPUPlace()
    trainer = fluid.Trainer(program_func=train_program,
                            optimizer=fluid.optimizer.SGD(),
                            param_path="image.model",
                            place=place)

    def event_handler(event):
        if isinstance(event, fluid.EndEpochEvent):
            trainer.save_inference_model("image.model")
        elif isinstance(event, fluid.EndStepEvent):
            test_metrics = trainer.test(reader=test_reader)

    trainer.train(num_epochs=1,
                  event_handler=event_handler,
                  reader=train_reader,
                  feed_order=['image', 'label'])

    inferencer = fluid.Inferencer(
        infer_func=inference_program,
        param_path="image.model", place=place)
    prediction = inferencer.infer({'image': <IMAGE_DATA>})

if __name__ == '__main__':
    main()
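For readers wondering how the event_handler callback in these examples would be driven, here is a minimal pure-Python sketch of a train loop that emits step and epoch events. The class names EndStepEvent and EndEpochEvent come from the examples above, but their fields (epoch, step), the train function signature, and the toy reader are assumptions for illustration and not the actual Trainer implementation.

```python
class EndStepEvent(object):
    def __init__(self, epoch, step):
        self.epoch = epoch
        self.step = step


class EndEpochEvent(object):
    def __init__(self, epoch):
        self.epoch = epoch


def train(num_epochs, reader, event_handler):
    """Iterate over the reader and notify the handler after each step and epoch."""
    for epoch in range(num_epochs):
        for step, _batch in enumerate(reader()):
            # A real trainer would run the training program on _batch here.
            event_handler(EndStepEvent(epoch, step))
        event_handler(EndEpochEvent(epoch))


seen = []


def event_handler(event):
    if isinstance(event, EndStepEvent):
        seen.append(("step", event.epoch, event.step))
    elif isinstance(event, EndEpochEvent):
        seen.append(("epoch", event.epoch))


def toy_reader():
    # Two fake mini-batches per epoch.
    return iter([[1, 2], [3, 4]])


train(2, toy_reader, event_handler)
print(seen)
```

Keeping the loop inside train and exposing only events is what lets the user hook in testing or model saving without writing the loop themselves.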
Hello, this issue has not been updated in the past month, so we will close it today for the sake of other users' experience. If you still need to follow up on this question after it is closed, please feel free to reopen it, and we will get back to you within 24 hours. We apologize for the inconvenience caused by the closure, and thank you for your support of PaddlePaddle!
Our current implementation of Fluid is incomplete and exposes too many details. As a consequence, Fluid applications are lengthy and hard to comprehend.
Let us target a cleanup and simplification.