PaddlePaddle / Paddle

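PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training, and cross-platform deployment, for deep learning and machine learning)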
http://www.paddlepaddle.org/
Apache License 2.0

Simplify the definition of data sources #1264

Closed wangkuiyi closed 7 years ago

wangkuiyi commented 7 years ago

At more than 200 lines, data_sources.py seems too long. It defines three functions, define_py_data_source, define_py_data_sources, and define_py_data_sources2, but only the last one is really used. It seems reasonable to remove the former two.

wangkuiyi commented 7 years ago

I ran the following script; its output verifies that only define_py_data_sources2, and not the former two, is used outside data_sources.py itself:

$ for i in $(du -a | grep '\.py$' | cut -f 2); do if [[ -f $i && $i != *"build/"* ]]; then if grep define_py_data_source $i; then echo ==== $i; fi; fi; done  | tee /tmp/l
define_py_data_sources2(
==== ./paddle/trainer/tests/simple_sparse_neural_network.py
define_py_data_sources2(
==== ./paddle/gserver/tests/sequence_rnn_multi_unequalength_inputs.py
define_py_data_sources2(
==== ./paddle/gserver/tests/sequence_nest_rnn_multi_unequalength_inputs.py
    define_py_data_sources2(
==== ./demo/image_classification/vgg_16_cifar.py
    define_py_data_sources2(
==== ./demo/semantic_role_labeling/db_lstm.py
    define_py_data_sources2(
==== ./demo/mnist/vgg_16_mnist.py
define_py_data_sources2(
==== ./demo/traffic_prediction/trainer_config.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.resnet-lstm.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.emb.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.cnn.py
# to define_py_data_sources2(). See trainer_config.lr.py.
==== ./demo/quick_start/dataprovider_bow.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.lstm.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.db-lstm.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.lr.py
define_py_data_sources2(
==== ./demo/quick_start/trainer_config.bidi-lstm.py
    define_py_data_sources2(
==== ./demo/sentiment/sentiment_net.py
    define_py_data_sources2(
==== ./demo/recommendation/trainer_config.py
define_py_data_sources2(
==== ./demo/sequence_tagging/rnn_crf.py
define_py_data_sources2(
==== ./demo/sequence_tagging/linear_crf.py
define_py_data_sources2(
==== ./demo/introduction/trainer_config.py
    define_py_data_sources2(
==== ./demo/seqToseq/seqToseq_net.py
    define_py_data_sources2(
==== ./demo/model_zoo/resnet/resnet.py
define_py_data_sources2(
==== ./doc/api/data_provider/src/mnist_config.py
define_py_data_sources2(
==== ./doc/api/data_provider/src/sentimental_config.py
define_py_data_sources2(
==== ./doc/howto/usage/concepts/src/trainer_config.py
define_py_data_sources2(
==== ./benchmark/paddle/image/smallnet_mnist_cifar.py
define_py_data_sources2(
==== ./benchmark/paddle/image/googlenet.py
define_py_data_sources2(
==== ./benchmark/paddle/image/alexnet.py
define_py_data_sources2(
==== ./benchmark/paddle/rnn/rnn.py
__all__ = ['define_py_data_sources2']
def define_py_data_source(file_list,
        define_py_data_source("train.list", TrainData, "data_provider", "process")
        define_py_data_source("train.list", TrainData, "data_provider", "process",
def define_py_data_sources(train_list,
    The annotation is almost the same as define_py_data_sources2, except that
        define_py_data_source(train_list, TrainData, train_module, train_obj,
        define_py_data_source(test_list, TestData, test_module, test_obj,
def define_py_data_sources2(train_list, test_list, module, obj, args=None):
        define_py_data_sources2(train_list="train.list",
    define_py_data_sources(
==== ./python/paddle/trainer_config_helpers/data_sources.py
define_py_data_sources2(
==== ./python/paddle/trainer_config_helpers/tests/configs/test_split_datasource.py
    define_py_data_sources2(
==== ./python/paddle/utils/predefined_net.py
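
Because the grep pattern define_py_data_source is a prefix of all three names, the output above has to be read line by line to tell them apart. The same check can be sketched in Python with whole-word matching, so that each name is counted separately. This is only an illustration, not part of the Paddle repository: the helper name count_references and its parameters are hypothetical, and it skips directories named exactly "build" rather than any path containing "build/" as the shell loop does.

```python
import os
import re
from collections import defaultdict

# The three function names discussed in this issue.
NAMES = (
    "define_py_data_source",
    "define_py_data_sources",
    "define_py_data_sources2",
)


def count_references(root, names=NAMES, skip_dirs=("build",)):
    """Count whole-word occurrences of each name in .py files under root.

    Hypothetical helper for illustration. Returns {name: {path: count}}.
    """
    # \b on both sides keeps "define_py_data_source" from matching inside
    # "define_py_data_sources" or "define_py_data_sources2".
    patterns = {n: re.compile(r"\b" + re.escape(n) + r"\b") for n in names}
    hits = {n: defaultdict(int) for n in names}

    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="replace") as f:
                text = f.read()
            for name, pattern in patterns.items():
                count = len(pattern.findall(text))
                if count:
                    hits[name][path] += count

    return {name: dict(per_file) for name, per_file in hits.items()}
```

Calling count_references(".") from the repository root would give one dictionary per function name. Like grep, it counts definitions, docstring examples, and comments as well as calls. In the output above, every match for the first two names is in data_sources.py itself, so those lines would still need to be checked by hand before anything is removed.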