amiralansary / rl-medical

Deep Reinforcement Learning (DRL) agents applied to medical images
Apache License 2.0

Training speed too slow #10

Closed courins closed 5 years ago

courins commented 5 years ago

The training speed is too slow.

My current environment is Windows 7 x64, an NVIDIA GeForce GTX 1080 (8 GB), CUDA 9.0, cuDNN 7.0.5, tensorflow-gpu 1.6.0, tensorpack 0.8.0, and gym 0.12.1.

I used the example data for training: \tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\image_files and \tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\landmark_files

Because of the GPU memory limit, I used BATCH_SIZE = 24.

And my GPU and CPU settings:

mem_fraction = 0.8
conf = tf.ConfigProto(log_device_placement=True)
# conf.allow_soft_placement = True
conf.intra_op_parallelism_threads = 6
conf.inter_op_parallelism_threads = 6
conf.gpu_options.per_process_gpu_memory_fraction = mem_fraction
conf.gpu_options.allow_growth = True
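
As a sanity check, one way to confirm that TensorFlow 1.x sees the GPU at all is to list the local devices (a sketch using TF's device_lib, not part of the original settings):

# Sketch: list the devices visible to TF 1.x. If no '/device:GPU:0' entry
# appears, ops will silently fall back to the CPU and the CUDA/cuDNN
# installation (not the model code) is the problem.
from tensorflow.python.client import device_lib

print([d.name for d in device_lib.list_local_devices()])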

To exclude the effect of data loading, I used FakeData:

dataflow = FakeData(
    [[BATCH_SIZE, 45, 45, 45, 5], [BATCH_SIZE], [BATCH_SIZE], [BATCH_SIZE]],
    size=1000, random=False,
    dtype=['uint8', 'float32', 'int8', 'bool'])
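
To confirm the input pipeline really is out of the picture, tensorpack also ships a small benchmark utility; a sketch of running it on the dataflow above (TestDataSpeed is not used in the original post):

# Sketch: measure the raw dataflow throughput in isolation,
# independent of the training graph.
from tensorpack.dataflow import TestDataSpeed

TestDataSpeed(dataflow, size=1000).start()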

I also used a minimal training configuration:

return TrainConfig(
    data=QueueInput(dataflow),
    model=Model(),
    callbacks=[],
    steps_per_epoch=10,
    max_epoch=1000,
    session_config=conf,
)

The training speed is 28 seconds per iteration.

Even when I reduce the model complexity (by commenting out Conv3D and MaxPooling3D layers):

with argscope(Conv3D, nl=PReLU.symbolic_function, use_bias=True):
    # core layers of the network
    conv = (LinearWrap(image)
            .Conv3D('conv0', out_channel=32,
                    kernel_shape=[5, 5, 5], stride=[1, 1, 1])
            .MaxPooling3D('pool0', 16)
            # .Conv3D('conv1', out_channel=32,
            #         kernel_shape=[5, 5, 5], stride=[1, 1, 1])
            # .MaxPooling3D('pool1', 2)
            # .Conv3D('conv2', out_channel=64,
            #         kernel_shape=[4, 4, 4], stride=[1, 1, 1])
            # .MaxPooling3D('pool2', 2)
            # .Conv3D('conv3', out_channel=64,
            #         kernel_shape=[3, 3, 3], stride=[1, 1, 1])
            )

the training speed is still 22 seconds per iteration.

That is about 100x slower than your reported training speed (around 3-4 it/sec with the default, larger architecture on a GTX 1080).

I would like to know why, and I would appreciate any suggestions for reducing the training time.

courins commented 5 years ago

[Screenshot: device placement log showing Conv3D placed on the CPU]

Why is Conv3D placed on the CPU, and how can I place it on the GPU device?
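
In plain TF 1.x, placement can be forced with an explicit device scope, and soft placement can be disabled so TensorFlow raises an error instead of silently falling back to the CPU. A minimal sketch (using tf.layers directly rather than the tensorpack wrappers from this thread):

import tensorflow as tf

# Sketch: build a 3D convolution under an explicit GPU device scope.
with tf.device('/gpu:0'):
    x = tf.random_normal([24, 45, 45, 45, 5])
    y = tf.layers.conv3d(x, filters=32, kernel_size=5, name='conv0')

conf = tf.ConfigProto(
    allow_soft_placement=False,  # fail loudly if an op cannot run on the GPU
    log_device_placement=True)   # print the device chosen for every op
with tf.Session(config=conf) as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(y)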

courins commented 5 years ago

@layer_register(log_shape=True)
@convert_to_tflayer_args(
    args_names=['filters', 'kernel_size'],
    name_mapping={
        'out_channel': 'filters',
        'kernel_shape': 'kernel_size',
        'stride': 'strides',
    })
def Conv3D()

And

@layer_register(log_shape=True)
def Conv3D()

What is the difference between the two?

I get errors when using the first Conv3D():

File "C:\Anaconda3\envs\tf1.6\lib\site-packages\tensorpack\tfutils\argscope.py", line 45, in argscope _check_args_exist(l.symbolic_function) File "C:\Anaconda3\envs\tf1.6\lib\site-packages\tensorpack\tfutils\argscope.py", line 41, in _check_args_exist assert k in args, "No argument {} in {}".format(k, l.name) AssertionError: No argument nl in Conv3D

@amiralansary @crypdick @thanosvlo please help me!

courins commented 5 years ago

Why can target/conv0/Conv3D be placed on the GPU, but conv0/Conv3D cannot?

{ "cat": "Tensor", "id": 22, "ph": "O", "tid": 0, "name": "conv0/Conv3D", "args": { "snapshot": { "tensor_description": "dtype: DT_FLOAT\nshape {\n dim {\n size: 24\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 32\n }\n}\nallocation_description {\n requested_bytes: 279936000\n allocated_bytes: 279936000\n allocator_name: cpu\n allocation_id: 1\n has_single_reference: true\n ptr: 3082879072\n}\n" } }, "pid": 2, "ts": 1562635795004270 },

{ "cat": "Tensor", "id": 382, "ph": "O", "tid": 13, "name": "target/conv0/Conv3D", "args": { "snapshot": { "tensor_description": "dtype: DT_FLOAT\nshape {\n dim {\n size: 24\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 32\n }\n}\nallocation_description {\n requested_bytes: 279936000\n allocated_bytes: 279936000\n allocator_name: GPU_0_bfc\n allocation_id: 430\n has_single_reference: true\n ptr: 116364673024\n}\n" } }, "pid": 4, "ts": 1562635790607019 },

courins commented 5 years ago

please see https://github.com/tensorpack/tensorpack/issues/1259

amiralansary commented 5 years ago

@courins The slowdown is clearly a result of running the code on the CPU rather than the GPU. The reason for that is not clear to me; it could be either your environment or tensorpack. It seems you have solved this issue with a newer version of tensorpack in tensorpack/tensorpack#1259, by upgrading to the current tensorpack master and TF 1.13 and modifying the code accordingly.

I am closing this issue and referring to the open upgrade issue #9.