Why is Conv3D placed on the CPU, and how do I place it on the GPU device?
```python
@layer_register(log_shape=True)
@convert_to_tflayer_args(
    args_names=['filters', 'kernel_size'],
    name_mapping={
        'out_channel': 'filters',
        'kernel_shape': 'kernel_size',
        'stride': 'strides',
    })
def Conv3D()
```
and a plain `def Conv3D()`: what are the differences between them?
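For context, a sketch of how the decorated layer is used, assuming a recent tensorpack (the `image` tensor and the argument values below are hypothetical): `layer_register` turns the function into a named layer that opens a variable scope, logs output shapes, and picks up defaults from `argscope`, while `convert_to_tflayer_args` accepts the old tensorpack argument names (`out_channel`, `kernel_shape`, `stride`) and maps them to the tf.layers-style names.

```python
import tensorflow as tf
from tensorpack import Conv3D, argscope

# Hypothetical 5-D NDHWC input, e.g. a batch of 45^3 crops with 5 channels.
image = tf.placeholder(tf.float32, [None, 45, 45, 45, 5])

# Defaults injected via argscope apply to every Conv3D call in the block;
# the first positional argument is the layer / variable-scope name.
with argscope(Conv3D, kernel_size=3, activation=tf.nn.relu):
    l = Conv3D('conv0', image, filters=32)

# A plain, undecorated `def Conv3D()` is an ordinary function: no variable
# scope, no shape logging, and argscope(Conv3D, ...) would not affect it.
```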
I get errors when using the first Conv3D():
File "C:\Anaconda3\envs\tf1.6\lib\site-packages\tensorpack\tfutils\argscope.py", line 45, in argscope _check_args_exist(l.symbolic_function) File "C:\Anaconda3\envs\tf1.6\lib\site-packages\tensorpack\tfutils\argscope.py", line 41, in _check_args_exist assert k in args, "No argument {} in {}".format(k, l.name) AssertionError: No argument nl in Conv3D
@amiralansary @crypdick @thanosvlo please help me!
Why can `target/conv0/Conv3D` be placed on the GPU, while `conv0/Conv3D` cannot? In the trace below, `conv0/Conv3D` is allocated by the `cpu` allocator, while `target/conv0/Conv3D` is allocated by `GPU_0_bfc`:
{ "cat": "Tensor", "id": 22, "ph": "O", "tid": 0, "name": "conv0/Conv3D", "args": { "snapshot": { "tensor_description": "dtype: DT_FLOAT\nshape {\n dim {\n size: 24\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 32\n }\n}\nallocation_description {\n requested_bytes: 279936000\n allocated_bytes: 279936000\n allocator_name: cpu\n allocation_id: 1\n has_single_reference: true\n ptr: 3082879072\n}\n" } }, "pid": 2, "ts": 1562635795004270 },
{ "cat": "Tensor", "id": 382, "ph": "O", "tid": 13, "name": "target/conv0/Conv3D", "args": { "snapshot": { "tensor_description": "dtype: DT_FLOAT\nshape {\n dim {\n size: 24\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 45\n }\n dim {\n size: 32\n }\n}\nallocation_description {\n requested_bytes: 279936000\n allocated_bytes: 279936000\n allocator_name: GPU_0_bfc\n allocation_id: 430\n has_single_reference: true\n ptr: 116364673024\n}\n" } }, "pid": 4, "ts": 1562635790607019 },
@courins The slowdown is obviously a result of running the code on the CPU and not the GPU. The reason for that is not clear to me; it is either your environment or tensorpack. It seems that you have solved this issue using a newer version of tensorpack in tensorpack/tensorpack#1259, by upgrading to the current tensorpack master and TF 1.13 and modifying the code according to this.
I am closing this issue and referring to the open upgrade issue #9.
Training speed is too slow
My current environment:
- Windows 7 x64
- Nvidia GeForce GTX 1080 (8 GB)
- CUDA 9.0, cuDNN 7.0.5
- tensorflow-gpu 1.6.0
- tensorpack 0.8.0
- gym 0.12.1
I used the example data for training:

```
\tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\image_files
\tensorpack-medical\examples\LandmarkDetection\DQN\data\filenames\landmark_files
```
Because of the GPU memory limit, I used `BATCH_SIZE = 24`.
GPU and CPU settings:

```python
mem_fraction = 0.8
conf = tf.ConfigProto(log_device_placement=True)
```
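The line that actually applies `mem_fraction` is not shown above; presumably it goes through `gpu_options`. A minimal sketch of that assumption:

```python
import tensorflow as tf

mem_fraction = 0.8
conf = tf.ConfigProto(log_device_placement=True)
# Assumption: mem_fraction caps the per-process GPU memory fraction.
conf.gpu_options.per_process_gpu_memory_fraction = mem_fraction
```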
To exclude the effect of data loading, I used FakeData:

```python
dataflow = FakeData(
    [[BATCH_SIZE, 45, 45, 45, 5], [BATCH_SIZE], [BATCH_SIZE], [BATCH_SIZE]],
    size=1000, random=False,
    dtype=['uint8', 'float32', 'int8', 'bool'])
```
and a minimal training configuration:

```python
return TrainConfig(
    data=QueueInput(dataflow),
    model=Model(),
    callbacks=[],
    steps_per_epoch=10,
)
```
The training speed is 28 seconds per iteration.
Even when I reduce the model complexity (by commenting out Conv3D and Pool3D layers):

```python
with argscope(Conv3D, nl=PReLU.symbolic_function, use_bias=True):
    ...  # core layers of the network
```
the training speed is still 22 seconds per iteration.
That is about 100x slower than your training speed (around 3-4 it/sec using the default big architecture on a GTX 1080). I want to know why, and I would appreciate any suggestions on reducing the training time.
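For reference, a Chrome trace like the one quoted earlier can be produced with TF 1.x's `timeline` API, which shows per-op timing and the allocator (`cpu` vs `GPU_0_bfc`) for every tensor. A sketch, where `sess` and `train_op` stand in for whatever session and op the trainer actually runs:

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Run one training step with full tracing enabled.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()
sess.run(train_op, options=run_options, run_metadata=run_metadata)

# Dump a Chrome trace (open it at chrome://tracing). Each tensor's entry
# records its allocator, which is how CPU placement shows up above.
tl = timeline.Timeline(run_metadata.step_stats)
with open('timeline.json', 'w') as f:
    f.write(tl.generate_chrome_trace_format())
```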