aleksispi / drl-rpn-tf

Official Tensorflow implementation of drl-RPN: Deep Reinforcement Learning of Region Proposal Networks (CVPR 2018 paper)
MIT License

Out Of Memory problem when testing #4

Closed by chang010453 4 years ago

chang010453 commented 4 years ago

GPU: GTX 1080 Ti, CUDA: 10.0, TensorFlow: 1.13

I followed the instructions shown here: [image]

It works fine for the first 3 images, but hits an OOM error on the 4th image.

Error:

Loaded.
voc_2007_test gt roidb loaded from /home/dennischang/drl-rpn-tf/data/cache/voc_2007_test_gt_roidb.pkl
2019-10-16 14:43:36.228181: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 0, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-10-16 14:43:36.228792: E tensorflow/core/grappler/optimizers/dependency_optimizer.cc:704] Iteration = 1, topological sort failed with message: The graph couldn't be sorted in topological order.
2019-10-16 14:43:36.239918: I tensorflow/stream_executor/dso_loader.cc:152] successfully opened CUDA library libcublas.so.10.0 locally
Mean #fix/img (tot, MA):    (2.000000, 0.001000)
Mean exploration (tot, MA): (0.136452, 0.000068)

im_detect: 1/4952 1.739s 0.011s
Mean #fix/img (tot, MA):    (3.000000, 0.003000)
Mean exploration (tot, MA): (0.183358, 0.000183)

im_detect: 2/4952 1.312s 0.028s
Mean #fix/img (tot, MA):    (3.666667, 0.005498)
Mean exploration (tot, MA): (0.199782, 0.000300)

im_detect: 3/4952 1.144s 0.035s
Mean #fix/img (tot, MA):    (5.000000, 0.009995)
Mean exploration (tot, MA): (0.217026, 0.000434)

im_detect: 4/4952 1.081s 0.034s
2019-10-16 14:43:49.712593: W tensorflow/core/common_runtime/bfc_allocator.cc:267] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.62GiB.  Current allocation summary follows.
2019-10-16 14:43:49.712700: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (256):   Total Chunks: 19, Chunks in use: 19. 4.8KiB allocated for chunks. 4.8KiB in use in bin. 1.1KiB client-requested in use in bin.
2019-10-16 14:43:49.712747: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (512):   Total Chunks: 4, Chunks in use: 4. 2.0KiB allocated for chunks. 2.0KiB in use in bin. 1.8KiB client-requested in use in bin.
2019-10-16 14:43:49.712780: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1024):  Total Chunks: 8, Chunks in use: 7. 9.2KiB allocated for chunks. 8.0KiB in use in bin. 7.5KiB client-requested in use in bin.
2019-10-16 14:43:49.712811: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2048):  Total Chunks: 8, Chunks in use: 8. 16.5KiB allocated for chunks. 16.5KiB in use in bin. 16.4KiB client-requested in use in bin.
2019-10-16 14:43:49.712843: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4096):  Total Chunks: 1, Chunks in use: 1. 6.8KiB allocated for chunks. 6.8KiB in use in bin. 6.8KiB client-requested in use in bin.
2019-10-16 14:43:49.712872: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8192):  Total Chunks: 3, Chunks in use: 2. 38.5KiB allocated for chunks. 25.0KiB in use in bin. 24.6KiB client-requested in use in bin.
2019-10-16 14:43:49.712900: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16384):     Total Chunks: 4, Chunks in use: 4. 64.0KiB allocated for chunks. 64.0KiB in use in bin. 64.0KiB client-requested in use in bin.
2019-10-16 14:43:49.712926: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (32768):     Total Chunks: 1, Chunks in use: 1. 36.0KiB allocated for chunks. 36.0KiB in use in bin. 36.0KiB client-requested in use in bin.
2019-10-16 14:43:49.712956: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (65536):     Total Chunks: 1, Chunks in use: 1. 72.0KiB allocated for chunks. 72.0KiB in use in bin. 72.0KiB client-requested in use in bin.
2019-10-16 14:43:49.712987: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (131072):    Total Chunks: 1, Chunks in use: 1. 144.0KiB allocated for chunks. 144.0KiB in use in bin. 144.0KiB client-requested in use in bin.
2019-10-16 14:43:49.713019: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (262144):    Total Chunks: 2, Chunks in use: 2. 672.0KiB allocated for chunks. 672.0KiB in use in bin. 672.0KiB client-requested in use in bin.
2019-10-16 14:43:49.713103: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (524288):    Total Chunks: 4, Chunks in use: 4. 2.52MiB allocated for chunks. 2.52MiB in use in bin. 2.29MiB client-requested in use in bin.
2019-10-16 14:43:49.713147: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (1048576):   Total Chunks: 5, Chunks in use: 4. 6.34MiB allocated for chunks. 5.03MiB in use in bin. 4.48MiB client-requested in use in bin.
2019-10-16 14:43:49.713178: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (2097152):   Total Chunks: 5, Chunks in use: 5. 13.77MiB allocated for chunks. 13.77MiB in use in bin. 13.77MiB client-requested in use in bin.
2019-10-16 14:43:49.713209: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (4194304):   Total Chunks: 3, Chunks in use: 3. 17.10MiB allocated for chunks. 17.10MiB in use in bin. 12.94MiB client-requested in use in bin.
2019-10-16 14:43:49.713239: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (8388608):   Total Chunks: 7, Chunks in use: 7. 67.00MiB allocated for chunks. 67.00MiB in use in bin. 58.22MiB client-requested in use in bin.
2019-10-16 14:43:49.713267: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (16777216):  Total Chunks: 1, Chunks in use: 0. 23.23MiB allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-16 14:43:49.713298: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (33554432):  Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-16 14:43:49.713329: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (67108864):  Total Chunks: 2, Chunks in use: 2. 128.00MiB allocated for chunks. 128.00MiB in use in bin. 128.00MiB client-requested in use in bin.
2019-10-16 14:43:49.713356: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (134217728):     Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2019-10-16 14:43:49.713384: I tensorflow/core/common_runtime/bfc_allocator.cc:597] Bin (268435456):     Total Chunks: 6, Chunks in use: 3. 8.00GiB allocated for chunks. 3.62GiB in use in bin. 3.38GiB client-requested in use in bin.
2019-10-16 14:43:49.713413: I tensorflow/core/common_runtime/bfc_allocator.cc:613] Bin for 2.62GiB was 256.00MiB, Chunk State: 
2019-10-16 14:43:49.713442: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 1.00GiB | Requested Size: 72.4KiB | in_use: 0
2019-10-16 14:43:49.713473: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 1.38GiB | Requested Size: 27.4KiB | in_use: 0, prev:   Size: 2.62GiB | Requested Size: 2.62GiB | in_use: 1
2019-10-16 14:43:49.713500: I tensorflow/core/common_runtime/bfc_allocator.cc:619]   Size: 2.00GiB | Requested Size: 1.61GiB | in_use: 0
2019-10-16 14:43:49.713527: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fab62000000 of size 2810658816
2019-10-16 14:43:49.713549: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fac09874000 of size 1484308480
2019-10-16 14:43:49.713571: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fac68000000 of size 2147483648
2019-10-16 14:43:49.713592: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fad10000000 of size 1073741824
2019-10-16 14:43:49.713624: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fad96000000 of size 536870912
2019-10-16 14:43:49.713658: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fadb6000000 of size 536870912
2019-10-16 14:43:49.713682: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fadd6000000 of size 67108864
2019-10-16 14:43:49.713703: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fadda000000 of size 67108864
2019-10-16 14:43:49.713722: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fadde000000 of size 9437184
2019-10-16 14:43:49.713744: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fadde900000 of size 9437184
2019-10-16 14:43:49.713765: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faddf200000 of size 2359296
2019-10-16 14:43:49.713788: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faddf440000 of size 3240192
2019-10-16 14:43:49.713810: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faddf757100 of size 1179648
2019-10-16 14:43:49.713831: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faddf877100 of size 9437184
2019-10-16 14:43:49.713853: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0177100 of size 1382400
2019-10-16 14:43:49.713874: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade02c8900 of size 256
2019-10-16 14:43:49.713896: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade02c8a00 of size 1024
2019-10-16 14:43:49.713918: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade02c8e00 of size 512
2019-10-16 14:43:49.713939: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade02c9000 of size 2359296
2019-10-16 14:43:49.713962: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0509000 of size 147456
2019-10-16 14:43:49.713984: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade052d000 of size 256
2019-10-16 14:43:49.714005: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade052d100 of size 6912
2019-10-16 14:43:49.714026: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade052ec00 of size 256
2019-10-16 14:43:49.714048: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade052ed00 of size 344064
2019-10-16 14:43:49.714069: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0582d00 of size 256
2019-10-16 14:43:49.714090: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0582e00 of size 1024
2019-10-16 14:43:49.714111: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0583200 of size 1024
2019-10-16 14:43:49.714134: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0583600 of size 1280
2019-10-16 14:43:49.714156: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0583b00 of size 589824
2019-10-16 14:43:49.714179: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0613b00 of size 16128
2019-10-16 14:43:49.714202: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0617a00 of size 2560
2019-10-16 14:43:49.714223: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fade0618400 of size 1376256
2019-10-16 14:43:49.714244: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768400 of size 512
2019-10-16 14:43:49.714264: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768600 of size 512
2019-10-16 14:43:49.714285: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768800 of size 256
2019-10-16 14:43:49.714306: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768900 of size 256
2019-10-16 14:43:49.714326: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768a00 of size 256
2019-10-16 14:43:49.714347: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768b00 of size 256
2019-10-16 14:43:49.714367: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768c00 of size 256
2019-10-16 14:43:49.714388: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768d00 of size 256
2019-10-16 14:43:49.714408: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768e00 of size 256
2019-10-16 14:43:49.714428: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0768f00 of size 256
2019-10-16 14:43:49.714449: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0769000 of size 256
2019-10-16 14:43:49.714469: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fade0769100 of size 1280
2019-10-16 14:43:49.714490: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0769600 of size 256
2019-10-16 14:43:49.714510: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fade0769700 of size 13824
2019-10-16 14:43:49.714532: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade076cd00 of size 16384
2019-10-16 14:43:49.714554: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0770d00 of size 256
2019-10-16 14:43:49.714576: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade0770e00 of size 1376256
2019-10-16 14:43:49.714597: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade08c0e00 of size 1280
2019-10-16 14:43:49.714618: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade08c1300 of size 16384
2019-10-16 14:43:49.714639: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Free  at 0x7fade08c5300 of size 24358144
2019-10-16 14:43:49.714660: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade2000000 of size 9437184
2019-10-16 14:43:49.714681: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade2900000 of size 9437184
2019-10-16 14:43:49.714703: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fade3200000 of size 14680064
2019-10-16 14:43:49.714724: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fae6a000000 of size 3240192
2019-10-16 14:43:49.714746: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fae6a317100 of size 4423680
2019-10-16 14:43:49.714768: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fae6a74f100 of size 758272
2019-10-16 14:43:49.714789: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7fae6a808300 of size 8355072
2019-10-16 14:43:49.714811: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece600000 of size 1280
2019-10-16 14:43:49.714832: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece600500 of size 344064
2019-10-16 14:43:49.714852: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece654500 of size 256
2019-10-16 14:43:49.714873: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece654600 of size 256
2019-10-16 14:43:49.714894: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece654700 of size 1280
2019-10-16 14:43:49.714914: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece654c00 of size 256
2019-10-16 14:43:49.714937: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece654d00 of size 73728
2019-10-16 14:43:49.714958: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece666d00 of size 16384
2019-10-16 14:43:49.714979: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66ad00 of size 2048
2019-10-16 14:43:49.715001: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66b500 of size 9472
2019-10-16 14:43:49.715022: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66da00 of size 2048
2019-10-16 14:43:49.715042: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66e200 of size 2048
2019-10-16 14:43:49.715063: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66ea00 of size 2048
2019-10-16 14:43:49.715084: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece66f200 of size 36864
2019-10-16 14:43:49.715105: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece678200 of size 2048
2019-10-16 14:43:49.715125: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece678a00 of size 256
2019-10-16 14:43:49.715147: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece678b00 of size 16384
2019-10-16 14:43:49.715167: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece67cb00 of size 2048
2019-10-16 14:43:49.715188: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece67d300 of size 2048
2019-10-16 14:43:49.715209: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece67db00 of size 512
2019-10-16 14:43:49.715231: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faece67dd00 of size 533248
2019-10-16 14:43:49.715252: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faecea00000 of size 758272
2019-10-16 14:43:49.715274: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faeceab9200 of size 1338880
2019-10-16 14:43:49.715296: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faecec00000 of size 8388608
2019-10-16 14:43:49.715318: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faecf400000 of size 3240192
2019-10-16 14:43:49.715339: I tensorflow/core/common_runtime/bfc_allocator.cc:632] Chunk at 0x7faecf717100 of size 5148416
2019-10-16 14:43:49.715359: I tensorflow/core/common_runtime/bfc_allocator.cc:638]      Summary of in-use Chunks by size: 
2019-10-16 14:43:49.715385: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 19 Chunks of size 256 totalling 4.8KiB
2019-10-16 14:43:49.715408: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 512 totalling 2.0KiB
2019-10-16 14:43:49.715431: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 1024 totalling 3.0KiB
2019-10-16 14:43:49.715453: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 1280 totalling 5.0KiB
2019-10-16 14:43:49.715476: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 7 Chunks of size 2048 totalling 14.0KiB
2019-10-16 14:43:49.715499: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 2560 totalling 2.5KiB
2019-10-16 14:43:49.715521: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 6912 totalling 6.8KiB
2019-10-16 14:43:49.715544: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 9472 totalling 9.2KiB
2019-10-16 14:43:49.715568: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 16128 totalling 15.8KiB
2019-10-16 14:43:49.715592: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 4 Chunks of size 16384 totalling 64.0KiB
2019-10-16 14:43:49.715616: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 36864 totalling 36.0KiB
2019-10-16 14:43:49.715639: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 73728 totalling 72.0KiB
2019-10-16 14:43:49.715663: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 147456 totalling 144.0KiB
2019-10-16 14:43:49.715687: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 344064 totalling 672.0KiB
2019-10-16 14:43:49.715710: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 533248 totalling 520.8KiB
2019-10-16 14:43:49.715733: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 589824 totalling 576.0KiB
2019-10-16 14:43:49.715756: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 758272 totalling 1.45MiB
2019-10-16 14:43:49.715779: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1179648 totalling 1.12MiB
2019-10-16 14:43:49.715801: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1338880 totalling 1.28MiB
2019-10-16 14:43:49.715823: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1376256 totalling 1.31MiB
2019-10-16 14:43:49.715845: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 1382400 totalling 1.32MiB
2019-10-16 14:43:49.715868: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 2359296 totalling 4.50MiB
2019-10-16 14:43:49.715891: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 3 Chunks of size 3240192 totalling 9.27MiB
2019-10-16 14:43:49.715913: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 4423680 totalling 4.22MiB
2019-10-16 14:43:49.715935: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 5148416 totalling 4.91MiB
2019-10-16 14:43:49.715958: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 8355072 totalling 7.97MiB
2019-10-16 14:43:49.715980: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 8388608 totalling 8.00MiB
2019-10-16 14:43:49.716003: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 5 Chunks of size 9437184 totalling 45.00MiB
2019-10-16 14:43:49.716026: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 14680064 totalling 14.00MiB
2019-10-16 14:43:49.716050: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 67108864 totalling 128.00MiB
2019-10-16 14:43:49.716072: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 2 Chunks of size 536870912 totalling 1.00GiB
2019-10-16 14:43:49.716095: I tensorflow/core/common_runtime/bfc_allocator.cc:641] 1 Chunks of size 2810658816 totalling 2.62GiB
2019-10-16 14:43:49.716117: I tensorflow/core/common_runtime/bfc_allocator.cc:645] Sum Total of in-use chunks: 3.85GiB
2019-10-16 14:43:49.716146: I tensorflow/core/common_runtime/bfc_allocator.cc:647] Stats: 
Limit:                 10149868340
InUse:                  4130232320
MaxInUse:               5733378816
NumAllocs:                    3937
MaxAllocSize:           4294967296

2019-10-16 14:43:49.716202: W tensorflow/core/common_runtime/bfc_allocator.cc:271] ********************************____________________________________________________****************
2019-10-16 14:43:49.716268: W tensorflow/core/framework/op_kernel.cc:1401] OP_REQUIRES failed at transpose_op.cc:199 : Resource exhausted: OOM when allocating tensor with shape[7002,512,14,14] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1334, in _do_call
    return fn(*args)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[7002,512,14,14] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node MaxPool2D_1/MaxPool-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "./tools/test_net.py", line 153, in <module>
    test_net(sess, net, imdb, filename, max_per_image=args.max_per_image)
  File "/home/dennischang/drl-rpn-tf/tools/../lib/model/test.py", line 106, in test_net
    im_idx, nbr_gts)
  File "/home/dennischang/drl-rpn-tf/tools/../lib/model/test.py", line 42, in im_detect
    im_idx, nbr_gts)
  File "/home/dennischang/drl-rpn-tf/tools/../lib/model/factory.py", line 567, in run_drl_rpn
    roi_objnesses)
  File "/home/dennischang/drl-rpn-tf/tools/../lib/model/factory.py", line 603, in _collect_detections
    scores = net.post_hist_nudge(sess, net_conv, rois, cls_hist)
  File "/home/dennischang/drl-rpn-tf/tools/../lib/nets/network.py", line 1064, in post_hist_nudge
    feed_dict=feed_dict_seq)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/dennischang/anaconda3/envs/tf/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: OOM when allocating tensor with shape[7002,512,14,14] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
     [[{{node MaxPool2D_1/MaxPool-0-TransposeNHWCToNCHW-LayoutOptimizer}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

Command exited with non-zero status 1
11.31user 2.21system 0:18.43elapsed 73%CPU (0avgtext+0avgdata 4385928maxresident)k
0inputs+0outputs (0major+980653minor)pagefaults 0swaps

How to fix it?

aleksispi commented 4 years ago

Hi,

Are you using the model with posterior class probability adjustments or not? I think the weights with adjustments require more memory than those without, so you could try running without them.

I haven't tested with CUDA 10 and/or TF v1.13 (I used r1.2).
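
For reference, the size of the failed allocation can be checked directly from the shape in the error message. This is a minimal sketch, assuming `type float` means 4-byte float32; the helper name `tensor_bytes` is made up for illustration and is not part of the repo.

```python
from functools import reduce
from operator import mul


def tensor_bytes(shape, bytes_per_elem=4):
    """Return the memory needed for a dense tensor of the given shape.

    bytes_per_elem=4 assumes float32 (an assumption, see the note above).
    """
    return reduce(mul, shape, 1) * bytes_per_elem


# Shape copied from the OOM message: shape[7002,512,14,14], type float.
size = tensor_bytes((7002, 512, 14, 14))
print(size)                    # 2810658816 bytes
print(round(size / 2**30, 2))  # 2.62 (GiB)
```

The result matches the 2810658816-byte chunk listed in the allocator dump and the "2.62GiB" in the first warning. One chunk of that size is already in use, so the failing NHWC-to-NCHW transpose presumably needs a second buffer of the same size. The traceback shows the failing call is `net.post_hist_nudge`, which by its name looks like the posterior adjustment step, so running without the adjustments would plausibly avoid this allocation.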

chang010453 commented 4 years ago

Thanks. After turning off posterior class probability adjustments, it works fine.
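
A side note on why the allocation failed even though `InUse` (about 4.1 GB) is well below `Limit` (about 10.1 GB): the request has to fit in a single free chunk. The rough check below uses the larger `Free` entries copied from the allocator dump in the log; the variable names are illustrative and the two tiny free chunks (1280 and 13824 bytes) are left out.

```python
# Values copied from the allocator dump in the log above.
requested = 2810658816  # the 2.62 GiB request that failed
free_chunks = [1484308480, 2147483648, 1073741824, 24358144, 1376256]

total_free = sum(free_chunks)
largest_free = max(free_chunks)

print(total_free >= requested)    # True: enough free memory in total
print(largest_free >= requested)  # False: no single free chunk is large enough
```

As far as the dump shows, the largest free chunk is 2 GiB, so the 2.62 GiB request cannot be placed even though the free chunks add up to more than that. This suggests that the size of the single tensor, rather than overall usage, is what the GPU cannot accommodate here.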