PaddlePaddle / models

Officially maintained models supported by PaddlePaddle, covering CV, NLP, Speech, Rec, TS, large models, and more.
Apache License 2.0

Face detection Validation #1514

Open abcdvzz opened 5 years ago

abcdvzz commented 5 years ago

When I ran the validation code, I encountered this error. Please help me. I raised a lot of questions yesterday. Please help, or I'll be fired soon.

----------- Configuration Arguments -----------
confs_threshold: 0.15
data_dir: data/WIDER_val/images/
file_list: data/wider_face_split/wider_face_val_bbx_gt.txt
image_path:
infer: False
model_dir: PyramidBox_WiderFace/
pred_dir: pred
use_gpu: True
use_pyramidbox: True

W1210 10:20:21.553225 13318 device_context.cc:203] Please NOTE: device: 0, CUDA Capability: 61, Driver Version: 9.2, Runtime Version: 9.0
W1210 10:20:21.553249 13318 device_context.cc:210] device: 0, cuDNN Version: 7.0.
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB
Traceback (most recent call last):
  File "widerface_eval.py", line 317, in <module>
    infer(args, config)
  File "widerface_eval.py", line 63, in infer
    [det2, det3] = multi_scale_test(image, max_shrink)
  File "widerface_eval.py", line 203, in multi_scale_test
    det_b = detect_face(image, bt)
  File "widerface_eval.py", line 121, in detect_face
    return_numpy=False)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 472, in run
    self.executor.run(program.desc, scope, 0, True, True)
RuntimeError: parallel_for failed: out of memory
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():  cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:150]
PaddlePaddle Call Stacks:
0       0x7fa26295ce86p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1       0x7fa2641fda0ap paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 266
2       0x7fa2641fb922p paddle::memory::detail::BuddyAllocator::Free(void*) + 1122
3       0x7fa2641f78a5p paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) + 69
4       0x7fa262960949p std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 57
5       0x7fa262961cfdp paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() + 61
6       0x7fa26419999dp paddle::framework::Scope::~Scope() + 141
7       0x7fa2641998a1p paddle::framework::Scope::DropKids() + 81
8       0x7fa26419992dp paddle::framework::Scope::~Scope() + 29
9       0x7fa26295a80ap

*** Aborted at 1544408422 (unix time) try "date -d @1544408422" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x3e800003406) received by PID 13318 (TID 0x7fa2b30c2700) from PID 13318; stack trace: ***
    @     0x7fa2b2cb9390 (unknown)
    @     0x7fa2b2913428 gsignal
    @     0x7fa2b291502a abort
    @     0x7fa2a891884d __gnu_cxx::verbose_terminate_handler()
    @     0x7fa2a89166b6 (unknown)
    @     0x7fa2a89156a9 (unknown)
    @     0x7fa2a8916005 __gxx_personality_v0
    @     0x7fa2a8e37f83 (unknown)
    @     0x7fa2a8e38487 _Unwind_Resume
    @     0x7fa2641fbc75 paddle::memory::detail::BuddyAllocator::Free()
    @     0x7fa2641f78a5 paddle::memory::allocation::LegacyAllocator::Free()
    @     0x7fa262960949 std::_Sp_counted_base<>::_M_release()
    @     0x7fa262961cfd paddle::framework::Variable::PlaceholderImpl<>::~PlaceholderImpl()
    @     0x7fa26419999d paddle::framework::Scope::~Scope()
    @     0x7fa2641998a1 paddle::framework::Scope::DropKids()
    @     0x7fa26419992d paddle::framework::Scope::~Scope()

qingqing01 commented 5 years ago
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB

Please note the error log. This model needs a lot of GPU memory, and from the log there is not enough free memory on your GPU card. Could you paste the nvidia-smi output? You can also try a smaller image for testing.
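
For reference, the FLAGS_* settings are environment variables, so they need to be in place before Paddle initializes the GPU, either via export in the shell or in Python before paddle.fluid is imported. A minimal sketch (the 0.5 value here is only an illustration, not a recommended setting):

import os

# FLAGS_* values are read from the environment when paddle.fluid initializes,
# so set them before the import (or export them in the shell beforehand).
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.5"  # illustrative value
os.environ["CUDA_VISIBLE_DEVICES"] = "0"                   # pick a free GPU

import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)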

abcdvzz commented 5 years ago
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB

Please note the error log. This model needs a lot of GPU memory, and from the log there is not enough free memory on your GPU card. Could you paste the nvidia-smi output? You can also try a smaller image for testing.

It says: "Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value." Actually, no matter how I adjust the value, the same problem occurs. And I have a 12 GB GPU (Titan Xp), which seems enough for it. Also, where can I change the image as you said ("You can also try a smaller image for testing.")? I changed image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py, but it doesn't work.

qingqing01 commented 5 years ago

Yeah, a 12 GB GPU is enough for one test. Please make sure no other job is running before you test.

I changed image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py, but it doesn't work.

The image shape depends on the actual input image, not on this setting in widerface_eval.py.

qingqing01 commented 5 years ago

@abcdvzz Is there any progress?

abcdvzz commented 5 years ago

@abcdvzz Is there any progress?

No. Could you please tell me where I can resize the input image, if it's not image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py?

qingqing01 commented 5 years ago

widerface_eval.py tests on the original image. You need to resize the image after line https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/face_detection/widerface_eval.py#L41, or use a smaller image.
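
For example, a minimal sketch of shrinking the input right after it is read (assuming the image at that point is a PIL Image, as the script loads it; the 0.5 scale factor is only an example):

from PIL import Image

def shrink_image(image, scale=0.5):
    # Downscale the PIL image so detect_face needs less GPU memory.
    new_w = int(image.size[0] * scale)
    new_h = int(image.size[1] * scale)
    return image.resize((new_w, new_h), Image.BILINEAR)

# after the image is loaded in widerface_eval.py:
# image = shrink_image(image, scale=0.5)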

JWei-D commented 5 years ago

I ran into the same issue. Could you please tell me how to deal with it? I have four 1080 Ti GPUs but still hit this problem.

abcdvzz commented 5 years ago

I ran into the same issue. Could you please tell me how to deal with it? I have four 1080 Ti GPUs but still hit this problem.

I couldn't solve it, so I had to switch to another version...

YanYan0716 commented 5 years ago

Has anyone solved this problem? I'm hitting the same issue. Thanks a lot.

LeLiu commented 5 years ago

I met the same problem.

chengduoZH commented 5 years ago

@LeLiu You can try the following solution:

zhhsplendid commented 5 years ago

@LeLiu, another suggestion: could you run nvidia-smi in your terminal to make sure the GPU device you are using has enough available memory? That is, check that no other processes are using the GPU.
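
For example, a quick check from Python before launching the job (a sketch that just shells out to nvidia-smi with its standard query flags):

import subprocess

# Print per-GPU used/free memory; nvidia-smi emits one CSV line per device.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,memory.used,memory.free",
     "--format=csv,noheader"])
print(out.decode())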

LeLiu commented 5 years ago

@chengduoZH @zhhsplendid Thank you very much for your replies. As you suggested, I set FLAGS_fraction_of_gpu_memory_to_use = 0.0 and batch_size = 1, but it still does not work, and nvidia-smi shows that no processes were running on the GPU.

Sun Mar 31 23:54:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:65:00.0 Off |                  N/A |
| 23%   42C    P0    60W / 250W |      0MiB / 12195MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

My code was working well on the CPU but crashed when using the GPU. The following is the error log.

W0401 00:48:14.767410 83126 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.1, Runtime API Version: 9.0
W0401 00:48:14.767463 83126 device_context.cc:271] device: 0, cuDNN Version: 7.0.
W0401 00:48:15.208385 83126 batch_norm_op.cu:169] Only 1 element in normalization dimension, we skip the batch norm calculation, let y = x.
W0401 00:48:15.226229 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.226348 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.226358 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.226367 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.226377 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.226387 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
W0401 00:48:15.227049 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.227165 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.227174 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.227195 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.227207 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.227221 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
W0401 00:48:15.227836 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.227939 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.227948 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.227957 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.227967 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.227975 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(sys.argv)
  File "train.py", line 107, in main
    train()
  File "train.py", line 82, in train
    train_loss, train_acc = exe.run(main_program, feed=feeder.feed(data), fetch_list=[loss, acc])
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 525, in run
    use_program_cache=use_program_cache)
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 591, in _run
    exe.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: Invoke operator fetch error.
Python Callstacks:
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1317, in append_op
    attrs=kwargs.get("attrs", None))
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 361, in _add_feed_fetch_ops
    attrs={'col': i})
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 588, in _run
    fetch_var_name=fetch_var_name)
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 525, in run
    use_program_cache=use_program_cache)
  File "train.py", line 82, in train
    train_loss, train_acc = exe.run(main_program, feed=feeder.feed(data), fetch_list=[loss, acc])
  File "train.py", line 107, in main
    train()
  File "train.py", line 110, in <module>                                                                                                                                                           [54/1201]
    main(sys.argv)
C++ Callstacks:
cudaMemcpy failed in paddle::platform::GpuMemcpySync (0x7f6c65e0cc40 -> 0x7f6974bff040, length: 4): an illegal memory access was encountered at [/paddle/paddle/fluid/platform/gpu_info.cc:234]
PaddlePaddle Call Stacks:
0       0x7f6c84b0c8d5p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 357
1       0x7f6c84b0cc59p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2       0x7f6c865d0bbcp paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind) + 188
3       0x7f6c84c25dcbp void paddle::memory::Copy<paddle::platform::CPUPlace, paddle::platform::CUDAPlace>(paddle::platform::CPUPlace, void*, paddle::platform::CUDAPlace, void const*, unsigned long, CUstr
eam_st*) + 91
4       0x7f6c8657c90bp paddle::framework::TensorCopySync(paddle::framework::Tensor const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost
::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant:
:void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::det
ail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Tensor*) + 827
5       0x7f6c860811d2p paddle::operators::FetchOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boos
t::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant
::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::de
tail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 626
6       0x7f6c86518575p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boo
st::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::varian
t::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::d
etail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7       0x7f6c84c2941ap paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 218
8       0x7f6c84c2b415p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 261
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():  cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:150]
PaddlePaddle Call Stacks:
0       0x7f6c84b0c8d5p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 357
1       0x7f6c84b0cc59p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2       0x7f6c865cd87bp paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 187
3       0x7f6c865cb922p paddle::memory::detail::BuddyAllocator::Free(void*) + 1122
4       0x7f6c865c7247p void paddle::memory::legacy::Free<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace const&, void*, unsigned long) + 39
5       0x7f6c865c72bdp paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) + 77
6       0x7f6c84b0f329p std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 57
7       0x7f6c84b10358p paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() + 56
8       0x7f6c8656c56dp paddle::framework::Scope::~Scope() + 157
9       0x7f6c8656c481p paddle::framework::Scope::DropKids() + 65
10      0x7f6c8656c4edp paddle::framework::Scope::~Scope() + 29
11      0x7f6c84c633d6p paddle::framework::ScopePool::DeleteScope(paddle::framework::Scope*) + 22
12      0x7f6c84c63431p paddle::framework::ScopePool::Clear() + 65

*** Aborted at 1554050895 (unix time) try "date -d @1554050895" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x7d1000144b6) received by PID 83126 (TID 0x7f6cdc5c3740) from PID 83126; stack trace: ***
    @     0x7f6cd0937e37 _Unwind_Resume
    @     0x7f6c865cbc75 paddle::memory::detail::BuddyAllocator::Free()
    @     0x7f6c865c7247 paddle::memory::legacy::Free<>()
    @     0x7f6c865c72bd paddle::memory::allocation::LegacyAllocator::Free()
qingqing01 commented 5 years ago

@LeLiu

run command:

export CUDA_VISIBLE_DEVICES=0
python -u train.py --batch_size=4 --pretrained_model=vgg_ilsvrc_16_fc_reduced --data_dir=/home/users/data/WIDERFACE/

dataset in /home/users/data/WIDERFACE/:

|-- wider_face_split
|   |-- readme.txt
|   |-- wider_face_test_filelist.txt
|   |-- wider_face_test.mat
|   |-- wider_face_train_bbx_gt.txt
|   |-- wider_face_train.mat
|   |-- wider_face_val_bbx_gt.txt
|   `-- wider_face_val.mat
|-- WIDER_test
|   `-- images
|-- WIDER_train
|   `-- images
`-- WIDER_val
    `-- images

log:

-----------  Configuration Arguments -----------
batch_num: None
batch_size: 4
data_dir: /home/users/dangqingqing/data/WIDERFACE/
enable_ce: False
epoc_num: 160
learning_rate: 0.001
mean_BGR: 104., 117., 123.
model_save_dir: output
num_devices: 1
parallel: True
pretrained_model: vgg_ilsvrc_16_fc_reduced
resize_h: 640
resize_w: 640
use_gpu: True
use_pyramidbox: True
with_mem_opt: True
------------------------------------------------
memory_optimize is deprecated. Use CompiledProgram and Executor
W0401 10:31:45.853430 111517 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.2
W0401 10:31:45.856607 111517 device_context.cc:269] device: 0, cuDNN Version: 7.0.
import ujson error: No module named ujson use json
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0401 10:31:46.688274 111517 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
Pass 0, batch 0, face loss 10.494228, head loss 9.767658, time 0.00272
Pass 0, batch 10, face loss 8.338327, head loss 10.441329, time 0.88548 
# ...
Pass 0, batch 3010, face loss 3.260000, head loss 2.577861, time 0.89221
Pass 0, batch 3020, face loss 2.235071, head loss 2.160218, time 0.91566

And even though I set export FLAGS_fraction_of_gpu_memory_to_use=0, there is no problem on my machine.

LeLiu commented 5 years ago

@qingqing01 thank you very much.

I didn't run the WIDERFACE face detection model. I ran code I wrote myself on private data and just hit the same problem as in this issue. Sorry I didn't make that clear.

I tried batch sizes 1, 4, 32, 64, and 128, and it failed every time. Could it be an issue with my GPU/CUDA configuration (although other programs using CUDA work fine)?

qingqing01 commented 5 years ago

@LeLiu Please note that the data reader uses Python multiprocessing; if a run fails, you must kill all of the leftover processes before trying again. You can try running PyramidBox. Or, if it's convenient, send me your code and I can try it on my machine.
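
For example, leftover worker processes from a failed run can be cleaned up with something like the sketch below (run it from a separate shell; the "train.py" pattern is only an example, match it to your own script name):

import subprocess

# Kill any Python reader/worker processes left over from a failed run.
# "train.py" is an illustrative pattern, not a fixed name.
subprocess.call(["pkill", "-f", "train.py"])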

LeLiu commented 5 years ago

@LeLiu Please note that the data reader uses Python multiprocessing; if a run fails, you must kill all of the leftover processes before trying again. You can try running PyramidBox. Or, if it's convenient, send me your code and I can try it on my machine.

Thank you again.

I think I've solved the problem. I read the log carefully and found that a lot of GPU memory was being used (more than 10 GB). After simplifying the network and using a smaller batch size (8), the error disappeared.

W0402 14:49:10.794189  8209 legacy_allocator.cc:191] Cannot allocate 227.777344MB in GPU 0, available 1.137329GB
W0402 14:49:10.794214  8209 legacy_allocator.cc:194] total 12788105216
W0402 14:49:10.794255  8209 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0402 14:49:10.794281  8209 legacy_allocator.cc:198] GpuMaxChunkSize 10.409210GB
W0402 14:49:10.794302  8209 legacy_allocator.cc:201] GPU memory used: 10.346843GB

But I'm still not sure whether I had the same problem as @abcdvzz, because I don't understand the log very well.

phamkhactu commented 4 years ago

I have a problem with train.py. When I debug, it shows the error at these lines:

    exe = fluid.Executor(place)
    exe.run(startup_prog)

and it shows:

-----------  Configuration Arguments -----------
batch_num: None
batch_size: 4
data_dir: data
enable_ce: False
epoc_num: 160
learning_rate: 0.001
mean_BGR: 104., 117., 123.
model_save_dir: output
num_devices: 1
parallel: True
pretrained_model: vgg_ilsvrc_16_fc_reduced
resize_h: 640
resize_w: 640
use_gpu: True
use_multiprocess: True
use_pyramidbox: True
------------------------------------------------
2020-02-28 08:59:28,675-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py:779: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    train(args, config, train_parameters, train_file_list)
  File "train.py", line 157, in train
    exe.run(startup_prog)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 780, in run
    six.reraise(*sys.exc_info())
  File "/root/.local/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 775, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 822, in _run_impl
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 899, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
3   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
4   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
5   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6   std::__future_base::_Deferred_state<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
7   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
8   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
9   paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
10  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
11  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool)

----------------------
Error Message Summary:
----------------------
Error: Paddle internal Check failed. (Please help us create a new issue, here we need to find the developer to add a user friendly error message): out of memory at (/paddle/paddle/fluid/platform/device_context.cc:220)

and my nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0 Off |                  N/A |
| 45%   56C    P2    41W / 260W |  10933MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     23333      G   /usr/lib/xorg/Xorg                            66MiB |
|    0     23412      G   /usr/bin/sddm-greeter                         48MiB |
+-----------------------------------------------------------------------------+