PaddlePaddle / models

Officially maintained models supported by PaddlePaddle, covering CV, NLP, Speech, Rec, TS, large models, and more.
Apache License 2.0

Face detection Validation #1514

Open abcdvzz opened 5 years ago

abcdvzz commented 5 years ago

When I ran the validation code, I encountered this error. Please help me. I raised a lot of questions yesterday. Please help, or I'll be fired soon.

----------- Configuration Arguments -----------
confs_threshold: 0.15
data_dir: data/WIDER_val/images/
file_list: data/wider_face_split/wider_face_val_bbx_gt.txt
image_path:
infer: False
model_dir: PyramidBox_WiderFace/
pred_dir: pred
use_gpu: True
use_pyramidbox: True

W1210 10:20:21.553225 13318 device_context.cc:203] Please NOTE: device: 0, CUDA Capability: 61, Driver Version: 9.2, Runtime Version: 9.0
W1210 10:20:21.553249 13318 device_context.cc:210] device: 0, cuDNN Version: 7.0.
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB
Traceback (most recent call last):
  File "widerface_eval.py", line 317, in <module>
    infer(args, config)
  File "widerface_eval.py", line 63, in infer
    [det2, det3] = multi_scale_test(image, max_shrink)
  File "widerface_eval.py", line 203, in multi_scale_test
    det_b = detect_face(image, bt)
  File "widerface_eval.py", line 121, in detect_face
    return_numpy=False)
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 472, in run
    self.executor.run(program.desc, scope, 0, True, True)
RuntimeError: parallel_for failed: out of memory
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():  cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:150]
PaddlePaddle Call Stacks:
0       0x7fa26295ce86p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 486
1       0x7fa2641fda0ap paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 266
2       0x7fa2641fb922p paddle::memory::detail::BuddyAllocator::Free(void*) + 1122
3       0x7fa2641f78a5p paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) + 69
4       0x7fa262960949p std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 57
5       0x7fa262961cfdp paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() + 61
6       0x7fa26419999dp paddle::framework::Scope::~Scope() + 141
7       0x7fa2641998a1p paddle::framework::Scope::DropKids() + 81
8       0x7fa26419992dp paddle::framework::Scope::~Scope() + 29
9       0x7fa26295a80ap

*** Aborted at 1544408422 (unix time) try "date -d @1544408422" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x3e800003406) received by PID 13318 (TID 0x7fa2b30c2700) from PID 13318; stack trace: ***
    @     0x7fa2b2cb9390 (unknown)
    @     0x7fa2b2913428 gsignal
    @     0x7fa2b291502a abort
    @     0x7fa2a891884d __gnu_cxx::verbose_terminate_handler()
    @     0x7fa2a89166b6 (unknown)
    @     0x7fa2a89156a9 (unknown)
    @     0x7fa2a8916005 __gxx_personality_v0
    @     0x7fa2a8e37f83 (unknown)
    @     0x7fa2a8e38487 _Unwind_Resume
    @     0x7fa2641fbc75 paddle::memory::detail::BuddyAllocator::Free()
    @     0x7fa2641f78a5 paddle::memory::allocation::LegacyAllocator::Free()
    @     0x7fa262960949 std::_Sp_counted_base<>::_M_release()
    @     0x7fa262961cfd paddle::framework::Variable::PlaceholderImpl<>::~PlaceholderImpl()
    @     0x7fa26419999d paddle::framework::Scope::~Scope()
    @     0x7fa2641998a1 paddle::framework::Scope::DropKids()
    @     0x7fa26419992d paddle::framework::Scope::~Scope()

qingqing01 commented 5 years ago
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB

Please note the error log. This model needs a lot of GPU memory, and from the log there is not enough free memory on your GPU card. Could you paste the nvidia-smi output? You can also try a smaller image for testing.
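
For reference, the FLAGS_* settings are environment variables, so they need to be in place before Paddle initializes the GPU, either via export in the shell or in Python before paddle.fluid is imported. A minimal sketch (the 0.5 value here is only an illustration, not a recommended setting):

import os

# FLAGS_* values are read from the environment when paddle.fluid initializes,
# so set them before the import (or export them in the shell beforehand).
os.environ["FLAGS_fraction_of_gpu_memory_to_use"] = "0.5"  # illustrative value
os.environ["CUDA_VISIBLE_DEVICES"] = "0"                   # pick a free GPU

import paddle.fluid as fluid

place = fluid.CUDAPlace(0)
exe = fluid.Executor(place)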

abcdvzz commented 5 years ago
W1210 10:20:22.738585 13318 system_allocator.cc:122] Cannot malloc 217.012 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 5e-06
W1210 10:20:22.738677 13318 legacy_allocator.cc:161] Cannot allocate 217.011719MB in GPU 0, available 201.375000MB
W1210 10:20:22.738684 13318 legacy_allocator.cc:164] total 12787122176
W1210 10:20:22.738692 13318 legacy_allocator.cc:165] GpuMinChunkSize 256.000000B
W1210 10:20:22.738700 13318 legacy_allocator.cc:168] GpuMaxChunkSize 59.314453kB
W1210 10:20:22.738708 13318 legacy_allocator.cc:171] GPU memory used: 902.250000kB

Please note the error log. This model needs a lot of GPU memory, and from the log there is not enough free memory on your GPU card. Could you paste the nvidia-smi output? You can also try a smaller image for testing.

It says: "Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value." Actually, no matter how I adjust the value, the same problem occurs. And I have a 12 GB GPU (Titan Xp), which seems enough for it. Also, where can I change the image as you said ("You can also try a smaller image for testing.")? I changed image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py, but it doesn't work.

qingqing01 commented 5 years ago

Yeah, a 12 GB GPU is enough for one test. Please make sure no other job is running before you test.

I changed image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py, but it doesn't work.

The image shape depends on the actual input image, not on this setting in widerface_eval.py.

qingqing01 commented 5 years ago

@abcdvzz Is there any progress?

abcdvzz commented 5 years ago

@abcdvzz Is there any progress?

No. Could you please tell me where I can resize the input image, if it's not image_shape = [3, 1024, 1024] at line 302 of widerface_eval.py?

qingqing01 commented 5 years ago

widerface_eval.py tests on the original image. You need to resize the image after line https://github.com/PaddlePaddle/models/blob/develop/fluid/PaddleCV/face_detection/widerface_eval.py#L41, or use a smaller image.
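
For example, a minimal sketch of shrinking the input right after it is read (assuming the image at that point is a PIL Image, as the script loads it; the 0.5 scale factor is only an example):

from PIL import Image

def shrink_image(image, scale=0.5):
    # Downscale the PIL image so detect_face needs less GPU memory.
    new_w = int(image.size[0] * scale)
    new_h = int(image.size[1] * scale)
    return image.resize((new_w, new_h), Image.BILINEAR)

# after the image is loaded in widerface_eval.py:
# image = shrink_image(image, scale=0.5)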

JWei-D commented 5 years ago

I ran into the same issue. Could you please tell me how to deal with it? I have four 1080 Ti GPUs but still hit this problem.

abcdvzz commented 5 years ago

I ran into the same issue. Could you please tell me how to deal with it? I have four 1080 Ti GPUs but still hit this problem.

I couldn't solve it, so I had to switch to another version...

YanYan0716 commented 5 years ago

Has anyone solved this problem? I'm hitting the same issue. Thanks a lot.

LeLiu commented 5 years ago

I met the same problem.

chengduoZH commented 5 years ago

@LeLiu You can try the following solution:

zhhsplendid commented 5 years ago

@LeLiu, another suggestion: could you run nvidia-smi in your terminal to make sure the GPU device you are using has enough available memory? That is, check that no other processes are using the GPU.
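
For example, a quick check from Python before launching the job (a sketch that just shells out to nvidia-smi with its standard query flags):

import subprocess

# Print per-GPU used/free memory; nvidia-smi emits one CSV line per device.
out = subprocess.check_output(
    ["nvidia-smi", "--query-gpu=index,memory.used,memory.free",
     "--format=csv,noheader"])
print(out.decode())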

LeLiu commented 5 years ago

@chengduoZH @zhhsplendid Thank you very much for your replies. As you suggested, I set FLAGS_fraction_of_gpu_memory_to_use = 0.0 and batch_size = 1, but it still does not work, and nvidia-smi shows that no processes were running on the GPU.

Sun Mar 31 23:54:25 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 390.12                 Driver Version: 390.12                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN Xp            Off  | 00000000:65:00.0 Off |                  N/A |
| 23%   42C    P0    60W / 250W |      0MiB / 12195MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

My code was working well on the CPU but crashed when using the GPU. The following is the error log.

W0401 00:48:14.767410 83126 device_context.cc:263] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 9.1, Runtime API Version: 9.0
W0401 00:48:14.767463 83126 device_context.cc:271] device: 0, cuDNN Version: 7.0.
W0401 00:48:15.208385 83126 batch_norm_op.cu:169] Only 1 element in normalization dimension, we skip the batch norm calculation, let y = x.
W0401 00:48:15.226229 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.226348 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.226358 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.226367 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.226377 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.226387 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
W0401 00:48:15.227049 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.227165 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.227174 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.227195 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.227207 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.227221 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
W0401 00:48:15.227836 83126 system_allocator.cc:122] Cannot malloc 2700 MB GPU memory. Please shrink FLAGS_fraction_of_gpu_memory_to_use environment variable to a lower value. Current value is 0
W0401 00:48:15.227939 83126 legacy_allocator.cc:191] Cannot allocate 2.636719GB in GPU 0, available 834.625000MB
W0401 00:48:15.227948 83126 legacy_allocator.cc:194] total 12788105216
W0401 00:48:15.227957 83126 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0401 00:48:15.227967 83126 legacy_allocator.cc:198] GpuMaxChunkSize 0.000000B
W0401 00:48:15.227975 83126 legacy_allocator.cc:201] GPU memory used: 0.000000B
Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(sys.argv)
  File "train.py", line 107, in main
    train()
  File "train.py", line 82, in train
    train_loss, train_acc = exe.run(main_program, feed=feeder.feed(data), fetch_list=[loss, acc])
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 525, in run
    use_program_cache=use_program_cache)
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 591, in _run
    exe.run(program.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: Invoke operator fetch error.
Python Callstacks:
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/framework.py", line 1317, in append_op
    attrs=kwargs.get("attrs", None))
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 361, in _add_feed_fetch_ops
    attrs={'col': i})
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 588, in _run
    fetch_var_name=fetch_var_name)
  File "/nfs/user/liule/opt/anaconda3/envs/paddle-gpu/lib/python3.6/site-packages/paddle/fluid/executor.py", line 525, in run
    use_program_cache=use_program_cache)
  File "train.py", line 82, in train
    train_loss, train_acc = exe.run(main_program, feed=feeder.feed(data), fetch_list=[loss, acc])
  File "train.py", line 107, in main
    train()
  File "train.py", line 110, in <module>                                                                                                                                                           [54/1201]
    main(sys.argv)
C++ Callstacks:
cudaMemcpy failed in paddle::platform::GpuMemcpySync (0x7f6c65e0cc40 -> 0x7f6974bff040, length: 4): an illegal memory access was encountered at [/paddle/paddle/fluid/platform/gpu_info.cc:234]
PaddlePaddle Call Stacks:
0       0x7f6c84b0c8d5p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 357
1       0x7f6c84b0cc59p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2       0x7f6c865d0bbcp paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind) + 188
3       0x7f6c84c25dcbp void paddle::memory::Copy<paddle::platform::CPUPlace, paddle::platform::CUDAPlace>(paddle::platform::CPUPlace, void*, paddle::platform::CUDAPlace, void const*, unsigned long, CUstr
eam_st*) + 91
4       0x7f6c8657c90bp paddle::framework::TensorCopySync(paddle::framework::Tensor const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boost
::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant:
:void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::det
ail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&, paddle::framework::Tensor*) + 827
5       0x7f6c860811d2p paddle::operators::FetchOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boos
t::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant
::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::de
tail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 626
6       0x7f6c86518575p paddle::framework::OperatorBase::Run(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, paddle::platform::CUDAPinnedPlace, boo
st::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::varian
t::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::d
etail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) + 341
7       0x7f6c84c2941ap paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool) + 218
8       0x7f6c84c2b415p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 261
terminate called after throwing an instance of 'paddle::platform::EnforceNotMet'
  what():  cudaFree{Host} failed in GPUAllocator::Free.: an illegal memory access was encountered at [/paddle/paddle/fluid/memory/detail/system_allocator.cc:150]
PaddlePaddle Call Stacks:
0       0x7f6c84b0c8d5p void paddle::platform::EnforceNotMet::Init<char const*>(char const*, char const*, int) + 357
1       0x7f6c84b0cc59p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 137
2       0x7f6c865cd87bp paddle::memory::detail::GPUAllocator::Free(void*, unsigned long, unsigned long) + 187
3       0x7f6c865cb922p paddle::memory::detail::BuddyAllocator::Free(void*) + 1122
4       0x7f6c865c7247p void paddle::memory::legacy::Free<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace const&, void*, unsigned long) + 39
5       0x7f6c865c72bdp paddle::memory::allocation::LegacyAllocator::Free(paddle::memory::allocation::Allocation*) + 77
6       0x7f6c84b0f329p std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release() + 57
7       0x7f6c84b10358p paddle::framework::Variable::PlaceholderImpl<paddle::framework::LoDTensor>::~PlaceholderImpl() + 56
8       0x7f6c8656c56dp paddle::framework::Scope::~Scope() + 157
9       0x7f6c8656c481p paddle::framework::Scope::DropKids() + 65
10      0x7f6c8656c4edp paddle::framework::Scope::~Scope() + 29
11      0x7f6c84c633d6p paddle::framework::ScopePool::DeleteScope(paddle::framework::Scope*) + 22
12      0x7f6c84c63431p paddle::framework::ScopePool::Clear() + 65

*** Aborted at 1554050895 (unix time) try "date -d @1554050895" if you are using GNU date ***
PC: @                0x0 (unknown)
*** SIGABRT (@0x7d1000144b6) received by PID 83126 (TID 0x7f6cdc5c3740) from PID 83126; stack trace: ***
    @     0x7f6cd0937e37 _Unwind_Resume
    @     0x7f6c865cbc75 paddle::memory::detail::BuddyAllocator::Free()
    @     0x7f6c865c7247 paddle::memory::legacy::Free<>()
    @     0x7f6c865c72bd paddle::memory::allocation::LegacyAllocator::Free()
qingqing01 commented 5 years ago

@LeLiu

run command:

export CUDA_VISIBLE_DEVICES=0
python -u train.py --batch_size=4 --pretrained_model=vgg_ilsvrc_16_fc_reduced --data_dir=/home/users/data/WIDERFACE/

dataset in /home/users/data/WIDERFACE/:

|-- wider_face_split
|   |-- readme.txt
|   |-- wider_face_test_filelist.txt
|   |-- wider_face_test.mat
|   |-- wider_face_train_bbx_gt.txt
|   |-- wider_face_train.mat
|   |-- wider_face_val_bbx_gt.txt
|   `-- wider_face_val.mat
|-- WIDER_test
|   `-- images
|-- WIDER_train
|   `-- images
`-- WIDER_val
    `-- images

log:

-----------  Configuration Arguments -----------
batch_num: None
batch_size: 4
data_dir: /home/users/dangqingqing/data/WIDERFACE/
enable_ce: False
epoc_num: 160
learning_rate: 0.001
mean_BGR: 104., 117., 123.
model_save_dir: output
num_devices: 1
parallel: True
pretrained_model: vgg_ilsvrc_16_fc_reduced
resize_h: 640
resize_w: 640
use_gpu: True
use_pyramidbox: True
with_mem_opt: True
------------------------------------------------
memory_optimize is deprecated. Use CompiledProgram and Executor
W0401 10:31:45.853430 111517 device_context.cc:261] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.2
W0401 10:31:45.856607 111517 device_context.cc:269] device: 0, cuDNN Version: 7.0.
import ujson error: No module named ujson use json
ParallelExecutor is deprecated. Please use CompiledProgram and Executor. CompiledProgram is a central place for optimization and Executor is the unified executor. Example can be found in compiler.py.
W0401 10:31:46.688274 111517 graph.h:204] WARN: After a series of passes, the current graph can be quite different from OriginProgram. So, please avoid using the `OriginProgram()` method!
Pass 0, batch 0, face loss 10.494228, head loss 9.767658, time 0.00272
Pass 0, batch 10, face loss 8.338327, head loss 10.441329, time 0.88548 
# ...
Pass 0, batch 3010, face loss 3.260000, head loss 2.577861, time 0.89221
Pass 0, batch 3020, face loss 2.235071, head loss 2.160218, time 0.91566

And even though I set export FLAGS_fraction_of_gpu_memory_to_use=0, there is no problem on my machine.

LeLiu commented 5 years ago

@qingqing01 thank you very much.

I didn't run the WIDERFACE face detection model. I ran code I wrote myself on private data and just hit the same problem as in this issue. Sorry I didn't make that clear.

I tried batch sizes 1, 4, 32, 64, and 128, and it failed every time. Could it be an issue with my GPU/CUDA configuration (although other programs using CUDA work fine)?

qingqing01 commented 5 years ago

@LeLiu Please note that the data reader uses Python multiprocessing; if a run fails, you must kill all of the leftover processes before trying again. You can try running PyramidBox. Or, if it's convenient, send me your code and I can try it on my machine.
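
For example, leftover worker processes from a failed run can be cleaned up with something like the sketch below (run it from a separate shell; the "train.py" pattern is only an example, match it to your own script name):

import subprocess

# Kill any Python reader/worker processes left over from a failed run.
# "train.py" is an illustrative pattern, not a fixed name.
subprocess.call(["pkill", "-f", "train.py"])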

LeLiu commented 5 years ago

@LeLiu Please note that the data reader uses Python multiprocessing; if a run fails, you must kill all of the leftover processes before trying again. You can try running PyramidBox. Or, if it's convenient, send me your code and I can try it on my machine.

Thank you again.

I think I've solved the problem. I read the log carefully and found that a lot of GPU memory was being used (more than 10 GB). After simplifying the network and using a smaller batch size (8), the error disappeared.

W0402 14:49:10.794189  8209 legacy_allocator.cc:191] Cannot allocate 227.777344MB in GPU 0, available 1.137329GB
W0402 14:49:10.794214  8209 legacy_allocator.cc:194] total 12788105216
W0402 14:49:10.794255  8209 legacy_allocator.cc:195] GpuMinChunkSize 256.000000B
W0402 14:49:10.794281  8209 legacy_allocator.cc:198] GpuMaxChunkSize 10.409210GB
W0402 14:49:10.794302  8209 legacy_allocator.cc:201] GPU memory used: 10.346843GB

But I'm still not sure whether I had the same problem as @abcdvzz, because I don't understand the log very well.

phamkhactu commented 4 years ago

I have a problem with train.py. When I debug, it shows the error at these lines:

    exe = fluid.Executor(place)
    exe.run(startup_prog)

and it shows:

-----------  Configuration Arguments -----------
batch_num: None
batch_size: 4
data_dir: data
enable_ce: False
epoc_num: 160
learning_rate: 0.001
mean_BGR: 104., 117., 123.
model_save_dir: output
num_devices: 1
parallel: True
pretrained_model: vgg_ilsvrc_16_fc_reduced
resize_h: 640
resize_w: 640
use_gpu: True
use_multiprocess: True
use_pyramidbox: True
------------------------------------------------
2020-02-28 08:59:28,675-WARNING: paddle.fluid.layers.py_reader() may be deprecated in the near future. Please use paddle.fluid.io.DataLoader.from_generator() instead.
/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py:779: UserWarning: The following exception is not an EOF exception.
  "The following exception is not an EOF exception.")
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    train(args, config, train_parameters, train_file_list)
  File "train.py", line 157, in train
    exe.run(startup_prog)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 780, in run
    six.reraise(*sys.exc_info())
  File "/root/.local/lib/python3.6/site-packages/six.py", line 703, in reraise
    raise value
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 775, in run
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 822, in _run_impl
    use_program_cache=use_program_cache)
  File "/usr/local/lib/python3.6/dist-packages/paddle/fluid/executor.py", line 899, in _run_program
    fetch_var_name)
paddle.fluid.core_avx.EnforceNotMet: 

--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0   std::string paddle::platform::GetTraceBackString<char const*>(char const*&&, char const*, int)
1   paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int)
2   paddle::platform::CUDADeviceContext::CUDADeviceContext(paddle::platform::CUDAPlace)
3   std::_Function_handler<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > (), std::reference_wrapper<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()> > >::_M_invoke(std::_Any_data const&)
4   std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::__future_base::_Result_base::_Deleter>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > >::_M_invoke(std::_Any_data const&)
5   std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6   std::__future_base::_Deferred_state<std::_Bind_simple<paddle::platform::EmplaceDeviceContext<paddle::platform::CUDADeviceContext, paddle::platform::CUDAPlace>(std::map<paddle::platform::Place, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >, std::less<paddle::platform::Place>, std::allocator<std::pair<paddle::platform::Place const, std::shared_future<std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > > > > >*, paddle::platform::Place)::{lambda()#1} ()>, std::unique_ptr<paddle::platform::DeviceContext, std::default_delete<paddle::platform::DeviceContext> > >::_M_run_deferred()
7   paddle::platform::DeviceContextPool::Get(paddle::platform::Place const&)
8   paddle::framework::GarbageCollector::GarbageCollector(paddle::platform::Place const&, unsigned long)
9   paddle::framework::UnsafeFastGPUGarbageCollector::UnsafeFastGPUGarbageCollector(paddle::platform::CUDAPlace const&, unsigned long)
10  paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool, bool)
11  paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, std::vector<std::string, std::allocator<std::string> > const&, bool)

----------------------
Error Message Summary:
----------------------
Error: Paddle internal Check failed. (Please help us create a new issue, here we need to find the developer to add a user friendly error message): out of memory at (/paddle/paddle/fluid/platform/device_context.cc:220)

and my nvidia-smi output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.33.01    Driver Version: 440.33.01    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0 Off |                  N/A |
| 45%   56C    P2    41W / 260W |  10933MiB / 11019MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     23333      G   /usr/lib/xorg/Xorg                            66MiB |
|    0     23412      G   /usr/bin/sddm-greeter                         48MiB |
+-----------------------------------------------------------------------------+