Thinklab-SJTU / R3Det_Tensorflow

Code for AAAI 2021 paper: R3Det: Refined Single-Stage Detector with Feature Refinement for Rotating Object
Apache License 2.0
540 stars 122 forks

Process gets killed #105

Closed baltam closed 3 years ago

baltam commented 3 years ago

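Thank you very much for open-sourcing this work. I ran into a problem with R3Det today that I have not been able to solve. While running tools/multi_gpu_train_r3det.py, everything was fine at first, then the fan suddenly got louder and the mouse froze. After a while the following was printed: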

restore from pretrained_weighs in IMAGE_NET
WARNING:tensorflow:From /media/ly/Data/R3Det_Tensorflow-master/tools/multi_gpu_train_r3det.py:319: start_queue_runners (from tensorflow.python.training.queue_runner_impl) is deprecated and will be removed in a future version.
Instructions for updating:
To construct input pipelines, use the `tf.data` module.
WARNING:tensorflow:From /media/ly/Data/R3Det_Tensorflow-master/tools/multi_gpu_train_r3det.py:323: The name tf.summary.FileWriter is deprecated. Please use tf.compat.v1.summary.FileWriter instead.

WARNING:tensorflow:From /home/ly/anaconda3/envs/tensorflow1/lib/python3.6/site-packages/tensorflow/python/training/saver.py:1276: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
restore model
Killed

Watching top and nvidia-smi, I found that every time execution reaches

_, global_stepnp, total_loss_dict_ = \
                            sess.run([train_op, global_step, total_loss_dict])

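CPU utilization climbs to about 537%, while GPU utilization stays at only a few percent. The machine then hangs for a while, and once it recovers VS Code reports "Killed" (已杀死).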
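For anyone reproducing this, a bare "Killed" message means the process received SIGKILL. On Linux that is often the kernel's out-of-memory killer reacting to exhausted host RAM, which is a separate question from GPU memory and is not confirmed anywhere in this thread. The sketch below is one way to log available host memory just before the `sess.run` call to rule that out. The helper names `parse_meminfo` and `available_gib` are hypothetical and are not part of the R3Det code.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: integer value}.

    Values are in kB for most fields (the kernel's unit label), and plain
    counts for fields such as HugePages_Total.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def available_gib(text):
    """Return MemAvailable in GiB, or 0.0 if the field is missing."""
    info = parse_meminfo(text)
    return info.get("MemAvailable", 0) / (1024 ** 2)


if __name__ == "__main__":
    # /proc/meminfo only exists on Linux, so guard the read.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            print("host memory available: %.2f GiB" % available_gib(f.read()))
```

Calling `available_gib` every few training steps and watching whether the value drops toward zero before the hang would indicate host RAM pressure. If it stays roughly constant, the cause lies elsewhere.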

My hardware

Lenovo Legion Y7000. GPU: 1050 Ti with 4 GB of GPU memory. CPU: the server has 1 chip, each chip has 4 cores, and each core has 8 processors

My development environment

Ubuntu 18.04, Python 3.6, CUDA 10.0, tensorflow-gpu 1.14

I hope you can spare a moment to help me with this when you have time.

yangxue0827 commented 3 years ago

Not enough GPU memory. It generally needs 7 GB or more.
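A minimal sketch for checking a card against that figure before starting training, assuming `nvidia-smi` is on the PATH. It uses the `--query-gpu=memory.used,memory.total --format=csv,noheader,nounits` options, which report values in MiB. The function names and the `REQUIRED_MIB` constant are hypothetical, and reading "7 GB" as 7 * 1024 MiB is an approximation of the number quoted above.

```python
import shutil
import subprocess

# Assumption: the "7 GB or more" guideline, expressed in MiB.
REQUIRED_MIB = 7 * 1024


def parse_gpu_memory(csv_text):
    """Parse lines of 'used, total' (MiB) into a list of (used, total) ints."""
    rows = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        rows.append((used, total))
    return rows


def gpus_below_requirement(rows, required_mib=REQUIRED_MIB):
    """Return the indices of GPUs whose total memory is under required_mib."""
    return [i for i, (_, total) in enumerate(rows) if total < required_mib]


def query_gpu_memory():
    """Run nvidia-smi and return parsed rows, or None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # text mode; works on Python 3.6
        check=True,
    )
    return parse_gpu_memory(result.stdout)


if __name__ == "__main__":
    try:
        rows = query_gpu_memory()
    except (subprocess.CalledProcessError, ValueError):
        rows = None  # driver missing or unexpected output format
    if rows is None:
        print("could not query GPU memory")
    else:
        for i in gpus_below_requirement(rows):
            print("GPU %d has %d MiB total, below %d MiB"
                  % (i, rows[i][1], REQUIRED_MIB))
```

A 4 GB card would be flagged by `gpus_below_requirement`, which matches the diagnosis given here.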

baltam commented 3 years ago

"Not enough GPU memory. It generally needs 7 GB or more." Wow, that was a fast reply. Thank you very much!