How to train your own faster-RCNN

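Some time ago I needed to train Faster RCNN myself. I spent a long time searching for resources online and read many guides, and I still ran into a lot of problems. This post summarizes the steps I followed, the problems I hit, and how I solved them. For the principles behind Faster RCNN, see this blog post: https://blog.csdn.net/u011746554/article/details/74999010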

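Dataset

CSUST Chinese Traffic Sign Detection Benchmark (CCTSDB, a Chinese traffic sign dataset), produced by the Hunan Provincial Key Laboratory of Intelligent Processing of Big Data on Transportation at Changsha University of Science and Technology. GitHub: https://github.com/csust7zhangjm/CCTSDB

The traffic signs in this dataset are divided into three classes:

mandatory (mandatory signs)

warning (warning signs)

prohibitory (prohibitory signs)

This project uses 1,799 images in total for training, validation, and testing. A sample image: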

[Image: sample 00019]

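Data preprocessing

This Faster RCNN implementation expects data in PASCAL VOC format, so we convert the dataset to that format.

The PASCAL VOC dataset layout is as follows:

Three of its folders have to be prepared by hand: Annotations, Main, and JPEGImages. Annotations holds one xml annotation file per image. Main holds four txt files (trainval.txt, train.txt, val.txt, test.txt) that list the file names in the train+val set, the training set, the validation set, and the test set respectively. JPEGImages holds all the images in jpeg format.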
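As a rough sketch, the three folders described above could be created like this. The root name `VOCdevkit2007/VOC2007` is an assumption based on the folder names used elsewhere in this post, and `make_voc_skeleton` is a hypothetical helper, not part of any library.

```python
import os


def make_voc_skeleton(root):
    """Create the three folders that have to be filled in by hand."""
    # Assumed layout: <root>/VOCdevkit2007/VOC2007/...
    voc = os.path.join(root, "VOCdevkit2007", "VOC2007")
    subdirs = ["Annotations", os.path.join("ImageSets", "Main"), "JPEGImages"]
    created = []
    for sub in subdirs:
        path = os.path.join(voc, sub)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created
```

Calling `make_voc_skeleton(".")` creates the empty folders in the current directory; the xml files, split lists, and images still have to be copied in.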

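I had already trained YOLOv3 on these images, so YOLO-format annotations were available: each image has a txt file with the same name. The txt content looks like this (for image 00039.jpg):

Each line of the txt describes one object in the image, with fields separated by spaces. The first field is the object's class (here 0 is mandatory, 1 is warning, 2 is prohibitory). Fields 2 to 5 are x, y, w, h: the bounding box's relative center x, relative center y, relative width, and relative height.

Given the images and the YOLO annotations, the following code converts the annotations to xml files (adapted from https://zhuanlan.zhihu.com/p/58392978):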
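To make the coordinate convention concrete, here is a minimal sketch of the conversion from one relative YOLO box to absolute pixel corners. The function name `yolo_to_voc_box` and the sample numbers are made up for illustration; the conversion script in this post additionally shifts the center by one pixel, which this sketch leaves out.

```python
def yolo_to_voc_box(x, y, w, h, img_width, img_height):
    """Convert a relative YOLO box (center x, center y, width, height)
    to absolute pixel corners (xmin, ymin, xmax, ymax)."""
    xmin = int(round((x - w / 2) * img_width))
    ymin = int(round((y - h / 2) * img_height))
    xmax = int(round((x + w / 2) * img_width))
    ymax = int(round((y + h / 2) * img_height))
    return xmin, ymin, xmax, ymax


# Hypothetical label "0 0.5 0.5 0.25 0.5" on an 800x400 image
print(yolo_to_voc_box(0.5, 0.5, 0.25, 0.5, 800, 400))  # (300, 100, 500, 300)
```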

from xml.dom.minidom import Document
import os
import cv2


def add_text_node(doc, parent, tag, text):
    # Append <tag>text</tag> to parent
    node = doc.createElement(tag)
    node.appendChild(doc.createTextNode(str(text)))
    parent.appendChild(node)


def makexml(txtPath, xmlPath, picPath):  # txt directory, xml output directory, image directory
    class_names = {'0': "mandatory",  # map YOLO class ids to class names
                   '1': "warning",
                   '2': "prohibitory"}
    for name in os.listdir(txtPath):
        if not name.endswith(".txt"):
            continue  # skip anything that is not a label file
        stem = name[:-4]
        with open(os.path.join(txtPath, name)) as txtFile:
            txtList = txtFile.readlines()
        img = cv2.imread(os.path.join(picPath, stem + ".jpg"))
        if img is None:
            print("Cannot read image for " + name)
            continue
        Pheight, Pwidth, Pdepth = img.shape

        xmlBuilder = Document()
        annotation = xmlBuilder.createElement("annotation")  # root <annotation> tag
        xmlBuilder.appendChild(annotation)
        add_text_node(xmlBuilder, annotation, "folder", "VOC2007")
        add_text_node(xmlBuilder, annotation, "filename", stem + ".jpg")

        size = xmlBuilder.createElement("size")  # <size> with width, height, depth children
        add_text_node(xmlBuilder, size, "width", Pwidth)
        add_text_node(xmlBuilder, size, "height", Pheight)
        add_text_node(xmlBuilder, size, "depth", Pdepth)
        annotation.appendChild(size)

        add_text_node(xmlBuilder, annotation, "segmented", "0")

        for line in txtList:
            oneline = line.split()
            if len(oneline) < 5:
                continue  # skip blank or malformed lines
            # YOLO stores relative center x, center y, width, height
            cx = float(oneline[1]) * Pwidth + 1
            cy = float(oneline[2]) * Pheight + 1
            half_w = float(oneline[3]) * 0.5 * Pwidth
            half_h = float(oneline[4]) * 0.5 * Pheight
            x1, y1 = int(cx - half_w), int(cy - half_h)
            x2, y2 = int(cx + half_w), int(cy + half_h)

            obj = xmlBuilder.createElement("object")
            add_text_node(xmlBuilder, obj, "name", class_names[oneline[0]])
            add_text_node(xmlBuilder, obj, "pose", "Unspecified")
            add_text_node(xmlBuilder, obj, "truncated", "0")
            add_text_node(xmlBuilder, obj, "difficult", "0")
            bndbox = xmlBuilder.createElement("bndbox")
            add_text_node(xmlBuilder, bndbox, "xmin", x1)
            add_text_node(xmlBuilder, bndbox, "ymin", y1)
            add_text_node(xmlBuilder, bndbox, "xmax", x2)
            add_text_node(xmlBuilder, bndbox, "ymax", y2)
            obj.appendChild(bndbox)
            annotation.appendChild(obj)

            # Flag boxes that fall outside the image or are degenerate
            if x1 < 0 or x2 > Pwidth or y1 < 0 or y2 > Pheight or x1 >= x2 or y1 >= y2:
                print("Error in file " + name)

        with open(os.path.join(xmlPath, stem + ".xml"), 'w', encoding='utf-8') as f:
            xmlBuilder.writexml(f, indent='\t', newl='\n', addindent='\t', encoding='utf-8')


makexml("path/to/txt", "path/to/xml", "path/to/images")

Code to generate the four txt files in the Main folder (adapted from https://www.iteye.com/blog/bangerla-2411875):

import os
import random

trainval_percent = 0.8  # fraction of all samples used for train+val
train_percent = 0.7     # fraction of train+val used for train
xmlfilepath = 'D:\\VOC2007\\Annotations'
txtsavepath = 'D:\\VOC2007\\ImageSets\\Main'
total_xml = os.listdir(xmlfilepath)

num = len(total_xml)
indices = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = set(random.sample(trainval, tr))
trainval = set(trainval)  # sets make the membership tests below fast

with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
     open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
     open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
     open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # file name without the .xml extension
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)
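To see what these percentages produce, here is a small sketch of the same split logic as a function. It assumes all 1,799 images have an annotation file; the function name `split_indices` and the fixed `seed` are additions for reproducibility and are not part of the script above.

```python
import random


def split_indices(num, trainval_percent=0.8, train_percent=0.7, seed=0):
    """Return disjoint (train, val, test) index sets using the same ratios."""
    rng = random.Random(seed)  # seeded so the split can be reproduced
    tv = int(num * trainval_percent)
    tr = int(tv * train_percent)
    trainval = rng.sample(range(num), tv)
    train = set(rng.sample(trainval, tr))
    trainval = set(trainval)
    val = trainval - train
    test = set(range(num)) - trainval
    return train, val, test


train, val, test = split_indices(1799)
print(len(train), len(val), len(test))  # 1007 432 360
```

In other words, train_percent is a fraction of the train+val set, not of the whole dataset, so roughly 56% of the images end up in train.txt.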

Installation and training steps

Environment

Python 3.7, CUDA 10.0

Install Cython, OpenCV, easydict

pip install opencv-python
pip install cython
pip install easydict

Clone repository

git clone https://github.com/endernewton/tf-faster-rcnn.git

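Change the GPU architecture (arch)

In tf-faster-rcnn/lib/setup.py, change '-arch=sm_70' on line 130 to the compute capability that matches your GPU model. For the right value, see http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/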

Build the cython modules

cd tf-faster-rcnn/lib
vim Makefile
# line 2: python -> python3
make clean
make
cd ..

Install the Python COCO API

cd data
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI
vim Makefile
# line 2: python -> python3
make
cd ../../..

Rename the data folder prepared earlier to VOCdevkit2007 and upload it to tf-faster-rcnn/data/.

Download the pretrained models

Download link: https://github.com/tensorflow/models/tree/master/research/slim#pre-trained-models

Store the downloaded models in tf-faster-rcnn/data/imagenet_weights/.

Extract vgg16:

tar -xzvf vgg_16_2016_08_28.tar.gz
mv vgg_16.ckpt vgg16.ckpt

Extract res101:

tar -xzvf resnet_v1_101_2016_08_28.tar.gz
mv resnet_v1_101.ckpt res101.ckpt

Training

Edit the class list in ./lib/datasets/pascal_voc.py to match your own dataset (do not remove the 'background' class).

Run:
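For this dataset, the edited class list might look like the sketch below. The variable names are illustrative, and the exact attribute name and the spelling of the background entry (I believe it is `__background__`) should be checked against your copy of pascal_voc.py.

```python
# Assumed shape of the class tuple; verify the real attribute in pascal_voc.py.
CLASSES = ('__background__',  # keep the background entry at index 0
           'mandatory', 'warning', 'prohibitory')

# Name-to-index lookup of the kind a dataset loader typically builds
class_to_ind = dict(zip(CLASSES, range(len(CLASSES))))
print(class_to_ind['prohibitory'])  # 3
```

The names must match the strings written into the <name> tags of the xml annotations.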

./experiments/scripts/train_faster_rcnn.sh [GPU_ID] [DATASET] [NET]

For example:

./experiments/scripts/train_faster_rcnn.sh 0 pascal_voc vgg16
./experiments/scripts/train_faster_rcnn.sh 1 coco res101

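Training parameters are set in tf-faster-rcnn/lib/model/config.py. To retrain from scratch, first delete the tf-faster-rcnn/output and tf-faster-rcnn/data/cache folders.

To make sure the model converged on this dataset, I lowered the learning rate from 0.001 to 0.0005, left all other parameters at their defaults, and trained for 70000 iterations.

Testing

Run: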

./experiments/scripts/test_faster_rcnn.sh [GPU_ID] [DATASET] [NET]

For example:

./experiments/scripts/test_faster_rcnn.sh 0 pascal_voc vgg16
./experiments/scripts/test_faster_rcnn.sh 1 coco res101

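Test results (no image output is produced):

If you want image output:

Edit tf-faster-rcnn/tools/demo.py and change CLASSES to your dataset's classes. Put the images you want to test in the tf-faster-rcnn/data/demo folder. Then run the following from tf-faster-rcnn/: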

GPU_ID=0
CUDA_VISIBLE_DEVICES=${GPU_ID} ./tools/demo.py

To save the output images, add plt.savefig('/tf-faster-rcnn/data/demo1_output/' + image_name) to the vis_detections function in demo.py.

To process images in batch, set im_names (around line 154 of demo.py) to os.listdir(<test image folder>).
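A bare os.listdir call also returns any non-image files in the folder. A slightly more defensive sketch is shown below; `list_demo_images` and the extension list are assumptions for illustration, not part of demo.py.

```python
import os

IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')  # assumed set of image types


def list_demo_images(demo_dir):
    """Return the sorted image file names found directly in demo_dir."""
    return sorted(
        f for f in os.listdir(demo_dir)
        if f.lower().endswith(IMAGE_EXTENSIONS)
    )
```

The result could then be assigned to im_names, for example `im_names = list_demo_images('data/demo')`.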

Q&A

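If you get Permission denied

Run chmod 777 *.sh to grant permission.

If the loss is nan

See: https://github.com/endernewton/tf-faster-rcnn/issues/86 https://blog.csdn.net/ksws0292756/article/details/80702704

When I ran into this, I applied all the changes described in https://blog.csdn.net/ksws0292756/article/details/80702704, but the loss was still nan. In the end I found that one image had a wrong annotation (the box extended outside the image).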
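One way to hunt for such annotations before training is to scan every xml file for boxes that leave the image. The sketch below uses the standard library's xml.etree.ElementTree; `find_bad_boxes` and its `min_coord` parameter are hypothetical names. The default of 0 mirrors the check in the conversion script above. Some VOC loaders subtract 1 from each coordinate, so a stricter `min_coord=1` may be worth trying, but that is an assumption to verify against your loader.

```python
import xml.etree.ElementTree as ET


def find_bad_boxes(xml_text, min_coord=0):
    """Return (class name, box) pairs whose box is out of bounds or degenerate."""
    root = ET.fromstring(xml_text)
    width = int(root.findtext("size/width"))
    height = int(root.findtext("size/height"))
    bad = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            int(box.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        out_of_bounds = (xmin < min_coord or ymin < min_coord
                         or xmax > width or ymax > height)
        degenerate = xmin >= xmax or ymin >= ymax
        if out_of_bounds or degenerate:
            bad.append((obj.findtext("name"), (xmin, ymin, xmax, ymax)))
    return bad
```

To check a whole dataset, read each file in Annotations and pass its contents to `find_bad_boxes`, printing the file name whenever the returned list is not empty.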

TiantongWang / MyBlogs