About trained from AlexeyAB

jayer95 commented 3 years ago

When I do sparse training, can I use .weights trained with AlexeyAB's darknet?

In other words, the basic training would be done with the YOLO framework maintained by AlexeyAB, and the sparse training would then start from that well-trained AlexeyAB .weights model.

I ask because basic training with your code often fails to converge (YOLOv4).

TNTWEN commented 3 years ago

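@jayer95 I think we can now be sure that the error lies in how https://github.com/tanluren/yolov3-channel-and-layer-pruning implements the model. The model you trained gives the same result in darknet as the IR model gives in OpenVINO: neither detects anything (confidence is only about 0.01).

Moreover, the mistake in https://github.com/tanluren/yolov3-channel-and-layer-pruning is not a simple channel-merge error, because detection in darknet is still abnormal after I changed the channels.

You could ask the author of https://github.com/tanluren/yolov3-channel-and-layer-pruning why a tiny model trained in PyTorch detects nothing in darknet.

Another option is to train from scratch in darknet using the pruned cfg file, without loading a weights file. A model trained that way should detect normally in OpenVINO.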

jayer95 commented 3 years ago

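@TNTWEN OK, I'll ask him:

https://github.com/tanluren/yolov3-channel-and-layer-pruning/issues/146

If his whole pruning procedure for YOLOv4-Tiny deviates too much from Darknet, does that mean that even if I correctly export a YOLOv4-Tiny IR model with your newly added https://github.com/TNTWEN/OpenVINO-YOLO-Automatic-Generation/tree/master/yolov4tiny , it still won't detect anything correctly on the OpenVINO side?

If I train the pruned cfg in darknet without loading any pre-trained model, does that still achieve the effect of pruning? Is it equivalent to retrain, the last step of yolov3-channel-and-layer-pruning? And the model I retrain in darknet can then be converted with your tool without trouble, right? Hahaha.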

TNTWEN commented 3 years ago

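@jayer95 As long as the pruned cfg reaches a reasonable mAP when trained in darknet, the pruning has achieved its effect. You can treat this as the retrain step. As long as detection works in darknet, https://github.com/TNTWEN/OpenVINO-YOLO-Automatic-Generation/tree/master/yolov4tiny will run normally in OpenVINO.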
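The "train the pruned cfg from scratch" advice above can be sketched as a small helper that assembles the darknet command line. This is only an illustration: the file names are hypothetical, and the helper assumes AlexeyAB's `darknet detector train <data> <cfg> [weights]` form, where leaving out the weights argument starts training from randomly initialised weights.

```python
from typing import List, Optional


def darknet_train_cmd(data_file: str, cfg_file: str,
                      weights: Optional[str] = None) -> List[str]:
    """Build an argument list for `darknet detector train`.

    When `weights` is None the weights argument is omitted, so darknet
    trains the network described by `cfg_file` from scratch.
    """
    cmd = ["./darknet", "detector", "train", data_file, cfg_file]
    if weights is not None:
        cmd.append(weights)
    # -map asks AlexeyAB's darknet to report mAP during training
    # (it needs a validation set listed in the .data file).
    cmd.append("-map")
    return cmd


# Hypothetical file names, for illustration only.
cmd = darknet_train_cmd("data/obj.data", "cfg/prune_yolov4-tiny.cfg")
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` from the darknet directory.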

jayer95 commented 3 years ago

@TNTWEN Regarding the change you mentioned to this part of yesterday's conversion code:

def _tiny_res_block(inputs,in_channels,channel1,channel2,channel3,data_format):
    net = _conv2d_fixed_padding(inputs,in_channels,kernel_size=3)

    route = net
    #_,split=tf.split(net,num_or_size_splits=2,axis=1 if data_format =="NCHW" else 3)
    # split = net[:, in_channels//2:, :, :]if data_format=="NCHW" else net[:, :, :, in_channels//2:]
    split = net[:,0：in_channels//2, :, :]if data_format=="NCHW" else net[:, :, :, 0:in_channels//2]
    net = _conv2d_fixed_padding(split,channel1,kernel_size=3)
    route1 = net
    net = _conv2d_fixed_padding(net,channel2,kernel_size=3)
    net = tf.concat([net, route1], axis=1 if data_format == 'NCHW' else 3)
    net = _conv2d_fixed_padding(net,channel3,kernel_size=1)
    feat = net
    net = tf.concat([route, net], axis=1 if data_format == 'NCHW' else 3)
    net = slim.max_pool2d(
        net, [2, 2], scope='pool2')
    return net,feat

I commented out

    split = net[:, in_channels//2:, :, :]if data_format=="NCHW" else net[:, :, :, in_channels//2:]

and changed it to

    split = net[:,0：in_channels//2, :, :]if data_format=="NCHW" else net[:, :, :, 0:in_channels//2]

which raised this error:

    Traceback (most recent call last):
      File "convert_weights_pb.py", line 6, in <module>
        import yolo_v4_tiny
      File "/home/jayer95/OpenVINO-YOLOV4/yolo_v4_tiny.py", line 23
        split = net[:,0：in_channels//2, :, :]if data_format=="NCHW" else net[:, :, :, 0:in_channels//2]
                       ^
    SyntaxError: invalid character in identifier

Before making this change I converted with versions 2020.4, 2021.1, and 2021.2, and none of them output any bboxes. For now I am retraining the pruned .cfg in darknet. Is it not possible to load the .weights produced by pruning in yolov3-channel-and-layer-pruning as the pre-trained model?
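A note on the SyntaxError quoted above: Python reports "invalid character in identifier" when a source line contains a non-ASCII character where syntax is expected, and the edited slice uses a fullwidth colon (`：`, U+FF1A) after the `0` instead of an ASCII `:`. The sketch below uses a hypothetical helper, `find_non_ascii`, to locate such characters, and shows that NFKC normalisation maps the fullwidth colon to the ASCII one. Normalising a whole file this way could also alter string literals, so treat it as a diagnostic aid rather than an automatic fix.

```python
import unicodedata


def find_non_ascii(source: str):
    """Yield (line_no, column, char, unicode_name) for each non-ASCII character."""
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                yield line_no, col, ch, unicodedata.name(ch, "UNKNOWN")


# The start of the edited line from the traceback.
bad_line = "split = net[:,0：in_channels//2, :, :]"
hits = list(find_non_ascii(bad_line))
print(hits)

# NFKC folds the fullwidth colon into the ASCII colon.
fixed_line = unicodedata.normalize("NFKC", bad_line)
print(fixed_line)
```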

TNTWEN commented 3 years ago

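@jayer95 There is no need to try that change to the conversion code. I tested the equivalent in darknet and it detects nothing. My conversion code handles the model the same way darknet does, so we can be sure the problem is in the PyTorch training.

If PyTorch handles the model incorrectly, then it is no longer the same model that darknet trains. Loading weights from the wrong model will very likely fail to converge. Not loading a pre-trained model is the best choice.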
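For reference, the two slice expressions discussed in the conversion code select different halves of the channel axis: `in_channels//2:` keeps the second half, which is also what the commented-out `_, split = tf.split(...)` keeps, while `0:in_channels//2` keeps the first half. A minimal pure-Python sketch, with a plain list standing in for the channel axis (the helper name is hypothetical). As far as I know, the stock yolov4-tiny.cfg route uses `groups=2, group_id=1`, i.e. the second half, but that is an assumption to verify against your own cfg.

```python
def split_halves(channels):
    """Return (first_half, second_half) of a list standing in for the channel axis."""
    mid = len(channels) // 2
    return channels[:mid], channels[mid:]


channels = list(range(8))  # stand-in for 8 feature-map channels
first, second = split_halves(channels)

print(first)   # what net[:, 0:in_channels//2, ...] would keep
print(second)  # what net[:, in_channels//2:, ...] would keep
```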

jayer95 commented 3 years ago

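@TNTWEN I have recently been training https://github.com/AlexeyAB/darknet/blob/master/cfg/yolov4x-mish.cfg with AlexeyAB's darknet, but for some reason the avg loss stays stuck in the 80s and will not go down. I have trained up to 4~5 iterations and it still hovers around 80. Could it be because the mish activation function is not yet supported, as you mentioned before? It behaves this way on both the 3090 and the 1080Ti. Have you ever trained this YOLOv4x-Mish?

[image: chart_yolov4x-mish-license_plate]

As for retraining in darknet, understood. I'll try it right away!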

TNTWEN commented 3 years ago

@jayer95 I haven't trained it, but it shouldn't be a mish problem. PyTorch has not optimized mish, whereas darknet has already chosen the optimal implementation.

jayer95 commented 3 years ago

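@TNTWEN I found a similar issue, and it seems to have been answered: https://github.com/AlexeyAB/darknet/issues/7131 I will train yolov4x-mish.cfg and use that model as the basis for automatic labeling.

By the way, I am now retraining the pruned YOLOv4-Tiny in darknet and expect results within a day. If I then prune YOLOv4-Tiny-3L, the steps after pruning are again to rebuild the model in darknet and then use your updated OpenVINO-YOLO-Automatic-Generation/tree/master/yolov4tiny to convert it automatically to an IR model. Is my understanding correct? Also, besides channel pruning, can YOLOv4-Tiny-3L be layer-pruned?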

TNTWEN commented 3 years ago

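@jayer95 "If I then prune YOLOv4-Tiny-3L, the steps after pruning are again to rebuild the model in darknet and then use your updated OpenVINO-YOLO-Automatic-Generation/tree/master/yolov4tiny to convert it automatically to an IR model."

That's right, but YOLOv4-Tiny-3L can likewise only be channel-pruned. Only models whose cfg contains [shortcut] structures support layer pruning.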
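The [shortcut] rule above can be checked mechanically before attempting layer pruning. The sketch below scans darknet cfg text for section headers; the function names and the two cfg fragments are hypothetical and only illustrate the check, not how the pruning repository itself decides.

```python
import re


def cfg_section_names(cfg_text: str):
    """Return the section names of a darknet cfg in order, e.g. ['net', 'convolutional']."""
    names = []
    for line in cfg_text.splitlines():
        match = re.match(r"^\[(\w+)\]$", line.strip())
        if match:
            names.append(match.group(1))
    return names


def has_shortcut(cfg_text: str) -> bool:
    """True if the cfg contains at least one [shortcut] section."""
    return "shortcut" in cfg_section_names(cfg_text)


# Hypothetical cfg fragments, for illustration only.
residual_style = "[net]\n[convolutional]\nfilters=32\n[shortcut]\nfrom=-2\n"
tiny_style = "[net]\n[convolutional]\nfilters=32\n[route]\nlayers=-1\n[maxpool]\n"

print(has_shortcut(residual_style), has_shortcut(tiny_style))
```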

jayer95 commented 3 years ago

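@TNTWEN The pruned YOLOv4-Tiny model, after being rebuilt in darknet and converted to IR models with the conversion code you walked me through yesterday, now outputs boxes! It works well! I used your OpenVINO-YOLO-Automatic-Generation/tree/master/yolov4tiny, updated yesterday, to generate the conversion code for the pruned YOLOv4-Tiny once, and I found a few errors that stop the conversion. The first is the third line from the end, detect_2 = tf.identity(detect_2, name='detect_2'), which is missing a line break. I changed parse_config.py myself: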

"detect_2 = _detection_layer(net, num_classes, _ANCHORSTINY[1:4], img_size, data_format)",\
"detect_2 = tf.identity(detect_2, name='detect_2')",

You may want to look this part over and fix your code.

[image: Screenshot-20210226134917-1362x439]

Also, after this fix another error appears:

NameError: name '_ANCHORSTINY' is not defined

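Replacing "_ANCHORSTINY" with "_ANCHORS" lets the .pb build successfully. The code from your walkthrough yesterday uses "_ANCHORS". Which one is correct? I haven't yet tested generating the conversion code for YOLOv4-Tiny-3L!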

TNTWEN commented 3 years ago

@jayer95 I just pushed a fix; feel free to test again. However, I only use yolo_v4_tiny.py, and _ANCHORSTINY is already defined there, so I wonder whether you pasted the output of parse_config.py into yolo_v4.py instead?

For both yolov4-tiny and yolov4-tiny-3l, you need to add the --tiny argument when using convert_weights_pb.py. Correspondingly, I also fixed the json file for yolov4-tiny-3l: https://github.com/TNTWEN/OpenVINO-YOLO-Automatic-Generation/blob/346b428b01e37a51d02d1b549b96e02aef53b444/yolov4tiny/yolov4_tiny_3l.json#L11
When using mo.py, make sure to use the correct json file.
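The two steps above (pass `--tiny` for both tiny variants, then give mo.py the json that matches the model) can be sketched as command builders. Only `--tiny` and the file name `yolov4_tiny_3l.json` come from this thread. The other flags (`--class_names`, `--weights_file`, `--data_format`, `--input_model`, `--transformations_config`) are assumptions based on my recollection of the conversion script and the Model Optimizer, and the remaining paths are hypothetical, so check each tool's `--help` before relying on them.

```python
TINY_MODELS = {"yolov4-tiny", "yolov4-tiny-3l"}


def convert_cmd(model_name, weights_file, class_names):
    """Argument list for convert_weights_pb.py; adds --tiny for the tiny variants."""
    cmd = ["python", "convert_weights_pb.py",
           "--class_names", class_names,      # assumed flag
           "--weights_file", weights_file,    # assumed flag
           "--data_format", "NHWC"]           # assumed flag
    if model_name in TINY_MODELS:
        cmd.append("--tiny")                  # required per the comment above
    return cmd


def mo_cmd(frozen_pb, json_config):
    """Argument list for mo.py with the model-specific json config."""
    return ["python", "mo.py",
            "--input_model", frozen_pb,                 # assumed flag
            "--transformations_config", json_config]    # assumed flag


# Hypothetical paths, for illustration only.
step1 = convert_cmd("yolov4-tiny-3l", "prune_yolov4-tiny-3l.weights", "obj.names")
step2 = mo_cmd("frozen_darknet_yolov4_model.pb", "yolov4_tiny_3l.json")
print(" ".join(step1))
print(" ".join(step2))
```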

TNTWEN commented 3 years ago

@jayer95
yolov4-tiny: "anchors": [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]
yolov4-tiny-3l: "anchors": [12, 16, 19, 36, 40, 28, 36, 75, 76, 55, 72, 146, 142, 110, 192, 243, 459, 401]

The two models have different anchors, so I renamed the variables to tell them apart.
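To make the difference concrete, the flat anchor lists above can be grouped into (width, height) pairs: yolov4-tiny has 6 anchors and yolov4-tiny-3l has 9. The constant and function names below are hypothetical. The last line assumes the conversion code stores anchors as a list of pairs, which is what the `_ANCHORSTINY[1:4]` slice quoted earlier suggests but does not prove.

```python
ANCHORS_TINY = [10, 14, 23, 27, 37, 58, 81, 82, 135, 169, 344, 319]
ANCHORS_TINY_3L = [12, 16, 19, 36, 40, 28, 36, 75, 76, 55,
                   72, 146, 142, 110, 192, 243, 459, 401]


def to_pairs(flat):
    """Group a flat [w0, h0, w1, h1, ...] list into [(w0, h0), (w1, h1), ...]."""
    if len(flat) % 2 != 0:
        raise ValueError("anchor list must contain an even number of values")
    return list(zip(flat[0::2], flat[1::2]))


tiny_pairs = to_pairs(ANCHORS_TINY)
tiny_3l_pairs = to_pairs(ANCHORS_TINY_3L)
print(len(tiny_pairs), len(tiny_3l_pairs))

# Slice comparable to _ANCHORSTINY[1:4], assuming pairs are stored this way.
print(tiny_pairs[1:4])
```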

jayer95 commented 3 years ago

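@TNTWEN No problem, I'll try again. I'm sure I pasted it into yolo_v4_tiny.py, but my copy of the OpenVINO-YOLOv4 conversion code may be an older version, so I'll try again. I'll also take care to specify the correct .json. Thank you so much! I tested the YOLOv4-Tiny IR model, FP32/FP16-INT8, at the largest and smallest scales. On my CPU it runs 3 FPS faster. Accuracy has probably dropped a little, but the model shrank from 3MB to 740KB!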

jayer95 commented 3 years ago

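@TNTWEN Have you ever trained yolov4-tiny-3l in https://github.com/tanluren/yolov3-channel-and-layer-pruning ? When I train yolov4-tiny-3l there I get a scary error: the 3090's GPU memory cannot be released, and the only way out is to reboot. Training yolov4-tiny there works normally, just as it does with your code. I copied the yolov4-tiny-3l.cfg provided by AlexeyAB darknet straight into yolov3-channel-and-layer-pruning. For the pre-trained model I used both a yolov4-tiny-3l already fully trained in AlexeyAB darknet and yolov4-tiny.conv.29, but both produce very strange errors. I even get the same problem doing the same thing on my other machine with 2x 1080Ti.

[image: Screenshot-20210226180410-1118x717]
[image: Screenshot-20210226180345-989x784]

Could it be caused by the third yolo layer added at the end of yolov4-tiny-3l.cfg? Also, if I want to change the training input shape from 416 to 608, which code should I modify? Is it yolov3-channel-and-layer-pruning/utils/datasets.py?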

TNTWEN commented 3 years ago

@jayer95 When you have time in the future, would you be willing to share the details of IR model quantization? You could fork this project, https://github.com/TNTWEN/Pruned-OpenVINO-YOLO , and add your notes to the Readme. Haha!

I haven't pruned tiny models in https://github.com/tanluren/yolov3-channel-and-layer-pruning , but it now looks like its support for yolov4-tiny and yolov4-tiny-3l does have problems, and the error messages make it hard to tell what is causing them. I can try asking tanluren over QQ.

To change the input size: https://github.com/tanluren/yolov3-channel-and-layer-pruning/blob/9220f301ed2fea90b0ce3e179f825dba46e7aace/train.py#L502 Just change 416 to 608.
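When changing the input size, it helps to confirm that the new value is a multiple of 32, since the YOLO detection heads work on grids downsampled by powers of two. A small sketch under the assumption that yolov4-tiny predicts at strides 32 and 16 (the helper name is hypothetical):

```python
def grid_sizes(input_size, strides=(32, 16)):
    """Return the detection grid side length for each stride.

    Raises ValueError if input_size is not a multiple of 32.
    """
    if input_size % 32 != 0:
        raise ValueError(f"input size {input_size} is not a multiple of 32")
    return [input_size // s for s in strides]


print(grid_sizes(416))  # grids for the default 416 input
print(grid_sizes(608))  # grids after switching to 608
```

For yolov4-tiny-3l, passing `strides=(32, 16, 8)` would cover the third, assumed stride-8 head as well.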

TNTWEN commented 3 years ago

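@jayer95 I asked the author. yolov4-tiny-3l is indeed not supported yet, and the code needs to be modified. The problem of yolov4-tiny detecting nothing in darknet still needs investigation. When I have time I'll read through the source of https://github.com/tanluren/yolov3-channel-and-layer-pruning carefully.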

jayer95 commented 3 years ago

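@TNTWEN No problem, I'll add it as soon as I have time. In the meantime you can refer to this: https://docs.openvinotoolkit.org/latest/pot_README.html Which operating system do you usually use? I do quantization on Ubuntu 20.04 and haven't tried it on Windows. Until now I have always relied on the model quantization provided by OpenVINO to improve speed, and recently I happened to have time to look into model pruning.

So yolov4-tiny-3l isn't supported yet. I have high hopes for that model. Thank you for helping when you have time. For closer discussion, should we use WeChat or QQ?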

TNTWEN commented 3 years ago

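@jayer95 My coursework has been heavy lately, so my replies may be slow ^-^ I regularly use Windows, Ubuntu 18.04, and Ubuntu 20.04. When you have time, just add the model quantization notes for Ubuntu 20.04.

You can add me on QQ: 2777622181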

TNTWEN / Pruned-OpenVINO-YOLO

About trained from AlexeyAB #6