chenyangMl closed this issue 4 years ago
@chenyangMl I encountered this problem before when using the terminal to train the model, but when I used VS Code debug mode to run the training script, the problem did not happen. You can give it a try.
@Yuliang-Liu I suffer from the same problem in both the terminal and VS Code. Any idea how to debug it?
@chenyangMl Could you please tell me the commands to run the training process, and where to specify the path to train.json, the training data, and the txt-format annotations? Thank you!
@shuangyichen @Yuliang-Liu Run train_net.py with the command:

```
OMP_NUM_THREADS=1 python tools/train_net.py --config-file configs/BAText/TotalText/attn_R_50.yaml --num-gpus 1
```

The dataset root directory is "datasets".
Specify the training images and annotations in "builtin.py": `"mydataset_train": ("mydataset/train_img", "mydataset/annotations/train.json")`
Specify the training datasets in "configs/BAText/TotalText/Base-TotalText.yaml":

```yaml
DATASETS:
  TRAIN: ("mydataset_train",)
  TEST: ("mydataset_train",)
```
One txt-format annotation file of the dataset looks like:

```
24.49,22.09,231.04,18.89,229.73,18.78,436.7,16.86,436.12,68.6,230.59,72.38,230.59,72.38,25.07,76.16||||text
25.07,76.16,284.38,73.13,282.66,72.68,542.51,70.35,543.67,117.44,284.95,120.35,284.95,120.35,26.23,123.26||||text
25.98,121.84,282.61,119.3,280.6,118.45,537.86,116.86,539.6,157.56,283.79,160.76,283.79,160.76,27.98,163.95||||text
29.72,168.02,285.94,164.15,284.76,163.48,541.35,161.05,543.09,202.33,286.41,205.52,286.41,205.52,29.72,208.72||||text
27.98,213.95,287.74,210.62,285.56,209.03,546.0,206.98,546.58,248.26,286.99,254.07,286.99,254.07,27.4,259.88||||text
27.4,265.7,282.33,260.69,280.67,259.16,536.12,254.65,537.28,297.09,282.34,300.58,282.34,300.58,27.4,304.07||||text
25.65,303.49,285.68,303.95,283.59,301.72,544.26,303.49,544.26,344.77,284.95,344.77,284.95,344.77,25.65,344.77||||text
18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text
111.12,484.88,332.01,484.72,330.61,482.68,551.81,484.88,551.81,536.05,331.47,536.05,331.47,536.05,111.12,536.05||||text
23.3,356.4,199.56,348.83,376.0,348.73,552.37,347.67,553.53,390.12,376.6,393.01,199.63,392.93,22.72,397.09||||text
```
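Each annotation line above is a list of comma-separated polygon coordinates, then the "||||" separator, then the transcription. A minimal parsing sketch, assuming 16 floats per line describing an 8-point polygon (the helper name is hypothetical):

```python
def parse_annotation_line(line):
    """Split one txt annotation line into polygon points and its transcription."""
    coords_str, text = line.strip().split("||||")
    coords = [float(v) for v in coords_str.split(",")]
    # Group the flat coordinate list into (x, y) pairs.
    points = list(zip(coords[0::2], coords[1::2]))
    return points, text

line = "18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text"
points, text = parse_annotation_line(line)
# points holds the 8 polygon vertices, text the label after "||||"
```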
And I found that the problem occurs when the program reaches line 102 of "tools/train_net.py", but it cannot proceed to the next step; it looks like it is stuck in a loop.
Thank you so much!
I tried it as you mentioned, but met this problem. Have you got any idea? Thank you!
```
[06/16 08:29:02 adet.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800, 832, 864, 896), max_size=1600, sample_style='choice')]
Traceback (most recent call last):
  File "/root/detectron2/detectron2/data/catalog.py", line 55, in get
    f = DatasetCatalog._REGISTERED[name]
KeyError: 'mydataset_train'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_net.py", line 243, in
```
@shuangyichen You have to add your "mydataset_train" key to the existing dict `_PREDEFINED_SPLITS_TEXT` in "adet/data/builtin.py".
It looks like this:

```python
_PREDEFINED_SPLITS_TEXT = {
    "syntext1_train": ("syntext1/images", "syntext1/annotations/train.json"),
    "syntext2_train": ("syntext2/images", "syntext2/annotations/train.json"),
    "mltbezier_word_train": ("mlt2017/images", "mlt2017/annotations/train.json"),
    "mydataset_train": ("mydataset/train_img", "mydataset/annotations/train.json"),
}
```
It succeeded! Thank you! I used Xshell to train, and the training messages looked like:
```
[06/16 11:08:59 d2.utils.events]: eta: 1:05:34 iter: 19 total_loss: 6.228 rec_loss: 2.394 loss_fcos_cls: 0.734 loss_fcos_loc: 0.317 loss_fcos_ctr: 0.670 loss_fcos_bezier: 2.121 time: 0.8739 data_time: 0.0279 lr: 0.000020 max_mem: 13278M
[06/16 11:09:16 d2.utils.events]: eta: 1:06:17 iter: 39 total_loss: 4.955 rec_loss: 1.581 loss_fcos_cls: 0.373 loss_fcos_loc: 0.277 loss_fcos_ctr: 0.662 loss_fcos_bezier: 2.060 time: 0.8595 data_time: 0.0128 lr: 0.000040 max_mem: 13278M
[06/16 11:09:33 d2.utils.events]: eta: 1:06:36 iter: 59 total_loss: 4.576 rec_loss: 1.397 loss_fcos_cls: 0.267 loss_fcos_loc: 0.263 loss_fcos_ctr: 0.649 loss_fcos_bezier: 1.938 time: 0.8549 data_time: 0.0128 lr: 0.000060 max_mem: 13278M
```
Hope this could help you.
@chenyangMl Have you solved your problem?
@Yuliang-Liu Hi, I have found where the program gets stuck in a dead loop: lines 43~46 of "adet/data/detection_utils.py". For now, I set "crop_box" to always be True to keep training, but I am worried that it will affect the training results. I hope you can help me.
```python
if not crop_box:
    modified = True
    while modified:
        modified, x0, y0, crop_size = adjust_crop(x0, y0, crop_size, instances)
```
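Instead of forcing "crop_box" to True, one possible stopgap (a hypothetical safeguard, not the actual upstream fix) is to bound the number of adjustment iterations so the loop always terminates; `adjust_crop` is passed in here as a stand-in for the function in "adet/data/detection_utils.py":

```python
# Hypothetical safeguard (not the upstream fix): cap the crop-adjustment
# loop so floating-point coordinate jitter cannot spin it forever.
MAX_ADJUST_ITERS = 100

def bounded_adjust(x0, y0, crop_size, instances, adjust_crop):
    modified = True
    iters = 0
    while modified and iters < MAX_ADJUST_ITERS:
        modified, x0, y0, crop_size = adjust_crop(x0, y0, crop_size, instances)
        iters += 1
    return x0, y0, crop_size
```

If the loop ever hits the cap, the crop is used as-is rather than hanging the data loader; upgrading to a version with the real coordinate fix is still the proper solution.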
@chenyangMl this problem is caused by floating point bbox coords, which should have been fixed with this PR: https://github.com/aim-uofa/AdelaiDet/pull/104.
If you find other problems, please feel free to ask.
@chenyangMl Can you please tell me how you generated the annotations for your custom dataset? Did you store them in Total-Text-style txt format, such as: x,y x,y ... label? If so, could you please tell me how to convert the txt files to COCO-format JSON for ABCNet? The example scripts showed conversion from XML, but I don't know how to convert from txt.
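For illustration, one possible txt-to-COCO conversion could be sketched as follows. This is only a hedged sketch: the field names ("text", the category name) and the overall layout are assumptions to be checked against ABCNet's real dataset scripts, which additionally fit Bezier control points that this sketch omits.

```python
def txt_to_coco(txt_lines, image_id, file_name, width, height):
    """Build a COCO-like dict from Total-Text-style lines "x1,y1,...||||label"."""
    images = [{"id": image_id, "file_name": file_name,
               "width": width, "height": height}]
    annotations = []
    for ann_id, line in enumerate(txt_lines, start=1):
        coords_str, label = line.strip().split("||||")
        coords = [float(v) for v in coords_str.split(",")]
        xs, ys = coords[0::2], coords[1::2]
        x0, y0 = min(xs), min(ys)
        w, h = max(xs) - x0, max(ys) - y0
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": 1,
            "bbox": [x0, y0, w, h],          # axis-aligned box around the polygon
            "segmentation": [coords],        # the polygon itself, COCO-style
            "area": w * h,
            "iscrowd": 0,
            "text": label,                   # transcription; the real format may encode this differently
        })
    return {"images": images,
            "annotations": annotations,
            "categories": [{"id": 1, "name": "text"}]}

data = txt_to_coco(
    ["18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text"],
    image_id=1, file_name="img1.jpg", width=600, height=600)
```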
Hello, have you managed to annotate your dataset? Could you tell me how you did it?
Training with a custom dataset, the program pauses for more than 10 minutes with the following log and never prints a training message. I prepared the dataset with the example script and checked the output dataset carefully. I tried to figure out what is wrong, but I couldn't, as there is no error message. Does anyone have any idea about this problem?
Part of the training log:
```
[06/15 14:38:35 d2.data.common]: Serializing 476 elements to byte tensors and concatenating them all ...
[06/15 14:38:35 d2.data.common]: Serialized dataset takes 2.66 MiB
[06/15 14:38:35 d2.data.build]: Using training sampler TrainingSampler
[06/15 14:38:35 fvcore.common.checkpoint]: Loading checkpoint from pretrained/ctw1500_attn_R_50.pth
[06/15 14:38:35 adet.trainer]: Starting training from iteration 0
```