chenyangMl closed this issue 4 years ago
@chenyangMl I encountered this problem before when using the terminal to train the model, but when I used VS Code debug mode to run the training script, the problem did not happen. You can give it a try.
@Yuliang-Liu I suffer from the same problem in both the terminal and VS Code. Any idea how to debug it?
@chenyangMl Could you please tell me the commands to run the training process, and where to specify the path to train.json, the training data, and the txt-format annotations? Thank you!
@shuangyichen @Yuliang-Liu Run train_net.py with the command:

```
OMP_NUM_THREADS=1 python tools/train_net.py --config-file configs/BAText/TotalText/attn_R_50.yaml --num-gpus 1
```

The dataset root directory is "datasets".
Specify the training images and annotations in "builtin.py": `"mydataset_train": ("mydataset/train_img", "mydataset/annotations/train.json")`
Specify the training datasets in "configs/BAText/TotalText/Base-TotalText.yaml":

```yaml
DATASETS:
  TRAIN: ("mydataset_train",)
  TEST: ("mydataset_train",)
```
One txt-format annotation file of the dataset looks like:

```
24.49,22.09,231.04,18.89,229.73,18.78,436.7,16.86,436.12,68.6,230.59,72.38,230.59,72.38,25.07,76.16||||text
25.07,76.16,284.38,73.13,282.66,72.68,542.51,70.35,543.67,117.44,284.95,120.35,284.95,120.35,26.23,123.26||||text
25.98,121.84,282.61,119.3,280.6,118.45,537.86,116.86,539.6,157.56,283.79,160.76,283.79,160.76,27.98,163.95||||text
29.72,168.02,285.94,164.15,284.76,163.48,541.35,161.05,543.09,202.33,286.41,205.52,286.41,205.52,29.72,208.72||||text
27.98,213.95,287.74,210.62,285.56,209.03,546.0,206.98,546.58,248.26,286.99,254.07,286.99,254.07,27.4,259.88||||text
27.4,265.7,282.33,260.69,280.67,259.16,536.12,254.65,537.28,297.09,282.34,300.58,282.34,300.58,27.4,304.07||||text
25.65,303.49,285.68,303.95,283.59,301.72,544.26,303.49,544.26,344.77,284.95,344.77,284.95,344.77,25.65,344.77||||text
18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text
111.12,484.88,332.01,484.72,330.61,482.68,551.81,484.88,551.81,536.05,331.47,536.05,331.47,536.05,111.12,536.05||||text
23.3,356.4,199.56,348.83,376.0,348.73,552.37,347.67,553.53,390.12,376.6,393.01,199.63,392.93,22.72,397.09||||text
```
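Each annotation line above is a list of comma-separated polygon coordinates, then the "||||" separator, then the transcription. A minimal parsing sketch, assuming 16 floats per line describing an 8-point polygon (the helper name is hypothetical):

```python
def parse_annotation_line(line):
    """Split one txt annotation line into polygon points and its transcription."""
    coords_str, text = line.strip().split("||||")
    coords = [float(v) for v in coords_str.split(",")]
    # Group the flat coordinate list into (x, y) pairs.
    points = list(zip(coords[0::2], coords[1::2]))
    return points, text

line = "18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text"
points, text = parse_annotation_line(line)
# points holds the 8 polygon vertices, text the label after "||||"
```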
And I found that the problem occurs when the program reaches line 102 of "tools/train_net.py", but it cannot proceed to the next step; it looks like it is stuck in a loop.
Thank you so much!
I tried it as you mentioned, but met this problem. Have you got any idea? Thank you!
```
[06/16 08:29:02 adet.data.detection_utils]: TransformGens used in training: [ResizeShortestEdge(short_edge_length=(640, 672, 704, 736, 768, 800, 832, 864, 896), max_size=1600, sample_style='choice')]
Traceback (most recent call last):
  File "/root/detectron2/detectron2/data/catalog.py", line 55, in get
    f = DatasetCatalog._REGISTERED[name]
KeyError: 'mydataset_train'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "tools/train_net.py", line 243, in
```
@shuangyichen You have to add your "mydataset_train" key to the existing dict `_PREDEFINED_SPLITS_TEXT` in "adet/data/builtin.py".
It looks like this:

```python
_PREDEFINED_SPLITS_TEXT = {
    "syntext1_train": ("syntext1/images", "syntext1/annotations/train.json"),
    "syntext2_train": ("syntext2/images", "syntext2/annotations/train.json"),
    "mltbezier_word_train": ("mlt2017/images", "mlt2017/annotations/train.json"),
    "mydataset_train": ("mydataset/train_img", "mydataset/annotations/train.json"),
}
```
It succeeded! Thank you! I used Xshell to train, and the training messages looked like:
```
[06/16 11:08:59 d2.utils.events]: eta: 1:05:34 iter: 19 total_loss: 6.228 rec_loss: 2.394 loss_fcos_cls: 0.734 loss_fcos_loc: 0.317 loss_fcos_ctr: 0.670 loss_fcos_bezier: 2.121 time: 0.8739 data_time: 0.0279 lr: 0.000020 max_mem: 13278M
[06/16 11:09:16 d2.utils.events]: eta: 1:06:17 iter: 39 total_loss: 4.955 rec_loss: 1.581 loss_fcos_cls: 0.373 loss_fcos_loc: 0.277 loss_fcos_ctr: 0.662 loss_fcos_bezier: 2.060 time: 0.8595 data_time: 0.0128 lr: 0.000040 max_mem: 13278M
[06/16 11:09:33 d2.utils.events]: eta: 1:06:36 iter: 59 total_loss: 4.576 rec_loss: 1.397 loss_fcos_cls: 0.267 loss_fcos_loc: 0.263 loss_fcos_ctr: 0.649 loss_fcos_bezier: 1.938 time: 0.8549 data_time: 0.0128 lr: 0.000060 max_mem: 13278M
```
Hope this could help you.
@chenyangMl Have you solved your problem?
@Yuliang-Liu Hi, I have found where the program gets stuck in a dead loop: lines 43~46 of "adet/data/detection_utils.py". For now, I set "crop_box" to always be True to keep training, but I am worried that it will affect the training results. I hope you can help me.
```python
if not crop_box:
    modified = True
    while modified:
        modified, x0, y0, crop_size = adjust_crop(x0, y0, crop_size, instances)
```
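Instead of forcing "crop_box" to True, one possible stopgap (a hypothetical safeguard, not the actual upstream fix) is to bound the number of adjustment iterations so the loop always terminates; `adjust_crop` is passed in here as a stand-in for the function in "adet/data/detection_utils.py":

```python
# Hypothetical safeguard (not the upstream fix): cap the crop-adjustment
# loop so floating-point coordinate jitter cannot spin it forever.
MAX_ADJUST_ITERS = 100

def bounded_adjust(x0, y0, crop_size, instances, adjust_crop):
    modified = True
    iters = 0
    while modified and iters < MAX_ADJUST_ITERS:
        modified, x0, y0, crop_size = adjust_crop(x0, y0, crop_size, instances)
        iters += 1
    return x0, y0, crop_size
```

If the loop ever hits the cap, the crop is used as-is rather than hanging the data loader; upgrading to a version with the real coordinate fix is still the proper solution.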
@chenyangMl this problem is caused by floating point bbox coords, which should have been fixed with this PR: https://github.com/aim-uofa/AdelaiDet/pull/104.
If you find other problems, please feel free to ask.
@chenyangMl Can you please tell me how you generated the annotations for your custom dataset? Did you store them in Total-Text-style txt format, such as: x,y x,y ... label? If so, could you please tell me how to convert the txt files to COCO-format JSON for ABCNet? The example scripts showed conversion from XML, but I don't know how to convert from txt.
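For illustration, one possible txt-to-COCO conversion could be sketched as follows. This is only a hedged sketch: the field names ("text", the category name) and the overall layout are assumptions to be checked against ABCNet's real dataset scripts, which additionally fit Bezier control points that this sketch omits.

```python
def txt_to_coco(txt_lines, image_id, file_name, width, height):
    """Build a COCO-like dict from Total-Text-style lines "x1,y1,...||||label"."""
    images = [{"id": image_id, "file_name": file_name,
               "width": width, "height": height}]
    annotations = []
    for ann_id, line in enumerate(txt_lines, start=1):
        coords_str, label = line.strip().split("||||")
        coords = [float(v) for v in coords_str.split(",")]
        xs, ys = coords[0::2], coords[1::2]
        x0, y0 = min(xs), min(ys)
        w, h = max(xs) - x0, max(ys) - y0
        annotations.append({
            "id": ann_id,
            "image_id": image_id,
            "category_id": 1,
            "bbox": [x0, y0, w, h],          # axis-aligned box around the polygon
            "segmentation": [coords],        # the polygon itself, COCO-style
            "area": w * h,
            "iscrowd": 0,
            "text": label,                   # transcription; the real format may encode this differently
        })
    return {"images": images,
            "annotations": annotations,
            "categories": [{"id": 1, "name": "text"}]}

data = txt_to_coco(
    ["18.67,406.4,75.54,411.42,73.47,400.16,130.88,406.4,130.88,448.84,74.78,448.84,74.78,448.84,18.67,448.84||||text"],
    image_id=1, file_name="img1.jpg", width=600, height=600)
```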
Hello, have you managed to annotate your dataset? Could you tell me how you did it?
Training with a custom dataset, the program pauses for more than 10 minutes with the following log and never prints a training message. I prepared the dataset with the example script and checked the output dataset carefully. I tried to figure out what is wrong, but I couldn't, as there is no error message. Does anyone have any idea about this problem?
Part of the training log:
```
[06/15 14:38:35 d2.data.common]: Serializing 476 elements to byte tensors and concatenating them all ...
[06/15 14:38:35 d2.data.common]: Serialized dataset takes 2.66 MiB
[06/15 14:38:35 d2.data.build]: Using training sampler TrainingSampler
[06/15 14:38:35 fvcore.common.checkpoint]: Loading checkpoint from pretrained/ctw1500_attn_R_50.pth
[06/15 14:38:35 adet.trainer]: Starting training from iteration 0
```