aim-uofa / AdelaiDet

AdelaiDet is an open source toolbox for multiple instance-level detection and recognition tasks.
https://git.io/AdelaiDet

ABCNet: Transposing Images and Log Messages! #139

Closed innat closed 4 years ago

innat commented 4 years ago

Hi,

I was training a custom data set using ABCNet. The images come in varied orientations (horizontal, vertical, etc.), and maybe for that reason, when training starts, I get the following:

transposing image datasets/images/IMG_1.JPG
transposing image datasets/images/IMG_2.JPG
transposing image datasets/images/IMG_3.JPG
transposing image datasets/images/IMG_4.JPG
transposing image datasets/images/IMG_5.JPG
[07/06 08:59:35 d2.utils.events]:  eta: 1:18:16  iter: 59  total_loss: 5.080  rec_loss: 1.508  loss_fcos_cls: 0.591  loss_fcos_loc: 0.556  loss_fcos_ctr: 0.649  loss_fcos_bezier: 1.466  time: 1.0022  data_time: 0.0081  lr: 0.000599  max_mem: 6881M
transposing image datasets/images/IMG_6.JPG
transposing image datasets/images/IMG_7.JPG
transposing image datasets/images/IMG_8.JPG
transposing image datasets/images/IMG_9.JPG
transposing image datasets/images/IMG_10.JPG
transposing image datasets/images/IMG_11.JPG
[07/06 08:59:35 d2.utils.events]:  eta: 1:18:16  iter: 59  total_loss: 5.080  rec_loss: 1.508  loss_fcos_cls: 0.591  loss_fcos_loc: 0.556  loss_fcos_ctr: 0.649  loss_fcos_bezier: 1.466  time: 1.0022  data_time: 0.0081  lr: 0.000599  max_mem: 6881M

Is this normal? I mean, why are the log messages printed this way? Does the program auto-rotate (transpose) the samples into a suitable input orientation?
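For what it's worth, "transposing image ..." messages like these usually mean the loader found an EXIF orientation flag whose application changes the image's width/height. A minimal sketch of how to check this yourself with Pillow (the helper name `would_transpose` is mine, not part of AdelaiDet):

```python
from PIL import Image, ImageOps

def would_transpose(path):
    """Return True if applying the EXIF orientation flag changes the
    image's (width, height) -- i.e. the image would be transposed."""
    with Image.open(path) as img:
        transposed = ImageOps.exif_transpose(img)
        return img.size != transposed.size
```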

Issue 2

That said, regarding the logged loss terms:

total_loss
rec_loss
loss_fcos_cls
loss_fcos_loc
loss_fcos_ctr
loss_fcos_bezier

I understand the first two: rec_loss is the recognition loss, and total_loss is simply the sum of the recognition and detection losses. What about the others, and the naming convention (fcos)?

Issue 3

Also, sometimes after training for a while, the following message appears:

AssertionError: The annotation bounding box is outside of the image!

But I've checked the bezier_viz output, and it looks good to me. What have I missed?
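One way to hunt down the offending annotation is a quick bounds check before training. A minimal sketch of mine, assuming COCO-style records with [x, y, w, h] boxes (the helper name `find_out_of_bounds` is hypothetical):

```python
def find_out_of_bounds(annotations, width, height):
    """Return indices of annotations whose [x, y, w, h] box
    falls outside a width x height image."""
    bad = []
    for i, ann in enumerate(annotations):
        x, y, w, h = ann["bbox"]
        if x < 0 or y < 0 or x + w > width or y + h > height:
            bad.append(i)
    return bad
```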

Issue 4

Also, while evaluating the model as demonstrated here, regarding the following files:

|_ evaluation
|  |_ gt_totaltext.zip
|  |_ gt_ctw1500.zip

in this case, do those contain polygonal annotations or bezier annotations? That is, if we unzip the above files, do we get annotation files in txt format? When I evaluate, I get the following:

"E2E_RESULTS: 
precision: 0.5654166666666667, 
recall: 0.5048363095238095, 
hmean: 0.5334119496855346"

"DETECTION_ONLY_RESULTS: 
precision: 0.88125, 
recall: 0.7868303571428571, 
hmean: 0.8313679245283019"
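For reference, the hmean reported here is just the harmonic mean (F1) of precision and recall, which checks out against the numbers above:

```python
def hmean(precision, recall):
    # Harmonic mean of precision and recall (a.k.a. F1 score).
    return 2 * precision * recall / (precision + recall)
```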

I may have missed it, but is there any built-in function to plot the predictions on the samples? If so, could you please point me to the code?

To be honest, evaluating scene text recognition seems more complex to me than usual. Could you please point to a document that explains scene-text-recognition evaluation protocols, especially exact word matching for evaluating text recognition, handling punctuation, etc.?
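As a rough illustration of what "exact match" evaluation involves (a toy sketch of my own, not the official ICDAR protocol), the comparison typically lowercases and strips punctuation before checking word equality:

```python
import string

def exact_word_match(pred, gt, ignore_case=True, ignore_punct=True):
    """Toy exact-match check for recognized words -- not the official protocol."""
    if ignore_case:
        pred, gt = pred.lower(), gt.lower()
    if ignore_punct:
        table = str.maketrans("", "", string.punctuation)
        pred, gt = pred.translate(table), gt.translate(table)
    return pred == gt
```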

Issue 5

Also, how does ABCNet split the data set into training and validation/testing parts? In the builtin.py file, we set the following:

_PREDEFINED_SPLITS_TEXT = {
    "ctw1500_word_train": ("CTW1500/ctwtrain_train_image", "CTW1500/annotations/train.json"),
    "ctw1500_word_test": ("CTW1500/ctwtest_text_image","CTW1500/annotations/test.json"),
}

So, do we need to split our train and test sets manually? And is this test set used for validation or only for the test phase? Or did you train on synthetic samples, fine-tune on the train set of Total-Text/CTW1500, and finally evaluate on the test set of Total-Text/CTW1500?

Eurus-Holmes commented 4 years ago

@innat Hi, I was also training ABCNet on a custom dataset; maybe I can answer some of your questions.

Issue 1

That output comes from here.

Issue 2

You can refer to the code here. The calculation process is here.


Also, I encountered a strange data-loss problem during training; have you encountered it?

innat commented 4 years ago

@Eurus-Holmes Thank you, that's really helpful. As for the problem you've faced, I haven't hit it yet, since I haven't run inference after training on the custom data set. I will check, though, and if I find a way out, I'll report back here for sure. =)

Yuliang-Liu commented 4 years ago

@innat

Issue 3: there could be something wrong in your custom dataset. Can you print those exceptional bounding boxes and see why it happens?

Issue 4: polygonal annotations. As for is there any built-in function to plot the prediction on the samples?, I think you will have to refer to the code yourself; see text_eval_script.py. For documentation, I recommend the ICDAR 2015 official evaluation code.

Issue 5: We didn't provide a validation set. Please do not use the test set for validation purposes.

OR, you trained synthetic samples and fine-tune on the train set of total-text/ctw1500 and lastly evaluate on test set of total-text/ctw1500?

That's right.

innat commented 4 years ago

@Yuliang-Liu Thank you. Yesterday I was reading this ICDAR evaluation protocol. Awesome, thanks for sharing. 🙂

innat commented 4 years ago

@Yuliang-Liu

while converting the polygonal annotations to bezier annotations, I got the following:

/content/drive/My Drive/Toyota/ABCNet_Custom/Bezier_generator2_txt.py:146: RuntimeWarning: divide by zero encountered in double_scalars
  st_slope = (ys[-1] - ys[0])/(xs[-1] - xs[0])
/content/drive/My Drive/Toyota/ABCNet_Custom/Bezier_generator2_txt.py:149: RuntimeWarning: invalid value encountered in subtract
  diffs = abs(slopes - st_slope)

I visually checked the polygonal boxes on the samples after creating the polygonal annotations, and they looked good. Then, when I ran the conversion script for the bezier annotations, I got these warnings, but the program didn't stop; after processing everything, I got the bezier visualization files, and the bezier annotations also looked fine on the samples. Any catch?

I think this is the reason for Issue 3 (AssertionError: The annotation bounding box is outside of the image! during training). I used the CTW-1500 annotation tools. However, I think the assertion error is raised because of the applied augmentation.
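The divide-by-zero in st_slope = (ys[-1] - ys[0])/(xs[-1] - xs[0]) happens whenever the first and last points share an x-coordinate (a vertical segment). One hedged workaround is to compare angles via arctan2 instead of raw slopes; a sketch of my own, not the converter's actual code:

```python
import numpy as np

def segment_angle(xs, ys):
    """Angle of the segment from (xs[0], ys[0]) to (xs[-1], ys[-1]).
    arctan2 handles dx == 0 without the divide-by-zero a raw slope hits."""
    return np.arctan2(ys[-1] - ys[0], xs[-1] - xs[0])
```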

Update

I've gotten rid of the RuntimeWarning above for now, but I'm still facing the assertion error.

Another RuntimeError:

  File "/home/jupyter/AdelaiDet/adet/modeling/roi_heads/attn_predictor.py", line 24, in forward
    output = output.view(T, b, -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [32, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
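For context, this RuntimeError likely means the recognition head received a batch of zero text instances (b == 0): with 0 total elements, the -1 in view(T, b, -1) cannot be inferred. A simplified pure-Python sketch of the inference rule (my own, not PyTorch's code):

```python
def infer_free_dim(numel, shape):
    """Mimic (in simplified form) how view()/reshape() resolves a single -1."""
    known = 1
    for d in shape:
        if d != -1:
            known *= d
    if known == 0:
        # 0 * anything equals numel (0), so the free dim is ambiguous --
        # exactly the situation in the traceback above.
        raise ValueError("ambiguous -1: another dimension is 0")
    return numel // known
```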
innat commented 4 years ago

@Yuliang-Liu While running inference on the custom data set, I got the following error. I trained on synthetic data and ran inference on the custom data set.

detectron2.data.detection_utils.SizeMismatchError: Mismatched (W,H) for image datasets/toyota/images/IMG_0009.JPG, got (3264, 2448), expect (2448, 3264)

Any catch? I checked this, and this one.

Also, when I train on the custom data set, the images get transposed, but at inference time I think it's unable to handle that. I've read this article; however, I used the CTW-1500 annotation tool; could there be an issue with reading the EXIF information?
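That SizeMismatchError is detectron2 comparing the loaded pixel size against the width/height recorded in the dataset dict, and an EXIF orientation flag is the usual culprit. A quick audit sketch of mine, assuming COCO-style records with 'file_name', 'width', and 'height' keys:

```python
from PIL import Image

def find_size_mismatches(records):
    """Compare each record's stored (width, height) with the file on disk."""
    bad = []
    for rec in records:
        with Image.open(rec["file_name"]) as img:
            if img.size != (rec["width"], rec["height"]):
                bad.append((rec["file_name"], img.size, (rec["width"], rec["height"])))
    return bad
```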

innat commented 4 years ago

@Yuliang-Liu I'm getting the following errors on the custom data set, which I annotated using the CTW-1500 tools.

If I set the custom data set for training:

  File "/home/jupyter/AdelaiDet/adet/modeling/roi_heads/attn_predictor.py", line 24, in forward
    output = output.view(T, b, -1)
RuntimeError: cannot reshape tensor of 0 elements into shape [32, 0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

If I set the custom data set for inference:

detectron2.data.detection_utils.SizeMismatchError: Mismatched (W,H) for image datasets/toyota/images/IMG_0009.JPG, got (3264, 2448), expect (2448, 3264)

I think it might be caused by the annotation files, but I've checked them by plotting on the raw images. However, when I try to train on the custom data set, the following error also sometimes occurs (but not always), which seems reasonable since it may be due to augmentation:

AssertionError: Bounding box out of the image

@Eurus-Holmes have you faced anything like this? And just to know: for your custom data, did you annotate it yourself, and if so, did you use the CTW-1500 tools or something else?

Eurus-Holmes commented 4 years ago


@innat I have not faced this problem. But it seems you may have processed your image sizes by mistake; first, you could test with a few small samples to find out whether the problem comes from the images themselves or from the processing code.

innat commented 4 years ago

@Eurus-Holmes thanks. I've checked, though. I used the CTW-1500 annotation tools for the annotations. However, I think there may be an issue in this annotation tool with loading the image meta information, such as EXIF, more specifically the orientation. @Yuliang-Liu, can you please confirm this?

anruirui commented 3 years ago

@innat I'm facing Issue 3 on custom datasets; can you give me some help solving it?

innat commented 3 years ago

@anruirui I didn't continue with ABCNet after that, as we moved on to MaskTextSpotterV3.

Yuliang-Liu commented 3 years ago

@innat Do you have any results comparing these two methods? Our recent study shows that, using the same datasets and training strategies, ours can outperform MTSv3. See the paper.

anruirui commented 3 years ago


@Yuliang-Liu Hello, I want to know what kind of data (Chinese or English) was used in your recent study? I trained on the ICDAR 2017 RCTW dataset (Chinese and English), but after some iterations there is an error, "cannot convert float NaN to integer", which is raised in /detectron2/data/transforms/augmentation.py:

/home/luban/miniconda3/envs/adet/lib/python3.7/site-packages/detectron2/data/transforms/augmentation_impl.py:161: RuntimeWarning: divide by zero encountered in double_scalars
  scale = size * 1.0 / min(h, w)
/home/luban/miniconda3/envs/adet/lib/python3.7/site-packages/detectron2/data/transforms/augmentation_impl.py:168: RuntimeWarning: invalid value encountered in double_scalars
  newh = newh * scale
newh = int(newh + 0.5)
ValueError: cannot convert float NaN to integer

innat commented 3 years ago

@Yuliang-Liu No, I hadn't seen your recent update. I will check it out; thanks for sharing.

ryouchinsa commented 1 year ago

Although this is not an open-source program, RectLabel has a "Remove EXIF orientation flags" feature on the File menu. After opening images, we recommend removing the EXIF orientation flags. If necessary, you can rotate the image by 90 degrees using the "Rotate the image right/left" feature on the Edit menu. https://rectlabel.com/help#remove_exif_orientation_flags

LANDDKPLA commented 4 months ago

> @innat I face the Issue 3 on custom datasets, can you give me some help to solve it?

import os
from PIL import Image, ImageOps

# img_fpth and image (a COCO-style record) come from the surrounding dataset loop.
with Image.open(img_fpth) as pil_img:
    # Apply the EXIF orientation flag so the pixels match what viewers display.
    transposed_img = ImageOps.exif_transpose(pil_img)
    new_img = None
    if pil_img.size != transposed_img.size:
        # The orientation flag swaps W and H: back up the transposed copy first.
        bak_save_pth = os.path.join("../path_to_back_up_directory", image["file_name"])
        # print(f"backup image saved: {bak_save_pth}")
        transposed_img.save(bak_save_pth)
        # Re-saving a copy of the raw pixels drops the EXIF orientation flag.
        new_img = pil_img.copy()
if new_img:
    os.remove(img_fpth)
    new_img.save(img_fpth)

It works for me: transpose your original image and then save it. Why this issue occurs is not clear, but we can do some preprocessing on our data. You can check this link for more details.