FangShancheng / ABINet

Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition

Reproduce training issue #107

Closed htuannn closed 5 months ago

htuannn commented 6 months ago

Hi @FangShancheng,

I am trying to reproduce the training code without using the FastAI Learner. I use the released pretrained best-train-abinet.pth with the corresponding config file. Here is my reproduced training code:

```python
criterion = MultiLosses(one_hot=True, device=device)
# ...
images_source_tensor, labels_source = next(source_loader_iter)
images_source = images_source_tensor.to(device)

labels_source_index, labels_source_length = [], []
for label_source in labels_source:
    # post-process label: +1 accounts for the end-of-sequence (null) token
    label_source_length = torch.tensor([len(label_source) + 1])
    label_source_index_ = torch.tensor(charset.get_labels(label_source))
    label_source_index = onehot(label_source_index_, charset.num_classes)

    labels_source_index.append(label_source_index)
    labels_source_length.append(label_source_length)

labels_source_index = torch.stack(labels_source_index).to(device)
labels_source_length = torch.cat(labels_source_length).to(device)

preds_source = model(images_source)

loss_source = criterion(
    preds_source, labels_source_index, labels_source_length
)
model.zero_grad(set_to_none=True)
loss_source.backward()
torch.nn.utils.clip_grad_norm_(
    model.parameters(), opt.grad_clip
)  # gradient clipping with 5 (default)
optimizer.step()
```
I used the MJ + ST datasets to test training with the above code; however, the evaluation results on the benchmark sets (measured every 500 iterations) kept getting worse. I don't know whether my reimplementation has anything wrong.

Thanks.

KhaLee2307 commented 6 months ago

I have the same issue.

htuannn commented 5 months ago

My issue is solved now. I realized I had accidentally set ignore_index for the null_char class in the criterion; that removes the end-of-sequence token from the loss, so the model never learns where the recognized string should stop and fails to converge.
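For anyone hitting the same thing, here is a minimal sketch of the effect. It does not use the repo's MultiLosses; it uses plain torch.nn.CrossEntropyLoss with made-up values for num_classes and null_char_index, just to show why excluding the null/end token from the loss breaks sequence termination:

```python
import torch
import torch.nn as nn

num_classes = 37        # assumed: 36 character classes + 1 null/end token
null_char_index = 36    # assumed index of the null (end-of-sequence) token

# Wrong: the end token is excluded from the loss, so the model receives no
# gradient signal for predicting where the string stops.
bad_criterion = nn.CrossEntropyLoss(ignore_index=null_char_index)

# Right: keep the null/end token in the loss (reserve ignore_index for
# genuine padding positions, if any).
good_criterion = nn.CrossEntropyLoss()

logits = torch.randn(8, num_classes, requires_grad=True)  # dummy per-position logits
targets = torch.randint(0, num_classes, (8,))             # dummy targets, may include the end token
loss = good_criterion(logits, targets)
loss.backward()  # gradients now also cover end-token predictions
```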