facebookresearch / gtn_applications

Applications using the GTN library and code to reproduce experiments in "Differentiable Weighted Finite-State Transducers"
MIT License

[STC] Question for STC loss training #20

Closed by LEECHOONGHO 2 years ago

LEECHOONGHO commented 2 years ago

Hello, I'm trying to apply STC for my ASR model training.

Before proceeding with the training, I have some questions about the STC training described in the STC paper [1]. If anyone has experimented with the cases below, please give me some advice.

1. Is STC valid for data with pDrop=0.01~0.02, or should I just use CTC?
    data - 99% reliable data that may contain some typos, or where the business name is sometimes erased.

    1-1. If STC is valid for 1., is it sufficient to set p_0=0.01, p_max=0.03 for this case?

2. Is Adam not allowed for STC/(WFST)?

3. For future work, I am thinking of using pseudo-labeled YouTube data for ASR training.
    In this case, the data could contain many incorrect labels.
    Does STC perform better than CTC even when training on incorrectly labeled data?

Thank you.

[1] Star Temporal Classification: Sequence Classification with Partially Labeled Data.

vineelpratap commented 2 years ago

Hi,

  1. With 99% reliable data, STC may not improve much over a CTC baseline. You would have to experiment and see. Using low p_0 and p_max values, as you suggested, makes sense.

  2. We used the Adam optimizer for the Handwriting Recognition results reported in the paper. I do not see any issues with using Adam.

  3. Note that STC only corrects deletion errors in the dataset, while noisy data usually contains deletion, insertion, and substitution errors.

If you have a metric to track the confidence of each word in the pseudo label, one option is to keep only the high-confidence words and remove the rest. The result can be treated as a partial label, and the STC model should help here.
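The filtering step described above could be sketched roughly as follows. This is only an illustration: the function name `to_partial_label`, the `threshold` value, and the per-word confidence list are all assumptions, not part of gtn_applications, and where the confidences come from depends on your pseudo-labeling model.

```python
from typing import List, Sequence


def to_partial_label(
    words: Sequence[str],
    confidences: Sequence[float],
    threshold: float = 0.9,  # hypothetical cutoff, tune on held-out data
) -> List[str]:
    """Keep only the words whose confidence is at least `threshold`.

    Dropped words become deletions in the label, which is the kind of
    error STC is designed to handle.
    """
    if len(words) != len(confidences):
        raise ValueError("words and confidences must have the same length")
    return [w for w, c in zip(words, confidences) if c >= threshold]


# Example: the two low-confidence words are removed, word order is preserved.
pseudo_label = ["the", "quick", "brwn", "fox", "jumpd"]
word_conf = [0.99, 0.95, 0.40, 0.97, 0.55]
partial = to_partial_label(pseudo_label, word_conf)
```

A stricter threshold removes more words, so the partial label has fewer wrong words (substitutions or insertions) but more deletions.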

Another option to deal with noisy labels is described in https://arxiv.org/abs/2010.15653 where multiple pseudo labels are used per sample. This can also be easily implemented in the GTN framework by carefully modifying the label graph of CTC.
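Going back to point 1, here is a rough sketch of how a value could move from p_0 toward p_max during training. It assumes an exponential schedule with a half-life; the function name `star_schedule`, the `half_life` parameter, and the exact formula are assumptions for illustration, so please check the paper and the STC criterion in this repository for the schedule actually used.

```python
import math


def star_schedule(
    step: int,
    p0: float = 0.01,         # starting value (from the question above)
    p_max: float = 0.03,      # limiting value (from the question above)
    half_life: float = 1000,  # hypothetical: steps to close half of the gap
) -> float:
    """Interpolate exponentially from p0 (step 0) toward p_max."""
    decay = math.exp(-math.log(2.0) * step / half_life)
    return p_max + (p0 - p_max) * decay


# At step 0 the value is p0, after one half-life it is halfway to p_max,
# and for large step counts it approaches p_max.
values = [star_schedule(s) for s in (0, 1000, 100000)]
```

With p_0 and p_max this close together, the schedule barely changes over training, which fits the expectation that STC behaves much like the CTC baseline on 99% reliable data.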

Hope this helps!

LEECHOONGHO commented 2 years ago

Thank you for your advice!

It helped me a lot!