artbataev / end2end

Losses and decoders for end-to-end ASR and OCR
https://artbataev.github.io/end2end/
MIT License

Clarification: Gram-CTC alphabet_size: should a single logit represent "a", "b" or "ab"? #1

Open danFromTelAviv opened 5 years ago

danFromTelAviv commented 5 years ago

Thank you very much for your implementation of CTC variants. To be frank, I think that is the main value of this repo, and I would rename it to something like "pytorch-ctc-variants", because great implementations like the ones you made are very hard to find. OCR code, however, is pretty prevalent.

Just to clarify: for Gram-CTC, should the logits represent single characters such as "a" and "b", or grams such as "ab"?

And just to validate: this is an implementation of this paper, right? https://arxiv.org/pdf/1703.00096.pdf
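For context on the question above: in the Gram-CTC paper, the network's output layer typically has one logit per gram in a chosen gram set G plus one for the CTC blank, rather than one per single character. A minimal sketch of that counting, with an illustrative gram set (the names here are assumptions for illustration, not this repo's API):

```python
# Hedged sketch: Gram-CTC output dimension for a toy alphabet.
# The gram set below (all unigrams and bigrams) is only an example;
# the paper selects grams e.g. by frequency.
alphabet = ["a", "b"]

# Example gram set: all unigrams and bigrams over the alphabet.
grams = list(alphabet) + [x + y for x in alphabet for y in alphabet]
# grams == ["a", "b", "aa", "ab", "ba", "bb"]

num_logits = len(grams) + 1  # +1 for the CTC blank symbol
print(num_logits)  # 7
```

So with this toy gram set the model would emit 7 logits per frame: one each for "a", "b", "aa", "ab", "ba", "bb", and blank.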

Thanks, Dan

danFromTelAviv commented 5 years ago

From reading the code, it does look like this is actually Gram-CTC, but the test doesn't run: it's missing the mandatory grams input.

Based on:

```python
max_gram_length = len(grams.shape)
if max_gram_length >= 4:
    raise NotImplementedError
# num_basic_labels = grams.shape[0]
```

Should it be a tensor of shape [(alphabet_size + 1) x (alphabet_size + 1) x (alphabet_size + 1)] for grams of maximum length 3?
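To make that guess concrete: under this reading (unconfirmed, since per the reply below Gram-CTC was not implemented in the repo at the time), the grams argument would be a cube-shaped indicator tensor whose rank equals the maximum gram length. A hedged sketch, where index 0 padding for shorter grams is my own assumption:

```python
import numpy as np

# Hedged sketch of the shape being asked about: a boolean tensor of shape
# [(A+1), (A+1), (A+1)], where A is the alphabet size, marking which grams
# (up to length 3) are allowed. Index 0 standing for blank/padding is an
# illustrative assumption, not taken from the repo.
alphabet_size = 2          # e.g. "a" -> 1, "b" -> 2; 0 reserved
A = alphabet_size + 1

grams = np.zeros((A, A, A), dtype=bool)
grams[1, 0, 0] = True      # unigram "a" (trailing zeros pad shorter grams)
grams[2, 0, 0] = True      # unigram "b"
grams[1, 2, 0] = True      # bigram "ab"

max_gram_length = len(grams.shape)  # 3, matching the quoted snippet
print(grams.shape, max_gram_length)
```

With this layout, `len(grams.shape)` giving the maximum gram length is consistent with the snippet quoted above, and `grams.shape[0]` would indeed be the number of basic labels.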

artbataev commented 5 years ago

I'm sorry, Gram-CTC is not yet implemented, but it is the first-priority future task (https://github.com/artbataev/end2end#future-plans), and I'm working on it. For now, only the CTC loss and the CTC beam search decoder with a language model are working: https://artbataev.github.io/end2end/pytorch_end2end.html

danFromTelAviv commented 5 years ago

OK, thank you for your work. Good luck!