FactoDeepLearning / DAN


general questions #1

Closed: seekingdeep closed this issue 2 years ago

seekingdeep commented 2 years ago

@FactoDeepLearning Hi there,

I had a look at the published paper, and correct me if I am wrong:

FactoDeepLearning commented 2 years ago

Hi,

FactoDeepLearning commented 2 years ago

It seems to be really interesting work. At first sight, as with other self-supervised learning (SSL) approaches, my worry is about the nature of the task.

SSL is mainly carried out for image classification, where the model can learn object representations through image transformations because there is essentially one object of interest per sample. For HTR, a single example contains many characters, each of which must be recognized; I think this could confuse the representation learning of the individual characters.
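
To make this concrete, here is a minimal sketch (illustrative only, not code from this repository; the encoder, image size, and augmentations are all assumed) of a SimCLR-style contrastive step applied to full text-line images. Each augmented view still contains an entire character sequence, so the positive pair constrains the line-level embedding rather than any single character.

```python
# Illustrative only: a SimCLR-style contrastive step on handwritten line images.
# Each "view" of a line still contains many characters, so the positive pair
# constrains the whole-line embedding, not individual character embeddings.
import torch
import torch.nn.functional as F
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomAffine(degrees=2, translate=(0.02, 0.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3),
    transforms.RandomResizedCrop((64, 1024), scale=(0.8, 1.0)),
])

def contrastive_step(encoder, line_images, temperature=0.1):
    """line_images: (B, 1, H, W) batch of full text-line crops."""
    v1 = torch.stack([augment(x) for x in line_images])
    v2 = torch.stack([augment(x) for x in line_images])
    z1 = F.normalize(encoder(v1), dim=-1)  # one embedding per *line*, not per character
    z2 = F.normalize(encoder(v2), dim=-1)
    logits = z1 @ z2.T / temperature       # simplified InfoNCE: cross-view negatives only
    labels = torch.arange(z1.size(0), device=logits.device)
    return F.cross_entropy(logits, labels)
```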

FactoDeepLearning commented 2 years ago

1) Training samples must be as varied as possible in terms of content (sequence of characters) and/or layout, if the unseen data have unconstrained layouts. The reading order must be consistent across all training samples so the model really learns what it means to read a document (a small illustrative sketch is given after point 2).

2) I only tested the model on the two HTR datasets presented in the paper.
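
Regarding point 1, here is a minimal illustrative sketch (not the repository's synthetic-data code; the charset, font, and layout parameters are assumptions) of generating samples whose text content and layout vary per sample, while the target transcription is always emitted in one consistent reading order, top to bottom.

```python
# Illustrative synthetic-page generator (not the DAN repository's code).
# Content and layout vary per sample, but the label string is always built
# in one consistent reading order: top to bottom.
import random
from PIL import Image, ImageDraw, ImageFont

CHARSET = "abcdefghijklmnopqrstuvwxyz ,."

def random_line(min_len=10, max_len=40):
    return "".join(random.choice(CHARSET) for _ in range(random.randint(min_len, max_len)))

def make_sample(width=800, height=600, font_path="DejaVuSans.ttf"):
    # font_path is an assumption: point it at any TTF font on your system
    img = Image.new("L", (width, height), color=255)
    draw = ImageDraw.Draw(img)
    font = ImageFont.truetype(font_path, size=random.randint(18, 28))
    y = random.randint(20, 60)
    lines = []
    for _ in range(random.randint(3, 8)):
        text = random_line()
        x = random.randint(20, 80)        # horizontal placement varies...
        draw.text((x, y), text, fill=0, font=font)
        lines.append(text)                # ...but labels follow the reading order
        y += random.randint(35, 60)
    return img, "\n".join(lines)          # target = top-to-bottom transcription
```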

FactoDeepLearning commented 2 years ago

I think your idea is close to this work.

So yes, it should work too.

seekingdeep commented 2 years ago

Have a good day.