Lipreading in natural scenes with 4-digit random numbers (0000-9999)
Pretrain the models on the Grid dataset, then finetune them on the VSA dataset.
To pretrain models, run:
python pytorchtransformer.py
To finetune models, run:
python vsatransformer.py
The spatial transformer network (STN) can be enabled by setting stn_on=True.
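
For readers unfamiliar with the idea, the following is a minimal sketch of how a flag such as stn_on might gate a spatial transformer in PyTorch. It is not taken from pytorchtransformer.py or vsatransformer.py: the class name OptionalSTN, the layer sizes, and the assumption that frames arrive as a (N, C, H, W) batch are all hypothetical and chosen only for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class OptionalSTN(nn.Module):
    """Hypothetical wrapper: warps the input with an STN only when stn_on is True."""

    def __init__(self, in_channels=1, stn_on=False):
        super().__init__()
        self.stn_on = stn_on
        # Small localization network that regresses the 6 affine parameters.
        # Layer sizes are illustrative assumptions, not the repository's values.
        self.localization = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(8 * 4 * 4, 6),
        )
        # Initialize to the identity transform so the STN starts as a no-op.
        final = self.localization[-1]
        nn.init.zeros_(final.weight)
        with torch.no_grad():
            final.bias.copy_(torch.tensor([1.0, 0.0, 0.0, 0.0, 1.0, 0.0]))

    def forward(self, x):
        # x is assumed to be a batch of frames shaped (N, C, H, W).
        if not self.stn_on:
            return x
        theta = self.localization(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)


if __name__ == "__main__":
    frames = torch.rand(2, 1, 32, 64)
    print(OptionalSTN(stn_on=True)(frames).shape)
```

With stn_on=False the module passes frames through untouched, so the rest of the model can stay identical in both settings. Starting from the identity transform is a common choice because it lets training begin from the unwarped input.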