lipreading_stn

Lipreading in natural scenes on spoken 4-digit random numbers (0000-9999).
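
As a rough illustration of the label space, each clip's target can be viewed as a sequence of four digit classes. The sketch below is an assumption about how such labels might be encoded, not code from this repository; `encode_label` and `decode_label` are hypothetical helper names.

```python
def encode_label(number_text):
    """Turn a 4-digit string such as "0042" into a list of digit classes."""
    # Leading zeros matter, so the label is handled as text, not as an int.
    if len(number_text) != 4 or not all(ch in "0123456789" for ch in number_text):
        raise ValueError("expected exactly four digits, e.g. '0042'")
    return [int(ch) for ch in number_text]


def decode_label(digit_classes):
    """Inverse of encode_label: [0, 0, 4, 2] becomes "0042"."""
    return "".join(str(d) for d in digit_classes)
```

Keeping the label as a string preserves leading zeros, which an integer representation would drop.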

Models are pretrained on the Grid dataset and then finetuned on the VSA dataset.

To pretrain models, run:

python pytorchtransformer.py

To finetune models, run:

python vsatransformer.py

The spatial transformer network (STN) can be turned on by setting stn_on=True.
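
To give a sense of what the STN switch controls, here is a minimal, framework-free sketch of the affine sampling grid that a spatial transformer computes. It is an illustration under stated assumptions, not the repository's implementation: `affine_grid`, `sampling_grid`, and `IDENTITY` are hypothetical names, and only the `stn_on` flag comes from this README. With `stn_on=False` the sketch falls back to the identity transform, so the input frames would be sampled unchanged.

```python
# Identity affine transform: output coordinates equal input coordinates.
IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]


def affine_grid(theta, height, width):
    """Map each output pixel to a source coordinate in [-1, 1] x [-1, 1].

    theta is a 2x3 affine matrix [[a, b, tx], [c, d, ty]].
    """
    grid = []
    for i in range(height):
        # Normalized y coordinate of output row i.
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            # Normalized x coordinate of output column j.
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            src_x = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            src_y = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((src_x, src_y))
        grid.append(row)
    return grid


def sampling_grid(predicted_theta, height, width, stn_on=True):
    """Use the predicted transform when stn_on is True, else the identity."""
    theta = predicted_theta if stn_on else IDENTITY
    return affine_grid(theta, height, width)
```

For example, a predicted transform of `[[0.5, 0.0, 0.0], [0.0, 0.5, 0.0]]` zooms in on the central half of the frame when `stn_on=True`, while `stn_on=False` ignores it. In an actual PyTorch model the grid would typically be produced and applied with the library's own grid-sampling functions rather than Python loops.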
