Questions about the baseline and AlignedLip

VIPL-Audio-Visual-Speech-Understanding / learn-an-effective-lip-reading-model-without-pains

The PyTorch code and model for "Learn an Effective Lip Reading Model without Pains" (https://arxiv.org/abs/2011.07557), which reaches state-of-the-art performance on the LRW-1000 dataset.


Questions about the baseline and AlignedLip #3

Closed: shibefore closed this issue 3 years ago

shibefore commented 3 years ago

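Hello, after reading the paper and the code I have the following two questions.

1. In the paper, the Baseline is 3D conv + ResNet18 + a 3-layer GRU, and it scores 46.5 on LRW-1000. Similar methods in earlier papers only reach 38.7, for example reference 29 cited in Table 5, "Mutual information maximization for effective lip reading". How was this improvement achieved?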

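2. In Table 3, AlignedLip refers to the setting with mouth alignment, but I could not find the corresponding result on LRW-1000. How much does lip alignment improve performance on LRW-1000?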

Fengdalu commented 3 years ago

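During data preprocessing we used word-centered padding, similar to how the LRW data is laid out. This gives the model more context information, and in practice it brings a fairly large improvement.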
The LRW-1000 data was already aligned when the dataset was released.
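
For readers following the thread, here is a minimal sketch of what word-centered padding could look like. It is not taken from the repository's preprocessing code. The function name `centered_window`, the default clip length of 40 frames, and the use of `None` to mark positions that need padding are all assumptions made for illustration. The idea is to take a fixed-length window whose middle sits on the target word, so that real neighbouring frames fill the window as context, and to pad only where the window runs off the source video.

```python
from typing import List, Optional


def centered_window(
    num_frames: int,
    word_start: int,
    word_end: int,
    target_len: int = 40,  # assumed clip length, not confirmed by the thread
) -> List[Optional[int]]:
    """Return `target_len` frame indices centered on the word [word_start, word_end).

    Indices inside the source video point at real context frames.
    Positions that fall outside the video are None and would be
    filled with padding (e.g. zero frames) by the caller.
    """
    if not 0 <= word_start < word_end <= num_frames:
        raise ValueError("word boundaries must lie inside the video")

    center = (word_start + word_end) // 2   # middle frame of the word
    first = center - target_len // 2        # window start, may be negative

    return [
        i if 0 <= i < num_frames else None
        for i in range(first, first + target_len)
    ]


# Hypothetical usage: a 10-frame video whose word spans frames 0..3.
print(centered_window(10, 0, 4, target_len=8))
# [None, None, 0, 1, 2, 3, 4, 5]
```

In this sketch a word longer than `target_len` is simply cropped around its center. Whether the actual pipeline crops, resamples, or handles that case differently is not stated in the thread.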