Open hshreeshail opened 1 year ago
2. All existing papers (afaik) that benchmark on Phoenix14-T AND have pretraining steps requiring an isolated-sign dataset perform that pretraining on the WLASL (American Sign Language) dataset (e.g., in this, see the subsection "Progressive Pretraining" under Section 3.1). You could do the same, or skip that pretraining step.
The other paper you mentioned didn't pre-train on any isolated-sign data. The way we use WLASL is to construct a sign classification dataset via sign spotting, rather than for pre-training alone, which requires the vocabularies of the isolated-sign and continuous-sign datasets to match. This is a component of our model, and discarding it would degrade performance.
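To make the dependency concrete: the sketch below is not the paper's actual pipeline, just a hypothetical toy illustration of sign spotting — sliding a window over a continuous video's features, matching it against isolated-sign exemplars, and keeping only glosses in the shared vocabulary. The function names, the cosine threshold, and the single-frame window are all assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity with a small epsilon for numerical safety.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def spot_signs(clip_feats, dictionary, shared_vocab, win=1, threshold=0.8):
    """Toy sign spotting: return (start, end, gloss) pseudo-labels by matching
    sliding windows of a continuous clip against isolated-sign exemplars.
    Only glosses in `shared_vocab` (the intersection of the isolated-sign and
    continuous-sign vocabularies) can be spotted -- hence the matching
    requirement mentioned above."""
    spotted = []
    for gloss, exemplar in dictionary.items():
        if gloss not in shared_vocab:
            continue  # a spotted label is only usable if both vocabularies contain it
        for s in range(len(clip_feats) - win + 1):
            window_mean = clip_feats[s:s + win].mean(axis=0)
            if cosine(window_mean, exemplar) >= threshold:
                spotted.append((s, s + win, gloss))
    return spotted

# Hypothetical 2-D features: one dictionary exemplar per gloss,
# and a 3-frame continuous clip whose middle frame matches "HELLO".
dictionary = {"HELLO": np.array([1.0, 0.0]), "THANKS": np.array([0.0, 1.0])}
clip = np.array([[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])
print(spot_signs(clip, dictionary, shared_vocab={"HELLO"}))  # [(1, 2, 'HELLO')]
```

The spotted spans then serve as classification training examples; this is why the isolated-sign dictionary cannot simply be swapped out for one with a disjoint vocabulary.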
Why does the paper not include comparisons against recent SoTA methods (like this and this)? Also, why are there no results for the proposed method on the well-known Phoenix14-T benchmark?