SignCL is a PyTorch module designed to enhance sign language translation models by encouraging more discriminative feature representations. Through contrastive learning, it pulls the visual representations of sign gestures with identical semantics closer together while pushing those with different semantics farther apart. The module can be integrated into both the pretraining and finetuning stages of a sign language translation model. Experiments demonstrate that SignCL significantly reduces representation density and improves performance across various translation frameworks.
We consistently observed a negative relationship between representation density and performance: in our experiments, a 26% increase in representation density corresponded to a 39% drop in BLEU score.
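To make the notion concrete, one simple proxy for representation density is the mean pairwise cosine similarity among a clip's frame features: the closer distinct gestures sit in feature space, the higher this value. This is an illustrative sketch under that assumption, and `density_proxy` is a hypothetical helper, not the paper's exact metric.

```python
import torch

def density_proxy(features: torch.Tensor) -> float:
    """Mean pairwise cosine similarity of frame features (num_frames, dim).

    Higher values mean features for different gestures cluster more
    tightly (denser), which correlates with worse translation quality.
    Illustrative proxy only, not the paper's exact density measure.
    """
    feats = torch.nn.functional.normalize(features, dim=-1)
    sim = feats @ feats.T                         # all pairwise cosine similarities
    n = sim.size(0)
    off_diag = sim.sum() - sim.diagonal().sum()   # exclude self-similarity
    return (off_diag / (n * (n - 1))).item()
```

For example, mutually orthogonal features give a density of 0, while identical features give 1.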
To use SignCL, ensure you have the following dependencies installed:
Checkpoints can be found here.
Here's a step-by-step guide to integrating SignCL into your sign language translation model:
```python
# 0. Instantiate the SignCL contrastive loss
cl_criterion = SignCL()

# 1. Extract frame-level features from the sign video input
frames_feature = model.encoder(src_input)

# 2. Set a dynamic margin from the average number of frames per word,
#    clipped to the range [10, 20]
margin = min(20, max(10, int(num_frames // text_length * 2.3)))

# 3. Compute the contrastive loss over the frame features
cl_loss = cl_criterion(frames_feature, margin=margin)

# 4. Add it to the original training objective, weighted by lambda_
total_loss = lambda_ * cl_loss + original_loss
```
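For intuition, the criterion can be sketched as follows. This is a minimal, self-contained approximation matching the `cl_criterion(frames_feature, margin=margin)` call signature above; `SignCLSketch` is a hypothetical name, and the choice of neighboring frames as positives and frames `margin` apart as negatives is our assumption, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class SignCLSketch(nn.Module):
    """Sketch of a SignCL-style contrastive loss over frame features.

    Assumption: adjacent frames share semantics (positives, pulled
    together), while frames at least `margin` apart belong to
    different signs (negatives, pushed apart).
    """

    def forward(self, features: torch.Tensor, margin: int = 10) -> torch.Tensor:
        # features: (batch, num_frames, dim)
        feats = nn.functional.normalize(features, dim=-1)
        num_frames = feats.size(1)
        pos, neg, count = 0.0, 0.0, 0
        for i in range(num_frames - margin):
            # positives: neighboring frames should be similar
            pos = pos + (1 - (feats[:, i] * feats[:, i + 1]).sum(-1)).mean()
            # negatives: frames `margin` apart should be dissimilar
            neg = neg + torch.clamp(
                (feats[:, i] * feats[:, i + margin]).sum(-1), min=0
            ).mean()
            count += 1
        return (pos + neg) / max(count, 1)
```

The dynamic margin in step 2 then simply controls how far apart two frames must be before they are treated as belonging to different signs.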
This example code was adapted from the GFSLT-VLP GitHub repository. Please refer to their homepage to set up the environment and dataset.
To execute, use the following command:

```bash
bash examples/scripts.sh
```
This script runs the training and evaluation process, demonstrating how to integrate the SignCL loss function into the GFSLT-VLP framework. We also include our self-reproduced results and log.txt on the CSL-Daily dataset (see link).
If you find this code useful for your research, please cite the following paper:
```bibtex
@article{ye2024improving,
  title={Improving Gloss-free Sign Language Translation by Reducing Representation Density},
  author={Ye, Jinhui and Wang, Xing and Jiao, Wenxiang and Liang, Junwei and Xiong, Hui},
  journal={arXiv preprint arXiv:2405.14312},
  year={2024}
}
```