yan159yan opened this issue 2 years ago
Hi @yan159yan: Thank you for your interest in our work.
For preprocess_data.py: we use it to run the preprocessing before running the evaluation with eval.py. As an example, to run evaluation on dataset/mm_test_metadata.csv using the pretrained Wav2Vec model CAiRE/wav2vec2-large-xlsr-53-cantonese, you can run the preprocessing and the evaluation as follows:
```shell
python preprocess_data.py \
  --output_dir=<CACHE_DIR_PATH> \
  --model_name_or_path=CAiRE/wav2vec2-large-xlsr-53-cantonese \
  --test_manifest_path=dataset/mm_test_metadata_noisy.csv \
  --preprocessing_num_workers=32 \
  --seed=0 --use_video \
  --audio_column_name=audio_path \
  --text_column_name=text_path \
  --video_column_name=lip_image_path
```
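The three `*_column_name` flags above tell the script which manifest columns hold the audio, transcript, and lip-image paths. As a minimal sketch of what such a CSV manifest might look like, here is a hypothetical example (the column names come from the flags above, but the rows are made up for illustration):

```python
import csv
import io

# Hypothetical manifest contents. The real dataset/mm_test_metadata_noisy.csv
# uses these column names (per the CLI flags), but these rows are invented.
manifest = """audio_path,text_path,lip_image_path
clips/utt_0001.wav,transcripts/utt_0001.txt,lips/utt_0001/
clips/utt_0002.wav,transcripts/utt_0002.txt,lips/utt_0002/
"""

# Read the manifest the same way any CSV-with-header file would be read.
rows = list(csv.DictReader(io.StringIO(manifest)))
print(rows[0]["audio_path"])
print(len(rows))
```

If your own data uses different column headers, point the `*_column_name` flags at them instead of renaming the CSV columns.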
```shell
python eval.py \
  --output_dir=<OUTPUT_DIR_PATH> \
  --model_name_or_path=CAiRE/wav2vec2-large-xlsr-53-cantonese \
  --test_manifest_path=<CACHE_DIR_PATH>/preprocess_data.arrow \
  --num_workers=8 \
  --preprocessing_num_workers=8 \
  --use_video \
  --audio_column_name=audio_path \
  --text_column_name=text_path \
  --video_column_name=lip_image_path \
  --per_device_eval_batch_size=16 \
  --dataloader_num_workers=32 \
  --seed=0 \
  --logging_strategy=steps \
  --logging_steps=10 \
  --report_to=tensorboard \
  --evaluation_strategy=epoch \
  --eval_steps=1 \
  --eval_accumulation_steps=100
```
Note that --use_video is used to also include the lip image data. If you don't need the visual part, you can remove that argument.
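To illustrate the effect of that toggle, here is a hypothetical argparse sketch (the actual scripts may parse their arguments differently; only the flag names are taken from the commands above):

```python
import argparse

# Sketch of a parser mirroring the relevant CLI flags. The helper
# columns_to_load() is hypothetical and only illustrates the idea that
# the video column is read solely when --use_video is passed.
parser = argparse.ArgumentParser()
parser.add_argument("--use_video", action="store_true")
parser.add_argument("--audio_column_name", default="audio_path")
parser.add_argument("--text_column_name", default="text_path")
parser.add_argument("--video_column_name", default="lip_image_path")

def columns_to_load(args):
    # Audio and text are always needed; video is optional.
    cols = [args.audio_column_name, args.text_column_name]
    if args.use_video:
        cols.append(args.video_column_name)
    return cols

with_video = parser.parse_args(["--use_video"])
audio_only = parser.parse_args([])
print(columns_to_load(with_video))
print(columns_to_load(audio_only))
```

Dropping --use_video thus leaves an audio-only evaluation, with the lip_image_path column simply ignored.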
Hope it helps!
Great work on the audio-visual data! Is there any parameter configuration for preprocess_data.py?