CCTN-BCI opened this issue 9 months ago
Which checkpoint did you load? These commands should work for pre-trained checkpoints. For fine-tuned checkpoints, please refer to the colab notebook in the repo.
I loaded the exact model used in the Colab notebook, "https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/vsr/base_vox_433h.pt". I hit no errors up to and including the "extract mouth ROI" step.
Note that the checkpoint you list is fine-tuned, so it shouldn't be used with the Python commands as pasted.
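Since the README commands are meant for pre-trained checkpoints, it can help to check which kind a `.pt` file is before loading it. The sketch below reads the model name recorded in the checkpoint's saved config. The helper names are hypothetical, and the model-name strings (`av_hubert` for pre-trained, `av_hubert_seq2seq` / `av_hubert_ctc` for fine-tuned) are assumptions about what the avhubert repo registers, so verify them against your checkout.

```python
# Assumed model names registered by the avhubert repo; verify in your checkout.
PRETRAINED_MODEL_NAMES = {"av_hubert"}
FINETUNED_MODEL_NAMES = {"av_hubert_seq2seq", "av_hubert_ctc"}


def checkpoint_kind(state):
    """Classify a loaded checkpoint dict as 'pretrained', 'finetuned' or 'unknown'."""
    # Newer fairseq checkpoints store the training config under "cfg";
    # if it is missing or has no model name, we report "unknown".
    cfg = state.get("cfg") or {}
    model_cfg = cfg.get("model") or {}
    name = model_cfg.get("_name")
    if name in PRETRAINED_MODEL_NAMES:
        return "pretrained"
    if name in FINETUNED_MODEL_NAMES:
        return "finetuned"
    return "unknown"


def load_state(ckpt_path):
    """Load the raw checkpoint dict on CPU without building the model."""
    # Imported lazily so checkpoint_kind() works without fairseq installed.
    from fairseq import checkpoint_utils
    return checkpoint_utils.load_checkpoint_to_cpu(ckpt_path)
```

Usage would be something like `checkpoint_kind(load_state("/path/to/the/checkpoint.pt"))`. Loading the raw dict this way should not require the avhubert config dataclasses to match, although that is untested here.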
I'm facing the same issue, and following the Colab notebook hasn't solved it. I'm trying to load the fine-tuned av-hubert module in my own project, having installed it in my Docker image. I'm not sure why the problem shows up when used this way but not in the Colab notebook; I've copied the entire code block and still get the above error.
import cv2
import tempfile
from argparse import Namespace
import fairseq
from fairseq import checkpoint_utils, options, tasks, utils
from fairseq.dataclass.configs import GenerationConfig
from IPython.display import HTML

def predict(video_path, ckpt_path, user_dir):
    num_frames = int(cv2.VideoCapture(video_path).get(cv2.CAP_PROP_FRAME_COUNT))
    data_dir = tempfile.mkdtemp()
    tsv_cont = ["/\n", f"test-0\t{video_path}\t{None}\t{num_frames}\t{int(16_000*num_frames/25)}\n"]
    label_cont = ["DUMMY\n"]
    with open(f"{data_dir}/test.tsv", "w") as fo:
        fo.write("".join(tsv_cont))
    with open(f"{data_dir}/test.wrd", "w") as fo:
        fo.write("".join(label_cont))
    utils.import_user_module(Namespace(user_dir=user_dir))
    modalities = ["video"]
    gen_subset = "test"
    gen_cfg = GenerationConfig(beam=20)
    models, saved_cfg, task = checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
    models = [model.eval().cuda() for model in models]
    saved_cfg.task.modalities = modalities
    saved_cfg.task.data = data_dir
    saved_cfg.task.label_dir = data_dir
    task = tasks.setup_task(saved_cfg.task)
    task.load_dataset(gen_subset, task_cfg=saved_cfg.task)
    generator = task.build_generator(models, gen_cfg)

    def decode_fn(x):
        dictionary = task.target_dictionary
        symbols_ignore = generator.symbols_to_strip_from_output
        symbols_ignore.add(dictionary.pad())
        return task.datasets[gen_subset].label_processors[0].decode(x, symbols_ignore)

    itr = task.get_batch_iterator(dataset=task.dataset(gen_subset)).next_epoch_itr(shuffle=False)
    sample = next(itr)
    sample = utils.move_to_cuda(sample)
    hypos = task.inference_step(generator, models, sample)
    ref = decode_fn(sample['target'][0].int().cpu())
    hypo = hypos[0][0]['tokens'].int().cpu()
    hypo = decode_fn(hypo)
    return hypo

mouth_roi_path, ckpt_path = "/content/data/roi.mp4", "/content/data/finetune-model.pt"
user_dir = "/content/av_hubert/avhubert"
hypo = predict(mouth_roi_path, ckpt_path, user_dir)
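For anyone adapting `predict()`, the `test.tsv` and `test.wrd` files it writes can be factored into a small helper. The column meanings here are inferred from that code, not from AV-HuBERT documentation: the first line is treated as a root directory (`/`, presumably because the paths that follow are absolute), and each row holds an utterance id, the video path, the audio path (`None` for video-only input), the number of video frames, and the matching number of audio samples, assuming 16 kHz audio and 25 fps video. The helper names are hypothetical.

```python
import os
import tempfile

# Assumed rates, taken from the 16_000 * num_frames / 25 expression in predict().
AUDIO_RATE = 16_000
VIDEO_FPS = 25


def manifest_lines(video_path, num_frames, utt_id="test-0", audio_path=None):
    """Build test.tsv lines: root dir, then id, video, audio, frames, samples."""
    num_samples = int(AUDIO_RATE * num_frames / VIDEO_FPS)
    # audio_path=None is written as the literal string "None", as predict() does.
    return ["/\n", f"{utt_id}\t{video_path}\t{audio_path}\t{num_frames}\t{num_samples}\n"]


def write_manifest(video_path, num_frames, label="DUMMY"):
    """Write test.tsv and a one-line dummy test.wrd into a fresh temp dir."""
    data_dir = tempfile.mkdtemp()
    with open(os.path.join(data_dir, "test.tsv"), "w") as fo:
        fo.writelines(manifest_lines(video_path, num_frames))
    with open(os.path.join(data_dir, "test.wrd"), "w") as fo:
        fo.write(label + "\n")
    return data_dir
```

The returned directory is what `predict()` assigns to `saved_cfg.task.data` and `saved_cfg.task.label_dir`.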
Update: this error also occurs when I run from the av-hubert repo directly, and also when I use a pre-trained checkpoint instead of the fine-tuned one (e.g. https://dl.fbaipublicfiles.com/avhubert/model/lrs3_vox/clean-pretrain/large_vox_iter5.pt), although the key is slightly different:
omegaconf.errors.ConfigAttributeError: Key 'required_seq_len_multiple' not in 'AVHubertConfig'
full_key: required_seq_len_multiple
reference_type=Optional[AVHubertConfig]
object_type=AVHubertConfig
I haven't checked the pasted code block yet, but the Colab notebook runs fine for me.
Solution:
pip install numpy==1.23.5
pip install git+https://github.com/facebookresearch/fairseq.git@afc77bd#egg=fairseq
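Because the fix is an exact version pin, a quick check of the active environment (for example inside a Docker image) can confirm the pins took effect. This is a sketch and the function names are hypothetical. fairseq is installed from commit `afc77bd`, and pip's reported version may not reveal that commit, so it is only checked for presence. The thread doesn't say why numpy 1.23.5 specifically; one plausible reason is that NumPy 1.24 removed deprecated aliases such as `np.float` that older code may still use.

```python
from importlib.metadata import PackageNotFoundError, version

# Exact pin from the "Solution" comment.
PINNED = {"numpy": "1.23.5"}
# Installed from a git commit; only presence is checked.
REQUIRED = ["fairseq"]


def installed_version(dist_name):
    """Return the installed distribution version, or None if not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def environment_problems(installed, pinned=PINNED, required=REQUIRED):
    """Compare a {name: version-or-None} mapping against the pins."""
    problems = []
    for name, want in pinned.items():
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed (want {want})")
        elif have != want:
            problems.append(f"{name}: {have} installed, want {want}")
    for name in required:
        if installed.get(name) is None:
            problems.append(f"{name}: not installed")
    return problems


def check_current_environment():
    """Run the comparison against the packages visible to this interpreter."""
    names = list(PINNED) + REQUIRED
    return environment_problems({n: installed_version(n) for n in names})
```

`print(check_current_environment())` printing an empty list means the environment matches the pins above.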
I followed the instructions in README.md to load a pre-trained model:
$ cd avhubert
$ python
import fairseq
import hubert_pretraining, hubert
ckpt_path = "/path/to/the/checkpoint.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]
The error is as follows:
omegaconf.errors.ConfigKeyError: Key 'input_modality' not in 'AVHubertPretrainingConfig'
full_key: input_modality
reference_type=Optional[AVHubertPretrainingConfig]
object_type=AVHubertPretrainingConfig
How should I remove the 'input_modality' argument (or handle any other necessary steps)? Thank you very much!
I ran into these problems on a freshly installed Ubuntu 22.04 with fairseq installed correctly.
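The version pin posted in this thread addresses the mismatch at its source and is probably the better route. If you still want to remove an unexpected key such as `input_modality`, one hedged approach is to clean the offending section of the saved config before fairseq merges it into the config dataclass. The helper below is hypothetical: it keeps only keys declared as fields on a given dataclass and reports what it dropped. You would pass something like `state["cfg"]["task"]` together with `AVHubertPretrainingConfig` (where that class lives in your checkout is an assumption to verify). Dropping a key discards whatever the checkpoint recorded for it, so do this only for keys you know are unused.

```python
import copy
import dataclasses


def prune_unknown_keys(cfg_section, config_cls):
    """Return (cleaned_copy, dropped_keys) for one saved config section.

    Keys that are not fields of config_cls are removed; these are the keys
    that trigger "Key '...' not in '<Config>'" errors on merge.
    """
    known = {f.name for f in dataclasses.fields(config_cls)}
    # Work on a deep copy so the loaded checkpoint state is left untouched.
    cleaned = copy.deepcopy(dict(cfg_section))
    # fairseq configs carry a "_name" entry; keep it even if undeclared.
    dropped = sorted(k for k in cleaned if k not in known and k != "_name")
    for key in dropped:
        del cleaned[key]
    return cleaned, dropped
```

Writing the cleaned section back (for example, loading with `fairseq.checkpoint_utils.load_checkpoint_to_cpu`, replacing the section, and saving a copy with `torch.save`) is untested here.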