dina-adel opened this issue 2 years ago (status: Open)
Same Problem
@dina-adel @AhmedEssam19 Have you tried this advice? https://github.com/facebookresearch/fairseq/issues/4597#issuecomment-1198377793
It looks like flawed logic around w2v_path causes an infinite loop.
So you need to delete it and declare the original w2v_args
instead. The linked comment explains how to retrieve them.
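As a rough sketch of that advice (plain dicts stand in for the checkpoints you would get from torch.load; the function name and paths are hypothetical):

```python
# Sketch of the suggested fix: drop w2v_path from the fine-tuned
# checkpoint's model config and embed the pretrained model's config
# as w2v_args, so build_model() no longer follows w2v_path again.
# Plain dicts stand in for the torch.load()-ed checkpoints here.

def embed_w2v_args(finetuned_state, pretrained_cfg):
    """Replace the w2v_path reference with an inline w2v_args config."""
    model_cfg = finetuned_state["cfg"]["model"]
    model_cfg.pop("w2v_path", None)         # remove the path reference
    model_cfg["w2v_args"] = pretrained_cfg  # inline the original config
    return finetuned_state

# Hypothetical usage with stand-in dicts:
state = {"cfg": {"model": {"w2v_path": "/old/path.pt"}}}
pretrained_cfg = {"task": {"normalize": False}}
state = embed_w2v_args(state, pretrained_cfg)
```

In a real run you would apply this to the loaded state dict and re-save it with torch.save before inference.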
@gmryu The problem is that there is a lot of confusion regarding the referenced model, the speech normalizer, and the w2v in textless_s2st. How should each of these models get passed as arguments to the inference script?
Hey, has anyone succeeded in loading checkpoint_best.pt of textless? I am struggling with it, please help me T_T... Here is what I have tried, but it doesn't work.
import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

MODEL_PATH = "/data1/haoqiuyan/run_textless/en_1h/checkpoint_best.pt"
NEW_MODEL_PATH = "/data1/haoqiuyan/run_textless/en_1h/update_checkpoint_best_4.pt"
BASE_HUBERT = "/data1/haoqiuyan/run_textless/mhubert_base_vp_en_es_fr_it3.pt"
# load base hubert
hubert = torch.load(BASE_HUBERT)
hcfg = hubert["cfg"]
hcfg["task"]["normalize"] = False
hcfg["task"]["autoregressive"] = False
ref_args = convert_namespace_to_omegaconf(hcfg)
# load normalizer
model = torch.load(MODEL_PATH)
cfg = model['cfg']
cfg["model"]["w2v_path"] = None
cfg["model"]["w2v_args"] = ref_args
torch.save(model, NEW_MODEL_PATH)
Then I run the following command:
CUDA_VISIBLE_DEVICES=0 python examples/speech_recognition/new/infer.py --config-dir examples/hubert/config/decode/ \
--config-name infer_viterbi \
task.data=/data1/haoqiuyan/run_textless \
task.normalize=false \
common_eval.results_path=/data1/haoqiuyan/run_textless/result_units/log \
common_eval.path=/data1/haoqiuyan/run_textless/en_1h/update_checkpoint_best_4.pt \
dataset.gen_subset=voxpopuli \
dataset.num_workers=6 \
'+task.labels=["unit"]' \
+decoding.results_path=/data1/haoqiuyan/run_textless/result_units \
common_eval.post_process=none \
+dataset.batch_size=1 \
common_eval.quiet=True
It failed with the message below. Maybe there is some error with w2v_path.
File "/data1/haoqiuyan/fairseq/fairseq/checkpoint_utils.py", line 367, in load_model_ensemble
ensemble, args, _task = load_model_ensemble_and_task(
File "/data1/haoqiuyan/fairseq/fairseq/checkpoint_utils.py", line 473, in load_model_ensemble_and_task
model = task.build_model(cfg.model, from_checkpoint=True)
File "/data1/haoqiuyan/fairseq/fairseq/tasks/fairseq_task.py", line 340, in build_model
model = models.build_model(cfg, self, from_checkpoint)
File "/data1/haoqiuyan/fairseq/fairseq/models/__init__.py", line 90, in build_model
cfg = merge_with_parent(dc(), cfg, from_checkpoint)
File "/data1/haoqiuyan/fairseq/fairseq/dataclass/utils.py", line 500, in merge_with_parent
merged_cfg = OmegaConf.merge(dc, cfg)
omegaconf.errors.ValidationError: Non optional field cannot be assigned None
full_key: w2v_path
reference_type=Optional[HubertCtcConfig]
object_type=HubertCtcConfig
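The ValidationError above arises because the config schema types w2v_path as a plain str rather than Optional[str], so OmegaConf refuses to merge None into it. A stripped-down, stdlib-only analogy of that merge-time check (illustrative only, not OmegaConf's actual code):

```python
# Why "Non optional field cannot be assigned None": the schema types
# w2v_path as a plain str, not Optional[str]. A simplified analogy of
# OmegaConf's merge-time validation, using only the stdlib:
from dataclasses import dataclass, fields
from typing import Optional, get_args, get_origin

@dataclass
class W2VConfig:                     # stand-in for HubertCtcConfig
    w2v_path: str = "???"            # non-optional: None is rejected
    w2v_args: Optional[dict] = None  # Optional: None is allowed

def merge(schema, overrides):
    """Reject None for fields whose annotation is not Optional[...]."""
    types = {f.name: f.type for f in fields(schema)}
    for key, value in overrides.items():
        t = types[key]
        is_optional = get_origin(t) is not None and type(None) in get_args(t)
        if value is None and not is_optional:
            raise ValueError(f"Non optional field cannot be assigned None: {key}")
        setattr(schema, key, value)
    return schema

try:
    merge(W2VConfig(), {"w2v_path": None})   # fails like the traceback above
except ValueError as e:
    print(e)

ok = merge(W2VConfig(), {"w2v_args": None})  # Optional field accepts None
```

This is why simply setting w2v_path = None in the saved checkpoint cannot work; the field has to be removed or given a valid path.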
I have solved this issue by loading the normalizer model after adding this to ./fairseq/checkpoint_utils.py, in the function load_checkpoint_to_cpu:
state["cfg"]["model"]["w2v_path"] = "the path to the pretrained hubert-base model without ctc head"
state["cfg"]["task"]["normalize"] = False
Note that w2v_path must point to a pretrained model without any CTC layer in the HubertEncoder class. That is why "this line model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) in hubert_asr.py is repeated multiple times until the code crashes (HubertEncoder is called more than once)."
hope this can help you~
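The two-line workaround above can be sketched as a small patch applied to the state dict right after it is loaded (plain dicts stand in for the real checkpoint state; the path is a placeholder):

```python
# Sketch of the workaround: after load_checkpoint_to_cpu builds the
# state dict, point w2v_path at the pretrained (CTC-head-free) HuBERT
# and disable normalization, instead of leaving w2v_path = None.

PRETRAINED_HUBERT = "/PATH/TO/mhubert_base_vp_en_es_fr_it3.pt"  # placeholder path

def patch_normalizer_state(state):
    """Apply the two-line fix to a loaded checkpoint state dict."""
    state["cfg"]["model"]["w2v_path"] = PRETRAINED_HUBERT
    state["cfg"]["task"]["normalize"] = False
    return state

# Stand-in for the dict returned by load_checkpoint_to_cpu():
state = {"cfg": {"model": {"w2v_path": None}, "task": {"normalize": True}}}
state = patch_normalizer_state(state)
```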
Hi! Did anybody resolve this issue? I also hit all of these problems while attempting to run the speech normalizer from Textless S2ST. Is there any workaround?
I added these lines to ./fairseq/checkpoint_utils.py in load_checkpoint_to_cpu:
state["cfg"]["model"]["w2v_path"] = "/PATH/TO/YOUR/DIR/mhubert_base_vp_en_es_fr_it3.pt"
state["cfg"]["task"]["normalize"] = False
and changed the line
if task.target_dictionary is not None and not cfg.autoregressive:
into
if task.target_dictionary is not None:
in HubertEncoder's __init__ function in ./fairseq/models/hubert/hubert_asr.py.
With that, the code can run, at least temporarily.
@wyj1996 how do you get a pretrained hubert without a CTC head??
❓ Questions and Help
What is your question?
I am trying to test the model present here. However, I encountered multiple issues when I tried running:
python examples/speech_recognition/new/infer.py \
  --config-dir examples/hubert/config/decode/ \
  --config-name infer_viterbi \
  task.data=${DATA_DIR} \
  task.normalize=false \
  common_eval.results_path=${RESULTS_PATH}/log \
  common_eval.path=${DATA_DIR}/checkpoint_best.pt \
  dataset.gen_subset=${GEN_SUBSET} \
  '+task.labels=["unit"]' \
  +decoding.results_path=${RESULTS_PATH} \
  common_eval.post_process=none \
  +dataset.batch_size=1 \
  common_eval.quiet=True
The first issue was described before here. I solved it by adding this to checkpoint_utils.py.
However, this did not work, as I encountered another issue: this line
model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True)
in hubert_asr.py
is repeated multiple times until the code crashes (HubertEncoder is called more than once). I don't know whether I am doing something wrong or whether this is an implementation issue?
Note: I specified
common_eval.path
and w2v_path
to be the path to the normalizer.
What's your environment?
Ubuntu 20.04, Conda environment