facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Inference with Textless S2ST pretrained models · Issue #4650

Open dina-adel opened 2 years ago

dina-adel commented 2 years ago

❓ Questions and Help

What is your question?

I am trying to test the model presented here. However, I encountered multiple issues when I tried running:

python examples/speech_recognition/new/infer.py \
    --config-dir examples/hubert/config/decode/ \
    --config-name infer_viterbi \
    task.data=${DATA_DIR} \
    task.normalize=false \
    common_eval.results_path=${RESULTS_PATH}/log \
    common_eval.path=${DATA_DIR}/checkpoint_best.pt \
    dataset.gen_subset=${GEN_SUBSET} \
    '+task.labels=["unit"]' \
    +decoding.results_path=${RESULTS_PATH} \
    common_eval.post_process=none \
    +dataset.batch_size=1 \
    common_eval.quiet=True

The first issue was described here before. I worked around it by adding this to checkpoint_utils.py:

state["cfg"]["model"]["w2v_path"] = "/home/dina/repos/fairseq/data_models/checkpoint_best.pt"
state["cfg"]["task"]["normalize"] = False

However, this did not work, as I then hit another issue: the line model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) in hubert_asr.py is executed repeatedly until the code crashes (HubertEncoder is instantiated more than once).

I don't know whether I am doing something wrong or whether this is an implementation issue.

Note: I specified common_eval.path and w2v_path to be the path to the normalizer.

What's your environment?

Ubuntu 20.04, Conda environment

AhmedEssam19 commented 2 years ago

Same Problem

gmryu commented 2 years ago

@dina-adel @AhmedEssam19 Have you tried this advice? https://github.com/facebookresearch/fairseq/issues/4597#issuecomment-1198377793

It seems there is faulty logic around w2v_path, resulting in an infinite loop. So you need to delete it and declare the original w2v_args instead. The linked comment tells you how to retrieve them.
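For reference, here is a minimal sketch of that idea as a one-off script that patches the fine-tuned checkpoint instead of editing library code. All paths are placeholders, and it assumes the pretraining checkpoint stores its configuration under "cfg" (older checkpoints keep an argparse Namespace under "args" instead), so treat it as a starting point rather than a verified fix:

import torch

FINETUNED = "/path/to/normalizer/checkpoint_best.pt"       # fine-tuned speech normalizer
BASE = "/path/to/mhubert_base_vp_en_es_fr_it3.pt"          # pretrained HuBERT, no CTC head
PATCHED = "/path/to/normalizer/checkpoint_best_patched.pt"

state = torch.load(FINETUNED, map_location="cpu")
base = torch.load(BASE, map_location="cpu")

# Embed the pretraining config directly so HubertEncoder does not need to
# re-open w2v_path at build time (re-opening is what loops forever when
# w2v_path points back at the fine-tuned checkpoint itself).
state["cfg"]["model"]["w2v_args"] = base["cfg"]
state["cfg"]["task"]["normalize"] = False

torch.save(state, PATCHED)

The patched file would then be passed as common_eval.path to infer.py.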

AhmedEssam19 commented 2 years ago

@gmryu The problem is that there is a lot of confusion regarding the referenced model, the speech normalizer, and the w2v in textless_s2st. How should each of these models get passed as arguments to the inference script?

Haoqiu-Yan commented 1 year ago

Hey, has anyone succeeded in loading checkpoint_best.pt of the textless model? I am struggling to deal with it, please help me T_T... Here is what I have tried, but it does not work.

import torch
from fairseq.dataclass.utils import convert_namespace_to_omegaconf

MODEL_PATH = "/data1/haoqiuyan/run_textless/en_1h/checkpoint_best.pt"
NEW_MODEL_PATH = "/data1/haoqiuyan/run_textless/en_1h/update_checkpoint_best_4.pt"
BASE_HUBERT = "/data1/haoqiuyan/run_textless/mhubert_base_vp_en_es_fr_it3.pt"

# load base hubert
hubert = torch.load(BASE_HUBERT)
hcfg = hubert["cfg"]
hcfg["task"]["normalize"] = False
hcfg["task"]["autoregressive"] = False
ref_args = convert_namespace_to_omegaconf(hcfg)

# load normalizer
model = torch.load(MODEL_PATH)
cfg = model['cfg']
cfg["model"]["w2v_path"] = None
cfg["model"]["w2v_args"] = ref_args
torch.save(model, NEW_MODEL_PATH)

And, I run the following command:

CUDA_VISIBLE_DEVICES=0 python examples/speech_recognition/new/infer.py --config-dir examples/hubert/config/decode/ \
    --config-name infer_viterbi \
    task.data=/data1/haoqiuyan/run_textless \
    task.normalize=false \
    common_eval.results_path=/data1/haoqiuyan/run_textless/result_units/log \
    common_eval.path=/data1/haoqiuyan/run_textless/en_1h/update_checkpoint_best_4.pt \
    dataset.gen_subset=voxpopuli \
    dataset.num_workers=6 \
    '+task.labels=["unit"]' \
    +decoding.results_path=/data1/haoqiuyan/run_textless/result_units \
    common_eval.post_process=none \
    +dataset.batch_size=1 \
    common_eval.quiet=True

I got a failure message. Maybe there is some error related to w2v_path.

File "/data1/haoqiuyan/fairseq/fairseq/checkpoint_utils.py", line 367, in load_model_ensemble
    ensemble, args, _task = load_model_ensemble_and_task(
  File "/data1/haoqiuyan/fairseq/fairseq/checkpoint_utils.py", line 473, in load_model_ensemble_and_task
    model = task.build_model(cfg.model, from_checkpoint=True)
  File "/data1/haoqiuyan/fairseq/fairseq/tasks/fairseq_task.py", line 340, in build_model
    model = models.build_model(cfg, self, from_checkpoint)
  File "/data1/haoqiuyan/fairseq/fairseq/models/__init__.py", line 90, in build_model
    cfg = merge_with_parent(dc(), cfg, from_checkpoint)
  File "/data1/haoqiuyan/fairseq/fairseq/dataclass/utils.py", line 500, in merge_with_parent
    merged_cfg = OmegaConf.merge(dc, cfg)
omegaconf.errors.ValidationError: Non optional field cannot be assigned None
        full_key: w2v_path
        reference_type=Optional[HubertCtcConfig]
        object_type=HubertCtcConfig
wyj1996 commented 1 year ago

I have solved this issue by loading the normalizer model after adding this to ./fairseq/checkpoint_utils.py, in the function load_checkpoint_to_cpu:

state["cfg"]["model"]["w2v_path"] = "the path to the pretrained hubert-base model without ctc head" state["cfg"]["task"]["normalize"] = False

Note that w2v_path must point to a pretrained model without any CTC layer, i.e. not a checkpoint of the HubertEncoder class. Pointing it at the fine-tuned checkpoint instead is why "this line model = pretrain_task.build_model(w2v_args.model, from_checkpoint=True) in hubert_asr.py is repeated multiple times until the code crashes (HubertEncoder is called more than once)."
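If it helps, here is a small, hedged check of which kind of checkpoint a .pt file holds before pointing w2v_path at it (the "_name" values are assumptions based on how fairseq registers the HuBERT models; the path is a placeholder):

import torch

state = torch.load("/path/to/some_checkpoint.pt", map_location="cpu")
if state.get("cfg") is not None:
    # e.g. "hubert" for the pretrained model (no CTC head) vs
    # "hubert_ctc" for a fine-tuned model that wraps HubertEncoder
    print(state["cfg"]["model"]["_name"])
else:
    # older checkpoints store their settings as an argparse Namespace
    print(state["args"])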

wyj1996 commented 1 year ago

hope this can help you~

vpronina commented 1 year ago

Hi! Did anybody resolve this issue? I also hit all of these problems while attempting to run the speech normalizer from Textless S2ST. Is there any workaround?

zhouyan19 commented 1 year ago

I added these to ./fairseq/checkpoint_utils.py in load_checkpoint_to_cpu:

state["cfg"]["model"]["w2v_path"] = "/PATH/TO/YOUR/DIR/mhubert_base_vp_en_es_fr_it3.pt"
state["cfg"]["task"]["normalize"] = False

and change the line if task.target_dictionary is not None and not cfg.autoregressive: into if task.target_dictionary is not None: in HubertEncoder's __init__ function in ./fairseq/models/hubert/hubert_asr.py (see the sketch below). With these two changes the code can run, at least temporarily.
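In case it helps locate that second change, a rough sketch (only the guard line comes from the description above; the body is elided, since I have not re-checked the surrounding code in hubert_asr.py):

# fairseq/models/hubert/hubert_asr.py, inside HubertEncoder.__init__
# before:
#     if task.target_dictionary is not None and not cfg.autoregressive:
#         ...  # build the output projection
# after:
if task.target_dictionary is not None:
    ...  # build the output projection, unchanged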

HERIUN commented 5 hours ago

@wyj1996 How do you get the pretrained HuBERT without a CTC head?