Hello! I want to train a hybrid model like this one for Russian. First I tried training it from scratch on the Golos dataset (~1100 hours), but I ran into convergence problems (like in this issue): even after 49 epochs the WER was still 1.0. So I decided to take the pretrained English model and fine-tune it for the new language. I created a new tokenizer for my dataset and pointed the config at it (a rough sketch of that change is below); otherwise I used the almost-default config from the model card:
But it gives me the following error:
I tried converting the .nemo file to .ckpt with code like this:
But it still gives me an error, which in that case looks like:
Any idea what I should change to fix this? Or maybe I'm missing something?
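Here is roughly the tokenizer change I made in the config (the path is a placeholder for mine, built with NeMo's tokenizer-creation script):

```python
# Sketch of my tokenizer override; the directory is a placeholder.
cfg.model.tokenizer.dir = "tokenizers/ru_bpe"  # new Russian tokenizer
cfg.model.tokenizer.type = "bpe"
```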
To confirm, you're using a fine-tuning script from NeMo, right? The one inside examples?
@VahidooX is there something up with the checkpoint? The config seems OK.
Plus, that's not exactly the right command for the extract - the key should be one of the modules inside the actual model, not the model name. In any case, we don't support inference or training with bare PyTorch .ckpt files, only with .nemo files.
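For reference, a rough sketch of the shape `maybe_init_from_pretrained_checkpoint` expects when restoring only part of a model (the pretrained name and module key here are placeholders, not taken from this thread):

```python
from omegaconf import OmegaConf

# Sketch only: restore selected modules from a pretrained model.
# "model0" is an arbitrary key; "name" is a placeholder pretrained model;
# "include" must name a module inside the model (e.g. the encoder),
# not the model itself.
init_cfg = OmegaConf.create({
    "init_from_pretrained_model": {
        "model0": {
            "name": "stt_en_fastconformer_hybrid_large_streaming_multi",
            "include": ["encoder"],
        }
    }
})
asr_model.maybe_init_from_pretrained_checkpoint(init_cfg)
```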
Your model file looks to be corrupted. Please download it again and retry. Even training from scratch should work; in that issue they used a very small batch size, which makes a model hard to train.
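(If memory is what forces the small batch, gradient accumulation in Lightning is the usual workaround; a minimal sketch, with an illustrative value:)

```python
import pytorch_lightning as pl

# effective batch size = per-step batch size * accumulate_grad_batches
trainer = pl.Trainer(accumulate_grad_batches=4)  # 4 is illustrative
```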
@VahidooX I've tried the inference script with this model on an audio file and it works fine - it produces a good English transcription. But when I try to use the model in training, it still throws the error.
@titu1994 I use the script `speech_to_text_hybrid_rnnt_ctc_bpe.py` from your repo. It works fine for training from scratch (it doesn't actually converge, but at least it does something), but it throws the error when fine-tuning. Full code:
```python
import pytorch_lightning as pl
from omegaconf import OmegaConf

from nemo.collections.asr.models import EncDecHybridRNNTCTCBPEModel
from nemo.core.config import hydra_runner
from nemo.utils import logging
from nemo.utils.exp_manager import exp_manager


@hydra_runner(
    config_path="./conf", config_name="fastconformer_hybrid_transducer_ctc_bpe_streaming.yaml"
)
def main(cfg):
    logging.info(f'Hydra config: {OmegaConf.to_yaml(cfg)}')

    trainer = pl.Trainer(**cfg.trainer)
    exp_manager(trainer, cfg.get("exp_manager", None))
    asr_model = EncDecHybridRNNTCTCBPEModel(cfg=cfg.model, trainer=trainer)

    # Initialize the weights of the model from another model, if provided via config
    asr_model.maybe_init_from_pretrained_checkpoint(cfg)

    trainer.fit(asr_model)

    if hasattr(cfg.model, 'test_ds') and cfg.model.test_ds.manifest_filepath is not None:
        if asr_model.prepare_test(trainer):
            trainer.test(asr_model)


if __name__ == '__main__':
    main()  # noqa pylint: disable=no-value-for-parameter
```
Maybe I should use something else?
@traidn, maybe this is not an official way, but you can try:

```python
import torch

state_dict = torch.load("model_weights.ckpt", map_location="cpu")  # or your device
asr_model.load_state_dict(state_dict)
```

You can get `model_weights.ckpt` by unpacking the .nemo checkpoint with `tar xvf`.
@bene-ges Yeah, that lets me get the weights, but unfortunately it gives me this error at the start of training:

```
    return checkpoint["pytorch-lightning_version"]
KeyError: 'pytorch-lightning_version'
```
@traidn - maybe you can try to load the weights and then train like you did from scratch, without resuming from a checkpoint? See the sketch below.
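A minimal sketch, assuming the `asr_model` and `trainer` from your script above and the unpacked `model_weights.ckpt`:

```python
import torch

# Restore the weights manually, then train as if from scratch.
# resume_from_checkpoint / ckpt_path expects a full Lightning checkpoint
# (which contains "pytorch-lightning_version"); a bare state_dict does not.
state_dict = torch.load("model_weights.ckpt", map_location="cpu")
asr_model.load_state_dict(state_dict)
trainer.fit(asr_model)  # no ckpt_path / resume_from_checkpoint here
```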
@bene-ges Thanks for the idea, but it still gives the KeyError for `pytorch-lightning_version` when I load the state dict, even if I leave the `resume_from_checkpoint` field empty.
I am going to try this model next week to make sure it is not a bug. Have you tried the latest NeMo release, or one of the older releases, to convert and train the model?
@VahidooX That would be appreciated. I use nemo-toolkit 1.21.0 and PyTorch 2.1.1.
In the meantime, would you please try an older NeMo version for both the conversion and the training?
I tried a previous version, and it still doesn't work properly. Unfortunately, I can't install an even older version in my environment right now because I'm having trouble building the wheels.
@VahidooX I downgraded NeMo to version 1.20.0 (the release in which the models STT En FastConformer Hybrid Large Streaming 1040ms (which doesn't train either) and STT En FastConformer Hybrid Transducer-CTC Large Streaming Multi were introduced). I downloaded the config file from the r1.20.0 branch, but it still throws the error `KeyError: "filename 'storages' not found"`.
And one more question: do these two models (STT En FastConformer Hybrid Large Streaming 1040ms and STT En FastConformer Hybrid Transducer-CTC Large Streaming Multi) differ by only one line in the config, att_context_size? Both model cards link to the same config file. If so, I'd guess switching between them is just the override sketched below.
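This is my guess (the values are assumptions based on the model names, not verified):

```python
# Guess: fixed-lookahead variant, 13 frames * 80 ms per frame = ~1040 ms
cfg.model.encoder.att_context_size = [70, 13]

# Guess: the multi variant trains over several lookaheads at once, e.g.
# cfg.model.encoder.att_context_size = [[70, 13], [70, 6], [70, 1], [70, 0]]
```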