NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
https://docs.nvidia.com/nemo-framework/user-guide/latest/overview.html
Apache License 2.0
11.84k stars 2.46k forks

Everything has been changed?! #1091

Closed soheiltehranipour closed 3 years ago

soheiltehranipour commented 4 years ago

Hello,

I have been working with NeMo for about 2 months, and yesterday I suddenly noticed that everything has been updated. Is that true? Many files are missing, though...

okuchaiev commented 4 years ago

Hi @soheiltehranipour

Yes, we are working on a major redesigned version of NeMo. You can see it now in the default "main" branch. This version interoperates easily with PyTorch, introduces the concept of models, and adopts PyTorch Lightning for training.

A great place to start getting familiar with this new version is to check out our tutorials (which can be run on Colab; just don't forget to set the runtime type to GPU).

We strongly recommend switching to this new version of NeMo, because it is what the 1.0.0 release will be based on. If you need the old version, it can be found under the v0.11.1 tag: https://github.com/NVIDIA/NeMo/releases/tag/v0.11.1

I apologize for the inconvenience and hope you'll like the new version. Thank you for your interest in NeMo!

soheiltehranipour commented 4 years ago

Thanks a lot! Looking forward to more documentation for it.

The first link is dead though.

okuchaiev commented 4 years ago

Fixed the link, thanks.

khursani8 commented 4 years ago

I just noticed the changes today. It was easy to migrate, since I only needed to load my model weights and could use the current version straight away, following the fine-tuning example. Thanks.

viveksj commented 4 years ago

> I just noticed the changes today. It was easy to migrate, since I only needed to load my model weights and could use the current version straight away, following the fine-tuning example. Thanks.

How exactly would I go about loading model weights from the previous version and using them for speech-to-text locally? (I built a custom manifest using https://github.com/NVIDIA/NeMo/blob/master/examples/asr/notebooks/1_ASR_tutorial_using_NeMo.ipynb as a reference.) (I asked the same at https://github.com/NVIDIA/NeMo/issues/1057 in case this thread isn't the right place to have this answered.)

khursani8 commented 4 years ago

If you used the old NeMo, it saved your encoder and decoder weights separately. For the new one, I just instantiate the model and then do something like this:

import torch
import nemo.collections.asr as nemo_asr

# Looking back, I didn't need to download the pretrained weights,
# since they get replaced anyway.
quartznet = nemo_asr.models.EncDecCTCModel.from_pretrained(model_name="QuartzNet15x5Base-En")

# Load the encoder/decoder checkpoints saved by the old NeMo.
enc = torch.load("sk_ckpt/JasperEncoder-STEP-810000.pt")
dec = torch.load("sk_ckpt/JasperDecoderForCTC-STEP-810000.pt")
quartznet.encoder.load_state_dict(enc)
quartznet.decoder.load_state_dict(dec)
del enc
del dec
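
For context on why the encoder and decoder load separately: the old NeMo saved a checkpoint per submodule, each holding only that submodule's parameters. A toy illustration of that split (plain dicts standing in for real state dicts; all key names here are hypothetical, not actual NeMo parameter names):

```python
def split_state_dict(full_state, prefix):
    """Keep only the keys under `prefix`, stripping the prefix,
    the way a per-module checkpoint stores its parameters."""
    return {k[len(prefix):]: v for k, v in full_state.items()
            if k.startswith(prefix)}

# Toy "full model" state dict; real values would be tensors.
full = {"encoder.conv.weight": 1, "encoder.conv.bias": 2,
        "decoder.proj.weight": 3}

enc_only = split_state_dict(full, "encoder.")
# enc_only == {"conv.weight": 1, "conv.bias": 2}
```

This is why `quartznet.encoder.load_state_dict(enc)` works directly: the keys in a per-module checkpoint already match that submodule's own parameter names.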

And then follow the transfer learning tutorial:

import pytorch_lightning as pl
from nemo.collections.asr.losses.ctc import CTCLoss

# Without zero_infinity=True my loss was always NaN and the model
# didn't learn anything; not sure whether the latest version already
# adds this parameter.
quartznet.loss = CTCLoss(num_classes=quartznet.decoder.num_classes_with_blank - 1,
                         zero_infinity=True)
quartznet._train_dl.num_workers = 4
quartznet._validation_dl.num_workers = 4
trainer = pl.Trainer(gpus=[1], max_epochs=50,
                     amp_backend='native',
                     precision=16,
                     amp_level='O1',
                     val_check_interval=0.1,
                     accumulate_grad_batches=100
                    )
trainer.fit(quartznet)
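
A quick sanity check on what the Trainer settings above imply for the effective batch size (the per-step batch size is a hypothetical value here; the real one comes from the model's dataloader config):

```python
# Hypothetical per-step batch size; not specified in this thread.
per_step_batch = 16
accumulate = 100   # accumulate_grad_batches from the Trainer above
num_devices = 1    # gpus=[1] selects a single GPU (device index 1)

effective_batch = per_step_batch * accumulate * num_devices
# effective_batch == 1600
```

Note that `gpus=[1]` runs on one GPU (the device at index 1), not on two GPUs.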

Next time, when I want to continue fine-tuning my model, I instantiate it with:

quartznet = nemo_asr.models.EncDecCTCModel.load_from_checkpoint("lightning_logs/version_37/checkpoints/epoch=0.ckpt")

anujsuchchal commented 4 years ago

@okuchaiev Thanks for your efforts. I hope you are including word-level timestamps (with and without using a language model) for ASR in this major release. If not, please consider adding this feature as well. Also, is there a tentative date for when 1.0.0 will be released?
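
For readers curious what word-level timestamps from a CTC model involve: greedy CTC decoding emits one token per acoustic frame, so word boundaries can be mapped back to frame indices and multiplied by the frame duration. A toy sketch (the vocabulary, frame predictions, and 20 ms frame duration are all made up for illustration; this is not NeMo's API):

```python
def ctc_word_timestamps(frame_tokens, vocab, blank_id, frame_dur):
    """Collapse greedy per-frame CTC predictions into a list of
    (word, start_sec, end_sec) tuples."""
    words, cur, start, prev = [], "", None, blank_id
    for i, tok in enumerate(frame_tokens):
        if tok != blank_id and tok != prev:  # new, non-repeated character
            ch = vocab[tok]
            if ch == " ":
                if cur:                       # a word just ended
                    words.append((cur, start, i * frame_dur))
                cur, start = "", None
            else:
                if not cur:                   # a word just started
                    start = i * frame_dur
                cur += ch
        prev = tok
    if cur:                                   # flush the final word
        words.append((cur, start, len(frame_tokens) * frame_dur))
    return words

vocab = [" ", "h", "i", "y", "o"]        # toy vocabulary; id 5 is CTC blank
frames = [1, 1, 5, 2, 5, 0, 3, 5, 4, 4]  # greedy ids spelling "hi yo"
stamps = ctc_word_timestamps(frames, vocab, blank_id=5, frame_dur=0.02)
# stamps ≈ [("hi", 0.00, 0.10), ("yo", 0.12, 0.20)]
```

With a language model in the loop, the alignment would come from the rescored hypothesis instead, which is presumably why the request distinguishes the with- and without-LM cases.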