This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been inactive for 7 days since being marked as stale.
I am training a Russian (ru) ASR model and I'm not sure whether training is going well.
My code:
```python
import nemo
import nemo.collections.asr as nemo_asr
import pytorch_lightning as pl
from nemo.utils.exp_manager import exp_manager
from omegaconf import DictConfig, OmegaConf

# model_conf_path, train_manifest_path, validation_manifest_path and
# test_manifest_path are set elsewhere in my script
params = OmegaConf.load(model_conf_path)

min_duration = 0.5
epochs = 150
gpus = 1

# Override trainer and dataset settings from the YAML config
params.trainer.devices = gpus
params.trainer.max_epochs = epochs
params.model.train_ds.manifest_filepath = train_manifest_path
params.model.train_ds.min_duration = min_duration  # skip clips shorter than 0.5 s
params.model.validation_ds.manifest_filepath = validation_manifest_path
params.model.test_ds.manifest_filepath = test_manifest_path
params.exp_manager.exp_dir = "/path/to/save_tr_nemo/"

trainer = pl.Trainer(**params.trainer)
model = nemo_asr.models.EncDecCTCModel(cfg=params.model, trainer=trainer)
exp_manager(trainer=trainer, cfg=params.exp_manager)  # logging and checkpointing
trainer.fit(model)
trainer.test(model)
```
Here is the YAML config: https://github.com/Project-fil/Project-fil/blob/ee79825609cca9e8b01a9c4eb43b564bb5c2b35d/quartznet_ru.yaml
The training manifest looks like this:
```json
{"audio_filepath": "/path/to/wav", "duration": 4.8, "text": "у вас просроченных дней две тысячи сто шестьдесят один дней"}
{"audio_filepath": "/path/to/wav", "duration": 9.1, "text": "выключай выключай давайте следующую свечу других нет сейчас расскажу один анекдот когда"}
```
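Each manifest line has to be a standalone JSON object, so a quick sanity check before a long run can catch unparseable lines and confirm the totals. Below is a minimal sketch in plain Python, not part of NeMo. `summarize_manifest` is a hypothetical helper, and the `min_duration` cutoff assumes that entries shorter than it are skipped, which is the intent of the `train_ds.min_duration` setting in the script above.

```python
import json
from typing import Iterable, List, Tuple


def summarize_manifest(
    lines: Iterable[str], min_duration: float = 0.5
) -> Tuple[int, float, List[int]]:
    """Count usable entries and total hours in a JSON-lines ASR manifest.

    Returns (kept_count, kept_hours, bad_line_numbers). Lines that are not
    valid JSON objects, or that lack the expected keys, are reported as bad.
    """
    required = {"audio_filepath", "duration", "text"}
    kept, total_seconds, bad = 0, 0.0, []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            bad.append(lineno)  # e.g. a stray quote breaks the whole line
            continue
        if not isinstance(entry, dict) or not required.issubset(entry):
            bad.append(lineno)
            continue
        # Assumption: clips shorter than min_duration are dropped during training
        if entry["duration"] < min_duration:
            continue
        kept += 1
        total_seconds += entry["duration"]
    return kept, total_seconds / 3600.0, bad
```

For a real manifest, pass an open file, e.g. `with open(path, encoding="utf-8") as f: summarize_manifest(f)`; opening with UTF-8 matters here because the transcripts are Cyrillic.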
Total files: 253,390. Total duration: 5050.82 hours. In 10 epochs, the validation WER never went below 0.9. These are the files in the "checkpoints" directory:
```
'QuartzNet15x5_ru--val_wer=0.9031-epoch=6.ckpt'
'QuartzNet15x5_ru--val_wer=0.9067-epoch=8.ckpt'
'QuartzNet15x5_ru--val_wer=0.9094-epoch=1.ckpt'
'QuartzNet15x5_ru--val_wer=0.9798-epoch=9-last.ckpt'
```
And now, at epoch 19:
```
'QuartzNet15x5_ru--val_wer=0.9031-epoch=6.ckpt'
'QuartzNet15x5_ru--val_wer=0.9067-epoch=8.ckpt'
'QuartzNet15x5_ru--val_wer=0.9078-epoch=11.ckpt'
'QuartzNet15x5_ru--val_wer=0.9798-epoch=19-last.ckpt'
```
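For context on what the `val_wer` numbers in these filenames measure: word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes it at corpus level under that standard definition; it is an illustration, not NeMo's own metric code, and the function names are hypothetical.

```python
from typing import List, Sequence


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions + deletions + insertions)."""
    prev = list(range(len(hyp) + 1))  # cost of building hyp prefixes from nothing
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # deleting the first i reference words
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,  # deletion of a reference word
                cur[j - 1] + 1,  # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = cur
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Total word edits divided by total reference words."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("references contain no words")
    return errors / words
```

Under this definition an empty transcription scores exactly 1.0, and insertions can push WER above 1.0, so a value around 0.90 means roughly nine word errors per ten reference words.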
Is this a normal training process, or am I doing something wrong?