Lightning-Universe / lightning-flash

Your PyTorch AI Factory - Flash enables you to easily configure and run complex AI recipes for over 15 tasks across 7 data domains
https://lightning-flash.readthedocs.io
Apache License 2.0
1.74k stars, 213 forks

Testing a Minimal Training Example #883

Closed: choclatier closed this issue 3 years ago

choclatier commented 3 years ago

❓ Questions and Help

I'm trying to train on the TIMIT test dataset as a sanity check, but the more I train on it, the worse the model gets. After one epoch it gets slightly worse; after 10 epochs it predicts nothing, or only <unk> tokens. I also have a larger dataset that I tried to fit, and there I get val_score: inf. I can't tell whether my libraries are somehow corrupted. I managed to fine-tune successfully once out of 150 runs, but I still can't reproduce that model.

Is there a test I can run to make sure some learning is happening? How can I check that the model weights were actually trained to some degree, for better or worse?

I also suspect that installing Kaldi may have replaced some C libraries. Could this be an issue? It might be a coincidence, but I think it was after that that I could no longer reproduce the model.
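One common cause of an infinite CTC loss (such as the val_score: inf above) is an utterance that cannot be aligned at all: CTC needs at least one output frame per target label, plus an extra blank frame between identical consecutive labels. PyTorch's `CTCLoss` returns `inf` for such samples unless `zero_infinity=True` (Hugging Face's `Wav2Vec2Config` exposes this as `ctc_zero_infinity`). A minimal sketch that checks the condition for one utterance, assuming the wav2vec2-base feature-encoder geometry (the `Wav2Vec2Config` default kernel sizes and strides, roughly 49 frames per second of 16 kHz audio) and a character vocabulary in which spaces become `|`. The function names are hypothetical.

```python
from typing import Sequence

# Assumption: wav2vec2-base conv feature encoder (Hugging Face Wav2Vec2Config defaults).
CONV_KERNELS = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDES = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int,
                      kernels: Sequence[int] = CONV_KERNELS,
                      strides: Sequence[int] = CONV_STRIDES) -> int:
    """Number of frames produced by a stack of unpadded 1-D convolutions."""
    length = num_samples
    for kernel, stride in zip(kernels, strides):
        if length < kernel:
            return 0  # input too short to produce any frame
        length = (length - kernel) // stride + 1
    return length


def min_ctc_frames(labels: Sequence[str]) -> int:
    """CTC needs one frame per label plus a blank between equal neighbours."""
    repeats = sum(1 for a, b in zip(labels, labels[1:]) if a == b)
    return len(labels) + repeats


def ctc_feasible(num_samples: int, transcript: str) -> bool:
    """True if a 16 kHz clip of `num_samples` can be aligned to `transcript`."""
    # Assumption: character-level labels, spaces mapped to a "|" word delimiter.
    labels = list(transcript.replace(" ", "|"))
    return num_output_frames(num_samples) >= min_ctc_frames(labels)
```

Filtering out (or fixing) the utterances for which `ctc_feasible` returns False is a cheap way to test this hypothesis before suspecting the installed libraries.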

Code

import torch
from wer import wer  # WER helper module used by the original script (implementation not shown)
import flash
from flash.audio import SpeechRecognition, SpeechRecognitionData
from pytorch_lightning import loggers, plugins

# 0. Logger
logger = loggers.TensorBoardLogger("lightning_logs/")

# 1. Create the DataModule
# (the TIMIT test split doubles as the validation set for this sanity check)
datamodule = SpeechRecognitionData.from_json(
    input_fields="file",
    target_fields="text",
    train_file="./data/timit/train.json",
    test_file="./data/timit/test.json",
    val_file="./data/timit/test.json",
)
datamodule.batch_size = 1

# 2. Build the task
model = SpeechRecognition(backbone="./models/wav2vec2/base_model")

# 3. Create the trainer and finetune the model

# Reference transcript for data/timit/example.wav
data = "SHE HAD YOUR DARK SUIT IN GREASY WASH WATER ALL YEAR"

trainer = flash.Trainer(
    logger=logger,
    max_epochs=10,
    auto_lr_find=True,  # only used by trainer.tune(); trainer.fit() alone ignores it
    accelerator="ddp",
    plugins=plugins.DDPFullyShardedPlugin(),
    gpus=torch.cuda.device_count(),
    precision=16,  # native AMP (amp_level is only used with amp_backend="apex")
)

trainer.fit(model, datamodule=datamodule)
trainer.validate(model, datamodule=datamodule)

# 4. Predict on audio files!
file = "data/timit/example.wav"
predictions = model.predict([file])
wer(data, predictions[0])

# 5. Save the model!
trainer.save_checkpoint("speech_recognition_model.pt")
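The `wer` function imported from the `wer` module is not shown in the issue. For reference, a minimal sketch of standard word error rate, assuming whitespace tokenization: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words. The names are hypothetical. The comparison is case-sensitive, so reference and hypothesis should be normalized the same way. With the single `<unk><unk>...` token in the console output below, every reference word counts as an error, which is consistent with the reported 100%.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of a reference word
                          curr[j - 1] + 1,         # insertion of a hypothesis word
                          prev[j - 1] + (r != h))  # substitution (or match if equal)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance over whitespace-split words, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.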

Console

MODEL PRINT
"backbone":         ./models/wav2vec2/base_model
"learning_rate":    1e-05
"optimizer":        <class 'torch.optim.adam.Adam'>
"optimizer_kwargs": None
"scheduler":        None
"scheduler_kwargs": None
"serializer":       None

----------------------------------------------------------------------------------------------------

LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]

  | Name          | Type           | Params
-------------------------------------------------
0 | model         | Wav2Vec2ForCTC | 94.4 M
1 | train_metrics | ModuleDict     | 0    
2 | val_metrics   | ModuleDict     | 0    
-------------------------------------------------
94.4 M    Trainable params
0         Non-trainable params
94.4 M    Total params
377.585   Total estimated model params size (MB)
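The summary reports all 94.4 M parameters as trainable, but that does not show whether they changed. A minimal, library-agnostic sketch for the "were the weights actually trained?" question: snapshot a few parameters before and after `trainer.fit` and compare their relative L2 change. The helper names are hypothetical, and the usage comment assumes that Flash exposes the Hugging Face model as `model.model` (as the `model | Wav2Vec2ForCTC` row suggests) with its output layer at `lm_head`, on a single-GPU run (fully sharded training flattens and shards parameters). Converting all 94.4 M parameters to Python lists would be slow, so only a few named tensors are compared; with PyTorch tensors, `(after - before).norm() / before.norm()` computes the same quantity directly.

```python
import math
from typing import Dict, List, Mapping, Sequence


def relative_change(before: Sequence[float], after: Sequence[float]) -> float:
    """||after - before|| / ||before|| for one flattened parameter."""
    if len(before) != len(after):
        raise ValueError("snapshots have different sizes")
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    norm = math.sqrt(sum(b * b for b in before))
    # Fall back to the absolute change if the parameter started at zero.
    return diff / norm if norm > 0 else diff


def weight_report(before: Mapping[str, Sequence[float]],
                  after: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Relative change for every parameter name present in both snapshots."""
    return {name: relative_change(before[name], after[name])
            for name in before if name in after}


def unchanged(report: Mapping[str, float], tol: float = 0.0) -> List[str]:
    """Names whose relative change is at most `tol`, i.e. that did not train."""
    return sorted(name for name, change in report.items() if change <= tol)


# Hypothetical usage with the script above (assumes model.model.lm_head exists):
#   def snap():
#       return {"lm_head.weight": model.model.lm_head.weight.detach().flatten().tolist()}
#   before = snap()
#   trainer.fit(model, datamodule=datamodule)
#   print(weight_report(before, snap()))
```

A complementary check is Lightning's `Trainer(overfit_batches=1)`: if the pipeline can learn at all, the training loss on that single batch should fall steadily toward zero.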

WARNING:root:ShardedGradScaler is to be used in combination with a sharded optimizer, this could not be checked
WARNING:root:ShardedGradScaler is to be used in combination with a sharded optimizer, this could not be checked
WARNING:root:ShardedGradScaler is to be used in combination with a sharded optimizer, this could not be checked
Epoch 9: 100%|████████████████| 26/26 [00:01<00:00, 17.21it/s, loss=139, v_num=145, train_loss_step=123.0, val_loss=99.00, train_loss_epoch=281.0]
"backbone":         ./models/wav2vec2/base_model

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 2 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 3 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
LOCAL_RANK: 1 - CUDA_VISIBLE_DEVICES: [0,1,2,3]
--------------------------------------------------------------------------------
DATALOADER:0 VALIDATE RESULTS
{'val_loss': 98.98580169677734}

WER: 100.00%
----------------------------------------------------------------------------------------------------
SHE HAD YOUR DARK SUIT IN GREASY WASHWATER ALL YEAR
****************************************************************************************************
<unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>

What's your environment?

choclatier commented 3 years ago

I found the problem: training couldn't handle sentence-cased transcripts. After I uppercased all the training examples, the model finally started to train. I assumed the tokenizer or some preprocessing step would take care of this. The documentation should mention this, or explain it in more detail.
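A likely explanation, assuming the backbone uses an uppercase-only character vocabulary such as the one published with `facebook/wav2vec2-base-960h` (capital letters, an apostrophe, and `|` as the word delimiter): every lowercase character in a sentence-cased transcript maps to `<unk>`, so the model is mostly trained to emit `<unk>`, which matches the console output above. A minimal preprocessing sketch that uppercases the `text` field of a data file and reports characters outside an assumed vocabulary. `ASSUMED_VOCAB` and the function names are hypothetical; the real vocabulary should be read from the backbone's `vocab.json`, and the files are assumed to hold one JSON object per line.

```python
import json
from typing import Set

# Assumption: uppercase character vocabulary; load the real one from vocab.json.
ASSUMED_VOCAB = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ'|")


def normalize_transcript(text: str) -> str:
    """Uppercase the transcript and collapse runs of whitespace."""
    return " ".join(text.upper().split())


def out_of_vocab_chars(text: str, vocab: Set[str] = ASSUMED_VOCAB) -> Set[str]:
    """Characters the tokenizer would map to <unk> (spaces become '|')."""
    return {ch for ch in text.replace(" ", "|") if ch not in vocab}


def normalize_json_lines(src: str, dst: str, field: str = "text") -> int:
    """Copy a JSON-lines file with `field` normalized; return the record count."""
    count = 0
    with open(src, encoding="utf-8") as fin, open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            record[field] = normalize_transcript(record[field])
            fout.write(json.dumps(record) + "\n")
            count += 1
    return count
```

Running `out_of_vocab_chars` over every normalized transcript also surfaces punctuation that uppercasing alone does not fix.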