b04901014 / FT-w2v2-ser

Official implementation for the paper "Exploring Wav2vec 2.0 fine-tuning for improved speech emotion recognition"
MIT License

How many GPUs are enough? #2

Closed. Gpwner closed this issue 2 years ago.

Gpwner commented 2 years ago

Wondering how many GPUs you used. I used two V100s (16 GB) and still couldn't run the last phase of the code until I reduced the batch size to 48. I've made sure I use both GPUs by modifying the following code:

        trainer = Trainer(
            precision=args.precision,
            amp_backend='native',
            callbacks=[checkpoint_callback] if hasattr(model, 'valid_met') else None,
            checkpoint_callback=hasattr(model, 'valid_met'),
            resume_from_checkpoint=None,
            check_val_every_n_epoch=1,
            max_epochs=hparams.max_epochs,
            num_sanity_val_steps=2 if hasattr(model, 'valid_met') else 0,
            gpus=-1,
            strategy='dp',  # multiple-gpus, 1 machine
            logger=False
        )
b04901014 commented 2 years ago

You can set self.wav2vec2.encoder.config.gradient_checkpointing = True in https://github.com/b04901014/FT-w2v2-ser/blob/main/modules/FeatureFuser.py#L103 This will greatly reduce the required VRAM if you are using a single GPU. (But it cannot scale to multiple GPUs with DDP, since the algorithm is not compatible with gradient checkpointing.) Also, if you limit the maxseqlen argument, it will truncate any training examples longer than that number (in seconds), which also reduces VRAM usage since padding is done up to the maximum sequence length; this may influence the performance of the model, but I don't think it will hurt much.

I use only one Quadro RTX 8000 (48 GB) for the last phase.
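For reference, here is a minimal illustrative sketch (not code from this repo) of enabling gradient checkpointing on a HuggingFace Wav2Vec2Model; depending on your transformers version, the gradient_checkpointing_enable() helper may be needed instead of the config flag:

    from transformers import Wav2Vec2Model

    # Illustrative only: load a pretrained wav2vec 2.0 encoder and turn on gradient
    # checkpointing, so intermediate activations are recomputed in the backward pass
    # instead of being stored (an extra forward pass traded for lower peak VRAM).
    wav2vec2 = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
    wav2vec2.encoder.config.gradient_checkpointing = True

    # On newer transformers releases the equivalent switch is:
    # wav2vec2.gradient_checkpointing_enable()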

Gpwner commented 2 years ago

> You can set self.wav2vec2.encoder.config.gradient_checkpointing = True in https://github.com/b04901014/FT-w2v2-ser/blob/main/modules/FeatureFuser.py#L103 This will greatly reduce the required VRAM if you are using a single GPU. (But it cannot scale to multiple GPUs with DDP, since the algorithm is not compatible with gradient checkpointing.) Also, if you limit the maxseqlen argument, it will truncate any training examples longer than that number (in seconds), which also reduces VRAM usage since padding is done up to the maximum sequence length; this may influence the performance of the model, but I don't think it will hurt much.
>
> I use only one Quadro RTX 8000 (48 GB) for the last phase.

Will this change result in an eventual performance degradation?

Gpwner commented 2 years ago

And I don't quite understand why you compute the confusion matrix over the entire data set: https://github.com/b04901014/FT-w2v2-ser/blob/main/run_downstream_custom_multiple_fold.py#L93

I think it should be:

WriteConfusionSeaborn(
    confusion,
    model.dataset.test_dataset,
    os.path.join(args.saving_path, 'confmat.png')
)

It seems you add all the experiments' confusion matrices into one, am I right?

b04901014 commented 2 years ago

Gradient checkpointing is a way to reduce memory cost by trading it for an additional forward pass, as detailed in https://github.com/cybertronai/gradient-checkpointing It should not impact the performance of the model.

For the confusion matrix: model.dataset.emoset is just a list of label strings, as shown here: https://github.com/b04901014/FT-w2v2-ser/blob/main/downstream/Custom/dataloader.py#L17 If you print it out, it should be something like ['anger', 'happy', 'neutral', 'sad'] if those are all the emotions in the training set.

All of the information is already in the confusion matrix itself; the function needs that list because it has to know which row/column corresponds to which emotion.
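As a generic illustration (not the repo's WriteConfusionSeaborn), a confusion matrix plus that list of label names is all that is needed to draw a labeled heatmap:

    import os

    import matplotlib.pyplot as plt
    import numpy as np
    import seaborn as sns

    def plot_confusion(confusion, emoset, outpath):
        # `confusion` is an (n_classes x n_classes) count matrix; `emoset` is the
        # list of emotion names saying which row/column corresponds to which emotion.
        fig, ax = plt.subplots()
        sns.heatmap(confusion, annot=True, fmt='d',
                    xticklabels=emoset, yticklabels=emoset, ax=ax)
        ax.set_xlabel('Predicted')
        ax.set_ylabel('True')
        fig.savefig(outpath)
        plt.close(fig)

    plot_confusion(np.array([[50, 3], [7, 40]]),
                   ['anger', 'neutral'],
                   os.path.join('.', 'confmat.png'))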

b04901014 commented 2 years ago

> And I don't quite understand why you compute the confusion matrix over the entire data set: https://github.com/b04901014/FT-w2v2-ser/blob/main/run_downstream_custom_multiple_fold.py#L93
>
> I think it should be:
>
>     WriteConfusionSeaborn(
>         confusion,
>         model.dataset.test_dataset,
>         os.path.join(args.saving_path, 'confmat.png')
>     )
>
> It seems you add all the experiments' confusion matrices into one, am I right?

Yes, the final confusion matrix is the sum of all runs/folds. But the stats (mean/std of UAR, F1) will contain more details about individual runs.
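A small illustrative sketch (with made-up fold results, not the repo's actual code) of summing per-fold confusion matrices while still reporting per-fold statistics:

    import numpy as np

    # Made-up per-fold results: each fold produces its own confusion matrix
    # (rows = true labels, columns = predictions).
    fold_confusions = [np.array([[48, 2], [5, 45]]),
                       np.array([[50, 0], [8, 42]])]

    # The plotted matrix is the element-wise sum over all folds ...
    total_confusion = np.sum(fold_confusions, axis=0)

    # ... while the reported statistics keep the spread across folds, e.g. UAR
    # (unweighted average recall = mean of per-class recall) per fold.
    uars = [np.mean(np.diag(cm) / cm.sum(axis=1)) for cm in fold_confusions]
    print('UAR mean: %.4f, std: %.4f' % (np.mean(uars), np.std(uars)))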

Gpwner commented 2 years ago

I see. Thank you, I will close this issue.