MzeroMiko / VMamba

VMamba: Visual State Space Models. Code is based on Mamba.
MIT License

The issue of cudnn affecting speed #204

Open lp-094 opened 1 month ago

lp-094 commented 1 month ago

Hello, we tried to turn off cudnn, but there was no improvement in speed. Does it need to be a specific version (such as V3) to take effect?

MzeroMiko commented 1 month ago

Do not worry, because it only happens on some machines (I have not actually figured out the pattern of why a machine will be slow; maybe it is related to the driver or library).

Using torch.backends.cudnn.enabled=True in downstream tasks may be quite slow. If you find VMamba quite slow on your machine, disable it in vmamba.py; otherwise, ignore this.
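
For reference, the switch in question is PyTorch's global cuDNN flag. A minimal standalone sketch of toggling it (VMamba sets it inside vmamba.py; this just shows the flags themselves):

```python
import torch

# Global switch: with this set to False, PyTorch falls back to its own
# native CUDA kernels instead of cuDNN for ops such as convolutions.
torch.backends.cudnn.enabled = False

# Related knobs that also affect speed (shown for reference only):
# torch.backends.cudnn.benchmark = True      # autotune conv algorithms
# torch.backends.cudnn.deterministic = True  # reproducible, often slower
```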

LQchen1 commented 3 weeks ago

I had a similar problem: even though I set it to False in vmamba.py, the training time was still unstable. I am using pytorch==2.0.1, python=3.11, on a V100.

LQchen1 commented 3 weeks ago

> Do not worry, because it only happens on some machines (I have not actually figured out the pattern of why a machine will be slow; maybe it is related to the driver or library).
>
> Using torch.backends.cudnn.enabled=True in downstream tasks may be quite slow. If you find VMamba quite slow on your machine, disable it in vmamba.py; otherwise, ignore this.

In fact, when I used 8 A100 cards with the batch size set to 512, it took me 2 hours to run one epoch.

MzeroMiko commented 3 weeks ago

That does not seem possible. What environment are you using?

MzeroMiko commented 3 weeks ago

Also, for 8xV100, the time is about 10 minutes per epoch.

LQchen1 commented 3 weeks ago

> Also, for 8xV100, the time is about 10 minutes per epoch.

I re-executed the program; the first epoch took a long time, but subsequent training was normal. However, I set torch.backends.cudnn.enabled = True. Could you let me know if this has a big impact on the model's performance?

MzeroMiko commented 3 weeks ago

It is still somewhat strange, and I do not know why the first epoch would be abnormal. In my experiments, every epoch takes a similar amount of time, although the first iteration of each epoch is slow, since the program needs to load the data from the very beginning.

Enabling or disabling cudnn may influence the performance, but I think the difference is tolerable.

LQchen1 commented 3 weeks ago

Looking at the logs, it does seem to be data loading: it took a lot of time because we keep the data on another server and make it available to each training server through network data sharing.
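
(For anyone hitting the same symptom: a rough way to confirm that the loader, not the model, is the bottleneck is to split per-iteration time into loader wait versus GPU compute. A diagnostic sketch, where `loader` and `model` stand in for your own DataLoader and network:)

```python
import time
import torch

def profile_loader(loader, model, device="cuda", max_iters=50):
    # Rough split of per-iteration time into loader wait vs. GPU compute.
    # Assumes a CUDA device and a loader yielding (images, labels) batches.
    model.to(device).eval()
    data_time, compute_time = 0.0, 0.0
    end = time.time()
    with torch.no_grad():
        for i, (images, _) in enumerate(loader):
            data_time += time.time() - end    # time spent waiting on the loader
            images = images.to(device, non_blocking=True)
            start = time.time()
            model(images)                     # forward pass only
            torch.cuda.synchronize()          # wait for async CUDA kernels
            compute_time += time.time() - start
            end = time.time()
            if i + 1 >= max_iters:
                break
    print(f"loader wait: {data_time:.1f}s, compute: {compute_time:.1f}s")
```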

LQchen1 commented 3 weeks ago

@MzeroMiko
The configuration file I used is vmambav2_tiny_224.yaml. I compared my current log with the author's log and found that the EMA accuracy is much lower than expected: mine is 0.29%, while the author's is 6.08% at the same epoch. My EMA accuracy also updates very slowly. What causes this? I don't know much about how EMA works.

MzeroMiko commented 3 weeks ago

Oh, it is because the batch_size you use is much larger than mine. With EMA, the shadow parameters are updated toward the latest model parameters at every iteration. The smaller the batch_size, the more iterations there are per epoch, so the more frequently the EMA parameters are updated, and the higher the EMA accuracy will be at a given epoch.

But I cannot predict what will happen in the last 50 epochs, as training starts to converge; you may still get higher performance with this batch size.
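
(For intuition, here is a minimal sketch of the standard EMA update, similar in spirit to timm's ModelEmaV2 but not necessarily the exact code used in this repo; the decay value is illustrative:)

```python
import copy
import torch

class SimpleEMA:
    # Minimal EMA sketch: keeps a shadow copy of the model whose weights
    # are an exponential moving average of the live weights. (Buffers such
    # as BatchNorm statistics are omitted here for brevity.)
    def __init__(self, model, decay=0.9999):
        self.ema = copy.deepcopy(model).eval()
        self.decay = decay
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model):
        # Called once per optimizer step:
        #   ema = decay * ema + (1 - decay) * model
        # A larger batch size means fewer steps per epoch, hence fewer
        # updates, so the EMA weights lag further behind early in training.
        for e, m in zip(self.ema.parameters(), model.parameters()):
            e.mul_(self.decay).add_(m, alpha=1 - self.decay)
```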

LQchen1 commented 3 weeks ago

@MzeroMiko Thanks for your reply; it is back to normal now.