baijinglin / TS-BSmamba2

TS-BSmamba2: A TWO-STAGE BAND-SPLIT MAMBA-2 NETWORK FOR MUSIC SEPARATION

Batch Size #3

Open crlandsc opened 1 day ago

crlandsc commented 1 day ago

Your paper states that you trained on 8 V100s with a batch size of 16. Does this mean that each GPU had a batch of 2? I am trying to get a gauge of how much VRAM this model uses for someone with less compute. Thanks!

baijinglin commented 1 day ago

Yes, I use the `checkpoint_sequential(functions, 2, input, -1)` method, which helps reduce memory usage. With this approach, each batch takes approximately 14 to 15 GB of GPU memory. For reference, a single batch in my case consists of 3-second, 44.1 kHz stereo audio.
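
For anyone gauging VRAM on smaller GPUs, here is a minimal sketch of gradient checkpointing with PyTorch's `torch.utils.checkpoint.checkpoint_sequential`. The `blocks` stack and tensor shapes are placeholders, not the actual TS-BSmamba2 layers; the per-GPU batch of 2 follows the numbers discussed above.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Placeholder stack standing in for the model's sequential blocks.
blocks = nn.Sequential(
    *[nn.Sequential(nn.Linear(512, 512), nn.GELU()) for _ in range(8)]
)

# Per-GPU batch of 2 (batch, frames, features); the real input would be
# a 3-second, 44.1 kHz stereo segment after the band-split front end.
x = torch.randn(2, 256, 512, requires_grad=True)

# Split the stack into 2 segments: activations inside each segment are
# recomputed during the backward pass instead of being stored, trading
# extra compute for a lower peak memory footprint.
y = checkpoint_sequential(blocks, 2, x, use_reentrant=False)
y.mean().backward()
```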

crlandsc commented 1 day ago

Thanks for the info! I am impressed that it achieves such stellar results with only 3 seconds of audio. Most other models require a much longer context for their performance (for example, SCNet was trained on an 11-second window). Did you ever test with a longer context length?

baijinglin commented 1 day ago

So far, I’ve only used 3-second segments for both training and inference, and haven’t tested with longer context lengths yet. Thank you for the suggestion! Similar to how the SIMO stereo BSRNN work ran inference tests with different segment lengths, I’d be willing to test varying inference lengths in the future.
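
For comparing inference context lengths, a minimal sketch of non-overlapping chunked inference is below. `separate_in_chunks`, `model`, and `segment_seconds` are illustrative names rather than part of this repository, and overlap-add windowing (used in some separation pipelines) is omitted for brevity.

```python
import torch

SAMPLE_RATE = 44100  # 44.1 kHz, as in the paper

def separate_in_chunks(model, mixture, segment_seconds=3.0):
    """Run separation on fixed-length chunks and concatenate the outputs.

    `model` is a placeholder network mapping a stereo mixture of shape
    (channels, samples) to stems of the same length; `segment_seconds`
    is the inference context length being compared.
    """
    segment_len = int(segment_seconds * SAMPLE_RATE)
    outputs = []
    for start in range(0, mixture.shape[-1], segment_len):
        chunk = mixture[..., start:start + segment_len]
        # Zero-pad the last chunk so every forward pass sees a full segment.
        pad = segment_len - chunk.shape[-1]
        if pad > 0:
            chunk = torch.nn.functional.pad(chunk, (0, pad))
        with torch.no_grad():
            est = model(chunk.unsqueeze(0)).squeeze(0)
        # Trim the padding back off before stitching the chunks together.
        outputs.append(est[..., :segment_len - pad] if pad > 0 else est)
    return torch.cat(outputs, dim=-1)
```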