deepfakes / faceswap-playground

User dedicated repo for the faceswap project

Is a bigger batch size better? #107

Closed Tvde1 closed 6 years ago

Tvde1 commented 6 years ago

See title.

I've experimented with small batch sizes, which give quick epochs, and with large batch sizes like 256, which make epochs slow.

I'm training on the CPU, if that makes any difference.

Clorr commented 6 years ago

In deep learning, batch composition does have an effect on training. For example, if a batch contains many similar samples, it will pull the training in a particular direction, which is not desirable. Small batches are inherently more prone to this. But making the batch bigger is no guarantee of good training either; variety is really the key here. This is an issue for all DL systems, and you can find other answers around, like this one: https://datascience.stackexchange.com/questions/12532/does-batch-size-in-keras-have-any-effects-in-results-quality
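
As a rough sketch of what "variety in a batch" can look like in practice (the `batches` helper and the array shapes here are illustrative, not part of faceswap itself): shuffling the sample pool before slicing it into batches keeps similar faces from clustering together.

```python
# Minimal sketch: reshuffle the sample pool each epoch so every batch
# mixes faces from across the whole training set.
import numpy as np

def batches(samples, batch_size, rng=np.random.default_rng()):
    """Yield shuffled mini-batches so similar samples don't cluster together."""
    order = rng.permutation(len(samples))            # new random order each epoch
    for start in range(0, len(samples), batch_size):
        yield samples[order[start:start + batch_size]]

# Usage (assuming `samples` is an (N, 64, 64, 3) array of aligned face crops):
# for batch in batches(samples, batch_size=64):
#     model.train_on_batch(batch, batch)             # autoencoder-style step
```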

Clorr commented 6 years ago

Also note that epochs and iterations are different: an epoch is one pass over all your samples, while an iteration is one training step on a single batch. So for the same number of samples, an epoch should look similar whatever the batch size.
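
As a quick illustration of that arithmetic (the sample count here is made up): with the number of samples fixed, the batch size only changes how many iterations make up one epoch, not how much data the epoch sees.

```python
# One epoch always visits every sample once; batch size only sets the
# number of iterations needed to get through them.
import math

num_samples = 10_000                     # assumed training-set size
for batch_size in (16, 64, 256):
    iterations = math.ceil(num_samples / batch_size)
    print(f"batch_size={batch_size:>3} -> {iterations} iterations per epoch")
# batch_size= 16 -> 625 iterations per epoch
# batch_size= 64 -> 157 iterations per epoch
# batch_size=256 -> 40 iterations per epoch
```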

Tvde1 commented 6 years ago

What would you say is the "best" batch size? As many as you can take, or somewhere in the middle?

Clorr commented 6 years ago

Anything above 16 seems fine. Beyond that, as long as your source images are varied enough, the batch size won't have much impact. The portraits we handle already have a lot of variance, so big batch sizes don't bring much advantage.

bryanlyon commented 6 years ago

Big batches mainly speed up training by running more pictures through at once. I agree that there are diminishing returns, but you might as well set it as big as your GPU memory supports.
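
If you want to find that limit without guessing, a rough sketch along these lines works (the `train_batch` helper and the TensorFlow backend are assumptions, not faceswap's own code): try a large batch and halve it whenever the GPU runs out of memory.

```python
# Sketch: probe for the largest batch size that fits in GPU memory,
# assuming `train_batch(batch_size)` runs one training step.
import tensorflow as tf

def largest_working_batch(train_batch, start=256):
    """Return the biggest power-of-two batch size (from `start` down) that doesn't OOM."""
    batch_size = start
    while batch_size >= 1:
        try:
            train_batch(batch_size)   # one trial training step at this size
            return batch_size
        except tf.errors.ResourceExhaustedError:
            batch_size //= 2          # out of GPU memory: try half the size
    raise RuntimeError("Even batch size 1 does not fit in GPU memory")
```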

Tvde1 commented 6 years ago

I'm CPU training on an old PC and it can handle a batch size of 256. Is that preferable to 32, 64, or 128?

bryanlyon commented 6 years ago

On CPU it won't matter at all, since the CPU has to process the samples one at a time anyway. It only matters on the GPU.