A question of the args "--grad_ac_step"

QData / C-Tran

General Multi-label Image Classification with Transformers

MIT License

A question of the args "--grad_ac_step" #11

Closed Kyfafyd closed 2 years ago

Kyfafyd commented 2 years ago

Dear author,

Thanks very much for your interesting work! I wonder why the "--grad_ac_step" argument is set to 2 for VOC2007.

From your code, it looks like setting "--grad_ac_step" to 2 means that one minibatch (16 images) is not used for optimization. Can you help me understand this?

Thanks very much!

jacklanchantin commented 2 years ago

If you use a batch size of 16 but let the gradients accumulate for 2 batches (grad_ac_step = 2), the effective batch size is 32. This helps when you can't fit all 32 images in GPU memory at once.
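To illustrate the point above, here is a toy sketch in plain Python (not the C-Tran training loop). It assumes a one-parameter model `y_hat = w * x` with a mean-squared-error loss, and it assumes each minibatch loss is scaled by `1 / grad_ac_step` before accumulation; whether the repository scales the loss this way is not stated in this thread. All function names are hypothetical.

```python
# Toy sketch of gradient accumulation; names and model are illustrative only.
# Model: y_hat = w * x, loss: mean squared error over the batch.

def mse_grad(w, batch):
    """Gradient of mean((w*x - y)^2) with respect to w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)

def accumulated_grad(w, batches):
    """Sum per-batch gradients, each scaled by 1/len(batches) (assumed scaling)."""
    grad = 0.0
    for batch in batches:
        grad += mse_grad(w, batch) / len(batches)
    return grad

# 32 synthetic (x, y) pairs split into 2 minibatches of 16.
data = [(float(i), 3.0 * i + 1.0) for i in range(32)]
batch_size, grad_ac_step = 16, 2
batches = [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

w = 0.5
full_batch_grad = mse_grad(w, data)          # one batch of 32
accumulated = accumulated_grad(w, batches)   # two batches of 16, accumulated

# Both minibatches contribute to the single optimizer step.
print(full_batch_grad, accumulated)
```

With equally sized minibatches, the accumulated gradient matches the gradient of a single batch of 32 up to floating-point rounding, so no minibatch is skipped; the optimizer simply steps once every `grad_ac_step` batches.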

Kyfafyd commented 2 years ago

Dear author,

Thanks very much for your response! That is clear now. I have another question: which type of GPU, and how many, did you use for training on COCO and VOC?

jacklanchantin commented 2 years ago

We used four NVIDIA Titan X GPUs

Kyfafyd commented 2 years ago

Dear author,

Thanks very much for your patience! How long does training take for COCO and VOC, respectively? It seems very slow for COCO...

jacklanchantin commented 2 years ago

VOC: ~5 minutes per epoch

COCO: ~1 hour per epoch

Kyfafyd commented 2 years ago

But the default training config seems to be 100 epochs, so when will the model converge, especially for COCO? Not until 100 epochs?

jacklanchantin commented 2 years ago

It's possible it could take that long. You will have to run it for at least a few days.
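As a rough check on "a few days", the per-epoch figures quoted earlier in this thread can be multiplied out for the default 100 epochs. This is only a back-of-the-envelope sketch: it assumes constant time per epoch on the hardware the author describes, and the helper name `total_hours` is made up for illustration.

```python
# Rough wall-clock estimate from the per-epoch figures quoted in this thread.
# Assumes every epoch takes the same time; real runs will vary.

EPOCHS = 100  # default training config mentioned in the thread

def total_hours(epochs, minutes_per_epoch):
    """Total training time in hours for a fixed per-epoch cost."""
    return epochs * minutes_per_epoch / 60

minutes_per_epoch = {"VOC": 5, "COCO": 60}  # ~5 min and ~1 hour per epoch

for name, minutes in minutes_per_epoch.items():
    hours = total_hours(EPOCHS, minutes)
    print(f"{name}: about {hours:.1f} hours ({hours / 24:.1f} days)")
```

Under these assumptions, 100 epochs on COCO come to roughly 100 hours, a little over four days, while VOC finishes in well under a day.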

Kyfafyd commented 2 years ago

So, did you use 4 GPUs with a batch size of 16? It seems 4 GPUs should be able to handle a batch size of 32.

jacklanchantin commented 2 years ago

Yes, you could do that

Kyfafyd commented 2 years ago

Thanks very much for your response!

THUeeY commented 2 years ago

Excuse me, did you get the mAP on VOC 2007? It was not mentioned in the paper.

jacklanchantin commented 2 years ago

We did not run experiments on VOC 2007, as it is a small dataset with very few labels.