BigRedT / info-ground

Learning phrase grounding from captioned images through InfoNCE bound on mutual information
http://tanmaygupta.info/info-ground/

Hi, can I ask some questions about your paper? (●'◡'●) #2

Open jun0wanan opened 4 years ago

jun0wanan commented 4 years ago

Thank you very much for an extraordinary job!

I'm very interested in your work and I hope to follow in your footsteps (●'◡'●)

Can your code run on multiple GPUs?

jun0wanan commented 4 years ago

Hope to receive your reply.

BigRedT commented 4 years ago

Hi @jun0wanan,

In its current form, our codebase only supports single-GPU training. Part of the challenge in supporting multi-GPU training is the contrastive loss, which requires each caption to be compared against all other images in the mini-batch to compute the loss. Note that this is different from typical classification tasks, where an image and its label are sufficient to compute the loss for that sample, so it is easy to partition the batch and place each partition on a separate GPU.

One solution that might work reasonably well is to only use images placed on the same GPU as negatives for the contrastive loss. For example, if the batch size is 100 and you have 4 GPUs, each GPU handles a subset of size 25. So instead of using 99 images as negatives for each caption, you would be using 24.

Hope this helps!
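The per-GPU-negatives idea above can be sketched as follows. This is a minimal illustration, not code from the info-ground repo: the function name, embedding dimensions, and temperature value are all made up for the example. It computes an InfoNCE-style contrastive loss where each caption is only contrasted against the images on the same device, so a local sub-batch of size B yields B - 1 negatives per caption.

```python
import torch
import torch.nn.functional as F

def local_infonce_loss(caption_emb, image_emb, temperature=0.07):
    """InfoNCE over a per-GPU sub-batch: each caption is contrasted
    only against images on the same device, so a local batch of size
    B gives B - 1 negatives per caption instead of full-batch-size - 1."""
    # Cosine-similarity logits between every local caption and image
    caption_emb = F.normalize(caption_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = caption_emb @ image_emb.t() / temperature  # (B, B)
    # The matching image for caption i sits at index i on the diagonal
    targets = torch.arange(logits.size(0), device=logits.device)
    return F.cross_entropy(logits, targets)

# With a batch of 100 split across 4 GPUs, each device holds 25 pairs,
# so each caption sees 24 negatives rather than 99.
captions = torch.randn(25, 128)
images = torch.randn(25, 128)
loss = local_infonce_loss(captions, images)
```

Under DistributedDataParallel this would run independently on each device's sub-batch; the gradient averaging across devices happens as usual, only the negative pool shrinks.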

jun0wanan commented 3 years ago

Hi author~ I found what might be a small error in a .py file? Or maybe I just didn't understand your setting..

In https://github.com/BigRedT/info-ground/blob/master/exp/ground/run/eval_flickr_phrase_loc_model_selection.py

```python
model_nums = find_all_model_numbers(exp_const.model_dir)
for num in model_nums:
    if num <= 3000:
        continue
    model_const.model_num = num
```

Why the `continue` here?

jun0wanan commented 3 years ago

Hi, and I also find my model's performance decreases a lot (two screenshots attached).