fredzzhang / pvic

[ICCV'23] Official PyTorch implementation for paper "Exploring Predicate Visual Context in Detecting Human-Object Interactions"
BSD 3-Clause "New" or "Revised" License

Swin backbone trained weights #56

Closed by YangJae96 1 month ago

YangJae96 commented 2 months ago

Hi. Thank you for your great work.

Is only the ResNet-50 backbone available for inference? Could you please provide the Swin-L backbone model weights if possible?

I would like to evaluate the results and use the model for inference on custom data.

Thanks in advance.

fredzzhang commented 2 months ago

Hi @YangJae96,

Unfortunately, we did not save the checkpoints with larger backbones because of storage constraints. You can find the fine-tuned object detector with Swin-L here. With this, you should be able to train the HOI detector yourself. Using the --use-checkpoint flag significantly reduces the memory requirement.

Let me know if you have trouble reproducing the results.

Cheers, Fred.

YangJae96 commented 1 week ago

@fredzzhang Hi,

In your paper, you used 8 GPUs with a batch size of 2.

Can I ask why?

Also, I only have 1 GPU. Will the performance drop significantly if I use only 1 GPU with a larger batch size, such as 32?

fredzzhang commented 1 week ago

Hi @YangJae96,

I don't think the performance will drop significantly unless the batch size gets too low.

Fred.
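
For anyone comparing the two setups in this thread, here is a rough sketch of the batch size arithmetic. It assumes the "batch size 2" in the paper is a per-GPU value, which the thread does not confirm. The helper names `effective_batch_size` and `per_gpu_batch_to_match` are hypothetical and are not part of the pvic code base.

```python
import math


def effective_batch_size(num_gpus: int, per_gpu_batch: int) -> int:
    """Number of samples contributing to one optimizer step in data-parallel training."""
    return num_gpus * per_gpu_batch


def per_gpu_batch_to_match(target_effective: int, num_gpus: int) -> int:
    """Per-GPU batch size needed to reach at least the target effective batch size."""
    return math.ceil(target_effective / num_gpus)


# ASSUMPTION: batch size 2 is per GPU, so 8 GPUs give 8 * 2 samples per step.
paper_setup = effective_batch_size(num_gpus=8, per_gpu_batch=2)

# The single-GPU setup asked about in the thread.
single_gpu_setup = effective_batch_size(num_gpus=1, per_gpu_batch=32)

# Per-GPU batch size a single GPU would need to match the assumed paper setup.
matching_batch = per_gpu_batch_to_match(paper_setup, num_gpus=1)

print(paper_setup, single_gpu_setup, matching_batch)
```

Under that assumption, a single GPU with batch size 32 would process more samples per step than the paper setup, not fewer. Whether batch size 32 fits in memory with a Swin-L backbone is a separate question that this sketch does not address.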