akshitac8 / OW-DETR

[CVPR 2022] Official Pytorch code for OW-DETR: Open-world Detection Transformer

The training schedule #17

Closed luckychay closed 1 year ago

luckychay commented 2 years ago

Dear Author,

In the paper, I see that every task is trained for 50 epochs and fine-tuned for 20 epochs, as shown here: [screenshot from the paper]

However, in configs/OWOD_new_split.sh, the training schedule follows a different setting, as highlighted by the red boxes: [screenshot of the config]

Is there anything I missed? Looking forward to your reply. Thanks.

ghost commented 2 years ago

I actually trained using their scripts but could not recreate their results. For T1 (after 50 epochs):

Prev class AP50: tensor(43.3941)
Prev class Precisions50: 5.411373414254498
Prev class Recall50: 71.44982028560455
Current class AP50: tensor(22.2871)
Current class Precisions50: 1.8076766423054116
Current class Recall50: 57.50498649613736
Known AP50: tensor(32.3129)
Known Precisions50: 3.519432608981228
Known Recall50: 64.12878254613427
Unknown AP50: tensor(0.0863)
Unknown Precisions50: 0.9169071669071669
Unknown Recall50: 7.730652247380871

And one thing to note: they continue numbering the epochs across tasks, so for the second task training resumes at epoch 50 and runs for an additional 50 epochs.
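The continued epoch numbering can be sketched like this (a minimal illustration assuming the 50-epochs-per-task budget from the paper; this is not code from the repo):

```python
# Sketch of cumulative epoch numbering across OWOD tasks: each task
# trains for a fixed budget, but the epoch counter keeps incrementing,
# so task 2 resumes at epoch 50 and ends at epoch 100, and so on.
EPOCHS_PER_TASK = 50  # assumed per-task budget from the paper

def epoch_range(task_id: int, epochs_per_task: int = EPOCHS_PER_TASK):
    """Return (start_epoch, end_epoch) for a 1-indexed task."""
    start = (task_id - 1) * epochs_per_task
    return start, start + epochs_per_task

for t in range(1, 5):
    start, end = epoch_range(t)
    print(f"Task {t}: resume at epoch {start}, train until epoch {end}")
```

With this numbering, a config that says "train until epoch 100" for task 2 is only 50 new epochs of training, which can make the scripts look different from the paper at first glance.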

luckychay commented 2 years ago

Yeah, I did notice that they continue numbering the epochs across tasks. But even in that case, their scripts apparently differ from what is described in the paper.

akshitac8 commented 2 years ago

Hello @luckychay @orrzohar-stanford The paper uses 2 open-world splits, and I have updated the repo with the configs for both splits. Can you please let me know which split config is causing the problem?

ghost commented 2 years ago

Dear authors, Thank you for responding! I have been using the new proposed data splits. After training the model for 40 epochs + 10 fine-tuning, I am getting results closer to what was reported - but still a little off. I am not sure why - I used the (unmodified) bash scripts provided (trained on a similar machine with 8 V100 GPUs/etc). Any reason you can think of?


Task 1:

| Method | U-Recall | mAP |
| -- | -- | -- |
| ORE-EBUI | 1.5 | 61.4 |
| Ours: OW-DETR | 5.7 | 71.5 |
| Original codebase | 3.9 | 71.85 |
| Amended (40+10) | 5.05 | 71.9 |

And overall: [screenshot of full results]

luckychay commented 2 years ago

> Hello @luckychay @orrzohar-stanford The paper uses 2 open-world splits, and I have updated the repo with the configs for both splits. Can you please let me know which split config is causing the problem?

Thanks for your reply. I am using the old splits from ORE; in fact, the config is not causing any problem for me. I am just confused about how many epochs I should train and fine-tune in the incremental step. I notice that your newly uploaded scripts train for about 5 epochs on tasks 2, 3, and 4, and fine-tune for 45, 30, and 20 epochs respectively. That is much less than 50 training epochs. How could that happen?

I am not familiar with this part and thank you for your patience.

Went-Liang commented 2 years ago

> Dear authors, Thank you for responding! I have been using the new proposed data splits. After training the model for 40 epochs + 10 fine-tuning, I am getting results closer to what was reported - but still a little off. I am not sure why - I used the (unmodified) bash scripts provided (trained on a similar machine with 8 V100 GPUs/etc). Any reason you can think of?

> Task 1:
>
> | Method | U-Recall | mAP |
> | -- | -- | -- |
> | ORE-EBUI | 1.5 | 61.4 |
> | Ours: OW-DETR | 5.7 | 71.5 |
> | Original codebase | 3.9 | 71.85 |
> | Amended (40+10) | 5.05 | 71.9 |
>
> And overall: [screenshot of full results]

Hello, could you share your trained models?

akshitac8 commented 1 year ago

The weights are uploaded in the repository. The results you shared look pretty close, so the gap might just come from differences in environment or machine, but you can always visualize how the unknown classes respond and check your code that way.

zhongxiangzju commented 1 year ago

Dear @akshitac8 and @orrzohar-stanford,

May I ask how long it takes to train the model on OWOD_split_task1 using 8 V100 GPUs for 50 epochs? I only have 2 RTX 3090 GPUs, and I am trying to estimate whether training is feasible and how long it might take.
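For a rough back-of-envelope estimate, assuming per-epoch wall time scales linearly with GPU count (a naive assumption; all numbers below are placeholders, not measured OW-DETR timings):

```python
# Naive wall-time estimate: with 2 GPUs instead of 8, each epoch
# processes ~1/4 of the images in parallel, so expect roughly 4x
# the per-epoch time (ignoring batch-size and convergence effects).
def estimate_hours(hours_per_epoch_ref: float, gpus_ref: int,
                   gpus_mine: int, epochs: int) -> float:
    scale = gpus_ref / gpus_mine  # linear-scaling assumption
    return hours_per_epoch_ref * scale * epochs

# e.g. if one epoch hypothetically took 0.5 h on 8 V100s,
# 50 epochs on 2 GPUs would be about:
print(estimate_hours(0.5, 8, 2, 50))  # -> 100.0 hours
```

In practice you would also need to reduce the per-GPU batch size (or use gradient accumulation) to fit in 3090 memory, which changes the effective schedule.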

Thanks.