Hello, thanks for your interest. Indeed, we just use the official AOT code to generate the predictions, editing only the dataloader code. Besides, we limit the length of the memory as mentioned in our paper. You can refer to the following code.
def update_long_term_memory(self, new_long_term_memories):
    updated_long_term_memories = []
    for new_long_term_memory, last_long_term_memory in zip(
            new_long_term_memories, self.long_term_memories):
        updated_e = []
        for new_e, last_e in zip(new_long_term_memory, last_long_term_memory):
            # number of frames currently stored (each frame occupies enc_hw rows)
            v_len = int(last_e.size(0) / self.enc_hw)
            if v_len > 6:
                # drop one frame's block so the memory length stays bounded
                last_e = [last_e[0:self.enc_hw], last_e[self.enc_hw * 2:]]
                last_e = torch.cat(last_e, dim=0)
            # prepend the new frame's memory
            updated_e.append(torch.cat([new_e, last_e], dim=0))
        updated_long_term_memories.append(updated_e)
    self.long_term_memories = updated_long_term_memories
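To make the effect of the cap concrete, here is a small, self-contained sketch with made-up shapes (enc_hw and the feature dimension are placeholders, not the real values):

import torch

# Hypothetical illustration of the truncation above, not part of the real engine.
enc_hw = 4                               # pretend each frame's memory spans 4 rows
stored = torch.randn(7 * enc_hw, 256)    # memory of 7 frames already stored
new = torch.randn(enc_hw, 256)           # memory of the newly memorized frame

v_len = stored.size(0) // enc_hw         # 7 > 6, so one frame's block is dropped
if v_len > 6:
    stored = torch.cat([stored[:enc_hw], stored[enc_hw * 2:]], dim=0)

updated = torch.cat([new, stored], dim=0)
print(updated.size(0) // enc_hw)         # prints 7: the memory length stays bounded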
I'm sorry, but we have been quite occupied recently. We will release the test script and DDMemory later.
Best regards
Thank you for your reply. I have one more question. When do you update the long-term memory (i.e., call update_long_term_memory) during testing? Is it every 5 frames, as in the AOT framework, or do you update it every frame?
Best regards, Abdelrahman.
For the AOT predictions, we follow the setting in the AOT paper and update the long-term memory every 5 frames.
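In other words, with frame 0 kept as the reference, the long-term memory update fires on frames 5, 10, 15, and so on. A minimal sketch of that schedule (the function name is ours for illustration, not AOT's):

LONG_TERM_MEM_GAP = 5  # AOT setting: update the long-term memory every 5 frames

def should_update_long_term_memory(frame_idx: int) -> bool:
    # frame 0 is the reference frame and is already memorized
    return frame_idx > 0 and frame_idx % LONG_TERM_MEM_GAP == 0

assert [i for i in range(12) if should_update_long_term_memory(i)] == [5, 10]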
Thanks for your reply.
One last question before closing this issue.
Are the numbers reported in the "Before" (fine-tuning) column of this table from models pre-trained on DAVIS & YouTube-VOS, or just pre-trained on static images?
Thanks again!
Hi, 'Before' denotes the performance of models pretrained on DAVIS & YouTube-VOS. In detail, we just use the official parameters, which are first pretrained on static images and then trained on DAVIS & YouTube-VOS.
Hi,
I used the AOT framework with your update_long_term_memory modification. However, the prediction results on the validation set seem to be much higher than the results in your paper.
Here is the evaluation command: python evaluation_method.py --task semi-supervised --results_path PREDICTION_PATH --mp_nums 1
This is the predicted result for the val set:
--------------------------- Global results for valid ---------------------------
J&F-Mean  J-Mean    J-seen-Mean  J-unseen-Mean  F-Mean    F-seen-Mean  F-unseen-Mean  V-Mean    V-seen-Mean  V-unseen-Mean
0.794931  0.738624  0.75179      0.725458       0.851238  0.855106     0.84737        0.184196  0.187739     0.180653
The J&F-Mean is 0.794931, while you mention in the paper that the J&F-Mean for AOT-L is 59.4. What could be the reason for this gap?
I have rerun the AOT evaluation code and evaluated DeAOT-L on the LVOS valid set. The result is similar to the performance reported in the paper. I suspect you did not modify the dataloader code and are feeding the ground-truth mask of every frame, which would explain why your performance is close to the oracle experiments in the paper. Please recheck the dataloader code and ensure that only the mask of the first frame is given as input.
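As a rough sketch of what the evaluation dataloader must guarantee (illustrative only, not our actual LVOS_Test class): only the first frame of a sequence is returned with its ground-truth mask, and all later frames carry no label.

from typing import Optional, Tuple

def load_eval_sample(frame_paths, mask_paths, idx) -> Tuple[str, Optional[str]]:
    # Semi-supervised VOS evaluation: only frame 0 gets a ground-truth mask.
    image_path = frame_paths[idx]
    label_path = mask_paths[idx] if idx == 0 else None
    return image_path, label_path

assert load_eval_sample(["00000.jpg", "00005.jpg"], ["00000.png", "00005.png"], 1) == ("00005.jpg", None)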
I also think it is necessary to release the APIs for processing LVOS, together with my own evaluation code, to make it easier to use. We will release them as soon as possible.
Yes please; we are working on a CVPR project (deadline in one month) based on your recent work and the published benchmark. Releasing your evaluation code will help us evaluate our method correctly on LVOS.
If it will take more than a few days, please share your LVOS_Test class from dataloaders/eval_datasets.py with me.
Thank you so much.
Hello, we have released the modified test scripts. Please see this repository for more details.
Thank you for releasing the APIs. They are really helpful.
I first resized the evaluation data to 480p (480, 853) instead of 720p (720, 1280), as sketched below.
Then I followed all the instructions in the LVOS-apis repo. However, I still cannot reproduce the accuracy or the memory allocation for AOT-L: the max memory is 4.05G instead of the 1.32G in your paper, and the J&F I reproduce is 61.5% instead of the reported 59.4%.
Are you sure nothing else is missing? What could be the reason? I debugged the memory: it now contains only 6 frames and is updated every 5 frames. What else could cause the higher memory and higher J&F? The videos are now at 480p, and the FPS seems fine.
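This is roughly how I did the resizing mentioned above (a sketch with Pillow; bilinear for RGB frames, nearest-neighbor for the first-frame annotation so the label IDs stay intact):

from PIL import Image

def resize_to_480p(in_path: str, out_path: str, is_mask: bool = False) -> None:
    # Downscale a 720p (1280x720) frame or annotation to 480p (853x480).
    resample = Image.NEAREST if is_mask else Image.BILINEAR
    Image.open(in_path).resize((853, 480), resample).save(out_path)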
Eval AOT-L on lvos val:
Done!
GPU 0 - Processing Seq 0tCWPOrc [1/50]:
GPU 0 - Seq 0tCWPOrc - FPS: 27.03. All-Frame FPS: 27.03, All-Seq FPS: 27.03, Max Mem: 2.02G
GPU 0 - Processing Seq 3Zf4NFzn [2/50]:
/home/x_fahkh/.conda/envs/focal/lib/python3.8/site-packages/torch/functional.py:504: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at ../aten/src/ATen/native/TensorShape.cpp:3483.)
return _VF.meshgrid(tensors, **kwargs) # type: ignore[attr-defined]
GPU 0 - Seq 3Zf4NFzn - FPS: 26.68. All-Frame FPS: 26.80, All-Seq FPS: 26.85, Max Mem: 2.50G
GPU 0 - Processing Seq 3nsHQkEK [3/50]:
GPU 0 - Seq 3nsHQkEK - FPS: 27.16. All-Frame FPS: 26.89, All-Seq FPS: 26.95, Max Mem: 2.50G
GPU 0 - Processing Seq 48f9Llhg [4/50]:
GPU 0 - Seq 48f9Llhg - FPS: 26.75. All-Frame FPS: 26.84, All-Seq FPS: 26.90, Max Mem: 2.82G
GPU 0 - Processing Seq 49TNsJzk [5/50]:
GPU 0 - Seq 49TNsJzk - FPS: 26.91. All-Frame FPS: 26.85, All-Seq FPS: 26.90, Max Mem: 2.82G
GPU 0 - Processing Seq 7BcOR5aJ [6/50]:
GPU 0 - Seq 7BcOR5aJ - FPS: 26.73. All-Frame FPS: 26.81, All-Seq FPS: 26.87, Max Mem: 3.91G
GPU 0 - Processing Seq 7K7WVzGG [7/50]:
GPU 0 - Seq 7K7WVzGG - FPS: 26.88. All-Frame FPS: 26.82, All-Seq FPS: 26.88, Max Mem: 3.91G
GPU 0 - Processing Seq 8MfWMkrt [8/50]:
GPU 0 - Seq 8MfWMkrt - FPS: 27.10. All-Frame FPS: 26.84, All-Seq FPS: 26.90, Max Mem: 3.91G
GPU 0 - Processing Seq 8lxxCA5h [9/50]:
GPU 0 - Seq 8lxxCA5h - FPS: 26.86. All-Frame FPS: 26.84, All-Seq FPS: 26.90, Max Mem: 3.91G
GPU 0 - Processing Seq 9mBuSvT2 [10/50]:
GPU 0 - Seq 9mBuSvT2 - FPS: 27.24. All-Frame FPS: 26.86, All-Seq FPS: 26.93, Max Mem: 3.91G
GPU 0 - Processing Seq D4AgqLQL [11/50]:
GPU 0 - Seq D4AgqLQL - FPS: 26.79. All-Frame FPS: 26.86, All-Seq FPS: 26.92, Max Mem: 3.91G
GPU 0 - Processing Seq EWCZAcdt [12/50]:
GPU 0 - Seq EWCZAcdt - FPS: 26.80. All-Frame FPS: 26.85, All-Seq FPS: 26.91, Max Mem: 3.91G
GPU 0 - Processing Seq FFMl5yqs [13/50]:
GPU 0 - Seq FFMl5yqs - FPS: 26.96. All-Frame FPS: 26.85, All-Seq FPS: 26.91, Max Mem: 3.91G
GPU 0 - Processing Seq FiRTBMg2 [14/50]:
GPU 0 - Seq FiRTBMg2 - FPS: 27.46. All-Frame FPS: 26.87, All-Seq FPS: 26.95, Max Mem: 3.91G
GPU 0 - Processing Seq Gy1gwYZD [15/50]:
GPU 0 - Seq Gy1gwYZD - FPS: 27.18. All-Frame FPS: 26.88, All-Seq FPS: 26.97, Max Mem: 3.91G
GPU 0 - Processing Seq HNrCxhwd [16/50]:
GPU 0 - Seq HNrCxhwd - FPS: 26.78. All-Frame FPS: 26.87, All-Seq FPS: 26.95, Max Mem: 3.91G
GPU 0 - Processing Seq JGG6MrhF [17/50]:
GPU 0 - Seq JGG6MrhF - FPS: 26.80. All-Frame FPS: 26.87, All-Seq FPS: 26.95, Max Mem: 3.91G
GPU 0 - Processing Seq K3OUeINk [18/50]:
GPU 0 - Seq K3OUeINk - FPS: 26.95. All-Frame FPS: 26.87, All-Seq FPS: 26.95, Max Mem: 3.91G
GPU 0 - Processing Seq KPDIQo5u [19/50]:
GPU 0 - Seq KPDIQo5u - FPS: 26.84. All-Frame FPS: 26.87, All-Seq FPS: 26.94, Max Mem: 3.91G
GPU 0 - Processing Seq KfcCU1ma [20/50]:
GPU 0 - Seq KfcCU1ma - FPS: 26.89. All-Frame FPS: 26.87, All-Seq FPS: 26.94, Max Mem: 3.91G
GPU 0 - Processing Seq MKnlVo6x [21/50]:
GPU 0 - Seq MKnlVo6x - FPS: 26.94. All-Frame FPS: 26.87, All-Seq FPS: 26.94, Max Mem: 3.91G
GPU 0 - Processing Seq N6CONZUW [22/50]:
GPU 0 - Seq N6CONZUW - FPS: 26.78. All-Frame FPS: 26.87, All-Seq FPS: 26.93, Max Mem: 3.91G
GPU 0 - Processing Seq NFbsxmYE [23/50]:
GPU 0 - Seq NFbsxmYE - FPS: 26.90. All-Frame FPS: 26.87, All-Seq FPS: 26.93, Max Mem: 3.91G
GPU 0 - Processing Seq Q3kk9fuH [24/50]:
GPU 0 - Seq Q3kk9fuH - FPS: 26.82. All-Frame FPS: 26.86, All-Seq FPS: 26.92, Max Mem: 3.91G
GPU 0 - Processing Seq ScFTYisJ [25/50]:
GPU 0 - Seq ScFTYisJ - FPS: 27.03. All-Frame FPS: 26.87, All-Seq FPS: 26.93, Max Mem: 3.91G
GPU 0 - Processing Seq Vh9NwLSn [26/50]:
GPU 0 - Seq Vh9NwLSn - FPS: 26.91. All-Frame FPS: 26.87, All-Seq FPS: 26.93, Max Mem: 3.91G
GPU 0 - Processing Seq VhwRKgVS [27/50]:
GPU 0 - Seq VhwRKgVS - FPS: 26.78. All-Frame FPS: 26.86, All-Seq FPS: 26.92, Max Mem: 4.05G
GPU 0 - Processing Seq X5U0z8VI [28/50]:
GPU 0 - Seq X5U0z8VI - FPS: 27.15. All-Frame FPS: 26.87, All-Seq FPS: 26.93, Max Mem: 4.05G
GPU 0 - Processing Seq aFytsETk [29/50]:
GPU 0 - Seq aFytsETk - FPS: 27.37. All-Frame FPS: 26.88, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq aT6JIUVU [30/50]:
GPU 0 - Seq aT6JIUVU - FPS: 26.85. All-Frame FPS: 26.88, All-Seq FPS: 26.94, Max Mem: 4.05G
GPU 0 - Processing Seq bl6VuRYE [31/50]:
GPU 0 - Seq bl6VuRYE - FPS: 27.08. All-Frame FPS: 26.88, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq cUD1dwuP [32/50]:
GPU 0 - Seq cUD1dwuP - FPS: 26.91. All-Frame FPS: 26.88, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq cjD5WPSv [33/50]:
GPU 0 - Seq cjD5WPSv - FPS: 27.32. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq d83wYdy0 [34/50]:
GPU 0 - Seq d83wYdy0 - FPS: 27.13. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq dtHbJvYy [35/50]:
GPU 0 - Seq dtHbJvYy - FPS: 26.77. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq f4DjwV55 [36/50]:
GPU 0 - Seq f4DjwV55 - FPS: 27.22. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq gdqCcvs2 [37/50]:
GPU 0 - Seq gdqCcvs2 - FPS: 27.05. All-Frame FPS: 26.89, All-Seq FPS: 26.97, Max Mem: 4.05G
GPU 0 - Processing Seq ikcMMycg [38/50]:
GPU 0 - Seq ikcMMycg - FPS: 26.81. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq kozmQMck [39/50]:
GPU 0 - Seq kozmQMck - FPS: 26.79. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq nfcT3owb [40/50]:
GPU 0 - Seq nfcT3owb - FPS: 26.73. All-Frame FPS: 26.88, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq pMntJwSQ [41/50]:
GPU 0 - Seq pMntJwSQ - FPS: 26.95. All-Frame FPS: 26.89, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq q1MSEBkh [42/50]:
GPU 0 - Seq q1MSEBkh - FPS: 27.09. All-Frame FPS: 26.89, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq rUaDdVmD [43/50]:
GPU 0 - Seq rUaDdVmD - FPS: 27.14. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq raql9H7f [44/50]:
GPU 0 - Seq raql9H7f - FPS: 27.00. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq v3uNUctx [45/50]:
GPU 0 - Seq v3uNUctx - FPS: 26.79. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq vJ8W2TO5 [46/50]:
GPU 0 - Seq vJ8W2TO5 - FPS: 27.03. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq vjG0jbkQ [47/50]:
GPU 0 - Seq vjG0jbkQ - FPS: 26.85. All-Frame FPS: 26.89, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq x3nD3QQ9 [48/50]:
GPU 0 - Seq x3nD3QQ9 - FPS: 26.91. All-Frame FPS: 26.89, All-Seq FPS: 26.95, Max Mem: 4.05G
GPU 0 - Processing Seq xpI7xRWN [49/50]:
GPU 0 - Seq xpI7xRWN - FPS: 27.44. All-Frame FPS: 26.90, All-Seq FPS: 26.96, Max Mem: 4.05G
GPU 0 - Processing Seq yExgitit [50/50]:
GPU 0 - Seq yExgitit - FPS: 27.38. All-Frame FPS: 26.90, All-Seq FPS: 26.97, Max Mem: 4.05G
GPU 0 - All-Frame FPS: 26.90, All-Seq FPS: 26.97, Max Mem: 4.05G
Hello, because the server I used for the ICCV experiments has been taken back, I cannot reproduce exactly the same environment. I reran the AOT-L experiments on another server and found that the memory-usage difference likely comes from the AOT code version. When I rerun the code I used for ICCV, I can reproduce the reported memory usage; the code version is this. When I insert the released code into the current AOT codebase, the memory usage is similar to your log. There are some differences in the model code between the two versions.
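For reference, the "Max Mem" number in these logs is the peak GPU allocation; assuming the usual PyTorch counters (the exact logging code in AOT may differ), it is obtained roughly like this:

import torch

torch.cuda.reset_peak_memory_stats()        # reset before running a sequence
# ... run inference on one sequence here ...
max_mem_gb = torch.cuda.max_memory_allocated() / (1024 ** 3)
print(f"Max Mem: {max_mem_gb:.2f}G")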
Regarding the performance difference, I think it may also result from the AOT version difference. Besides, the ICCV results differ slightly from the released ones, because for ICCV we did not split the results into seen and unseen categories, whereas our released evaluation tool separates them. However, the performance gap caused by the seen/unseen split is less than 0.5% in our experiments. In a nutshell, we think the performance difference mainly comes from the different AOT version.
I'm sorry that we did not realize the AOT version would cause this problem. We also reran the XMem experiments, and the memory usage is similar to the paper.
By the way, we will release LVOS v2 with more videos and update the experimental results using the latest code to address the performance inconsistencies.
I will close it for now. Feel free to reopen it if you have further questions!
Hi @LingyiHongfd ,
Thank you for sharing your work. I had a question, but unfortunately I was not able to meet you in person at ICCV.
The question is: how do you generate the predictions for AOT-T? You released the evaluation code, but releasing the prediction code is important as well. Could you please point me to this script or release it? Also, is there any plan to release the code of DDMemory?
Best regards, Abdelrahman.