allenai / allenact

An open source framework for research in Embodied-AI from AI2.
https://www.allenact.org

Cannot reproduce the performance of EmbCLIP on the ObjectNav task of RoboTHOR #377

Open hutchinsonian opened 6 months ago

hutchinsonian commented 6 months ago

Problem / Question

The official EmbCLIP entry on the leaderboard has a Test SPL of 0.2, but following the tutorial I got a score of 0.17. The command I ran was:

python allenact/main.py -o storage/objectnav-robothor-rgb-clip-rn50 -b projects/objectnav_baselines/experiments/robothor/clip objectnav_robothor_rgb_clipresnet50gru_ddppo -c pretrained_model_ckpts/objectnav-robothor-clip-rn50.130M.pt

This produced the file storage/objectnav-robothor-rgb-clip-rn50/metrics/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2024-02-15_03-32-04/metrics__test_2024-02-15_03-32-04.json, and following the method in the documentation I also obtained the validation metrics. Using the scripts from robothor-challenge, I ran the command:

python convert_allenact_metrics.py -v storage/objectnav-robothor-rgb-clip-rn50/metrics/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2024-02-15_02-36-50/metrics__val_2024-02-15_02-36-50.json -t storage/objectnav-robothor-rgb-clip-rn50/metrics/ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/2024-02-15_03-32-04/metrics__test_2024-02-15_03-32-04.json -o submission_metrics.json.gz
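(As a quick sanity check before converting and submitting, the mean SPL can be read straight from the metrics file. A minimal sketch, assuming the JSON holds either a list of per-episode dicts with an "spl" key or a list of per-checkpoint dicts with a "tasks" list of such episodes; the exact layout can vary across AllenAct versions:)

```python
# Sanity-check sketch: print the mean SPL from an AllenAct metrics JSON.
# Assumption: entries are per-episode dicts with an "spl" key, possibly
# nested under a per-checkpoint "tasks" list; adjust if your layout differs.
import json

path = (
    "storage/objectnav-robothor-rgb-clip-rn50/metrics/"
    "ObjectNav-RoboTHOR-RGB-ClipResNet50GRU-DDPPO/"
    "2024-02-15_03-32-04/metrics__test_2024-02-15_03-32-04.json"
)
with open(path) as f:
    data = json.load(f)

episodes = []
for entry in data if isinstance(data, list) else [data]:
    if isinstance(entry, dict):
        episodes.extend(entry.get("tasks", [entry]))

spls = [ep["spl"] for ep in episodes if isinstance(ep, dict) and "spl" in ep]
print(f"{len(spls)} episodes, mean SPL = {sum(spls) / max(len(spls), 1):.4f}")
```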

Then I submitted the submission_metrics.json.gz file:

[screenshot: leaderboard submission result]

What should I do?


Lucaweihs commented 6 months ago

Hi @hutchinsonian, it's been quite a while since I looked at these details... Can you tell me which AI2-THOR commit ID starts when you're running the evaluation (you can find this by running ps aux | grep thor during evaluation)? There have been some changes between AI2-THOR versions which could cause some differences in metrics.
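If it's easier, here is a small Python sketch that automates pulling the hash out of the process list; it assumes the Linux binary follows the usual thor-Linux64-<40-character commit hash> naming pattern:

```python
# Sketch: extract the AI2-THOR build commit from the running process list.
# Assumption: the Linux binary is named thor-Linux64-<40-char commit hash>.
import re
import subprocess

ps_output = subprocess.run(
    ["ps", "aux"], capture_output=True, text=True, check=True
).stdout
match = re.search(r"thor-Linux64-([0-9a-f]{40})", ps_output)
print(match.group(1) if match else "no running AI2-THOR build found")
```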

hutchinsonian commented 6 months ago

> Can you tell me which AI2-THOR commit ID starts when you're running the evaluation (you can find this by running ps aux | grep thor during evaluation)?

I got this:

xxx+ 3414782 29.8  0.1 8108624 761272 pts/13 Sl+  02:06   0:41 /home/xxx/.ai2thor/releases/thor-Linux64-bad5bc2b250615cb766ffb45d455c211329af17e/thor-Linux64-bad5bc2b250615cb766ffb45d455c211329af17e -screen-fullscreen 0 -screen-quality 7 -screen-width 400 -screen-height 300

And I evaluated this on a headless machine: I ran sudo python scripts/startx.py & to start an X display, and the code appears to run without any problems (score of 0.17). Is there any reason why the performance of the model decreases when running the commit id=bad5bc2b250615cb766ffb45d455c211329af17e version of AI2-THOR on a headless machine?

Do I need to follow the second point mentioned in the link and set the commit id to 91139c909576f3bf95a187c5b02c6fd455d06b48? When I do so and set THOR_IS_HEADLESS = True, I get this error:

ValueError: Invalid commit_id: 91139c909576f3bf95a187c5b02c6fd455d06b48 - no build exists for arch=Linux platforms=Linux64
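For what it's worth, recent AI2-THOR versions also offer a CloudRendering platform for headless machines (Vulkan-based, no X server needed), and whether a CloudRendering build was published for a given commit varies, which may be why some commit IDs fail with "no build exists" errors. A hedged illustration of that API, not a confirmed fix for this issue:

```python
# Illustration only: launch AI2-THOR headlessly via CloudRendering.
# This will still fail if no CloudRendering build exists for the commit.
from ai2thor.controller import Controller
from ai2thor.platform import CloudRendering

controller = Controller(
    commit_id="91139c909576f3bf95a187c5b02c6fd455d06b48",  # commit quoted above
    platform=CloudRendering,
)
print(controller.last_event.metadata["sceneName"])
controller.stop()
```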