TencentARC / ViT-Lens

[CVPR 2024] ViT-Lens: Towards Omni-modal Representations
https://ailab-cvc.github.io/seed/vitlens/

reproduce evaluation results #15

Closed waterljwant closed 3 months ago

waterljwant commented 3 months ago

Hi, Thank you for the great open-source work.

However, I am having difficulty reproducing the evaluation results, particularly for scene classification on NYU-D and SUN-D. I have attached the results I obtained after running the provided script. Could you please help me identify any steps or details I might have missed that could explain this inconsistency in accuracy?

[image: obtained evaluation results]

StanLei52 commented 3 months ago

Hi,

To reproduce results on NYU-D and SUN-D,

  1. Please follow the instructions for inference: download the vitlensL-depth checkpoint and run:
    cd vitlens/
    # you may change the path accordingly
    torchrun --nproc_per_node=1 ./src/training/depth/depth_tri_main.py \
      --cache_dir /path_to/cache \
      --val-data sun-rgbd::nyu-depth-v2-val1::nyu-depth-v2-val2 \
      --visual_modality_type depth --dataset-type depth --v_key depth \
      --n_tower 3 \
      --use_perceiver  --perceiver_cross_dim_head 64 --perceiver_latent_dim 1024 --perceiver_latent_dim_head 64 --perceiver_latent_heads 16 \
      --perceiver_num_latents 256 --perceiver_as_identity \
      --use_visual_adapter \
      --batch-size 64 \
      --lock-image --lock-text --lock-visual --unlock-trans-first-n-layers 4 \
      --model ViT-L-14 --pretrained datacomp_xl_s13b_b90k \
      --name depth/inference_vitlensL_perf \
      --resume /path_to/vitlensL_depth.pt
  2. We follow ImageBind for data preprocessing (converting depth to disparity); please make sure you apply the same operation. See here. I also uploaded a copy here. A sketch of the conversion is included after this list for reference.
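
A minimal sketch of the ImageBind-style depth-to-disparity conversion is below for reference; the function name, the clamping constant, and the default calibration values are illustrative assumptions, not the exact code from either repository.

    import numpy as np

    def depth_to_disparity(depth_m, focal_length_px=518.857, baseline_m=0.075, min_depth_m=1e-3):
        """Convert a metric depth map (meters) to disparity: baseline * focal_length / depth.

        Near-zero depths are clamped to avoid division by zero. The default
        focal length / baseline here are placeholder Kinect-like values, not
        necessarily what the actual dataset loaders use.
        """
        depth = np.clip(np.asarray(depth_m, dtype=np.float32), min_depth_m, None)
        return (baseline_m * focal_length_px / depth).astype(np.float32)

    # Example usage (hypothetical file name):
    # depth = np.load("nyu_sample_depth.npy")   # HxW depth map in meters
    # disp = depth_to_disparity(depth)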

If you still cannot reproduce the results (Table 5 in the paper), please share your environment setup so that I can look into it.

StanLei52 commented 3 months ago
Btw, here are the results from my side following the installation setup (pytorch==1.11.0), for your reference.

[image: reference evaluation results]

waterljwant commented 3 months ago

@StanLei52 Thank you! I found that I had mistakenly used different depth data. After adjusting the path according to this code, `depth_dir = os.path.join(path, "depth_bfx")`, the accuracy is now consistent.
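
For anyone hitting the same issue: SUN RGB-D provides both a raw "depth" folder and a refined "depth_bfx" folder per scene, and the evaluation expects the latter. A minimal sketch of the relevant path selection (the helper name and flag are hypothetical; only the directory names come from the fix above):

    import os

    def get_depth_dir(scene_path, use_bfx=True):
        # The refined "depth_bfx" maps are what the evaluation reads;
        # loading the raw "depth" maps instead shifts the reported accuracy.
        return os.path.join(scene_path, "depth_bfx" if use_bfx else "depth")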