Closed Xiaolong-RRL closed 9 months ago
Hi @Xiaolong-RRL, sorry for the late reply, just want to confirm you used the checkpoint we provided in the repo, right?
Hi, I used this checkpoint: https://aspis.cmpt.sfu.ca/projects/m3dref-clip/pretrain/M3DRef-CLIP_ScanRefer.ckpt
Also, I used the multiview features processed here, which are 36 GB, rather than the ones you provide directly at https://aspis.cmpt.sfu.ca/projects/m3dref-clip/data/enet_feats_maxpool.hdf5, which are 100+ GB. I wonder whether this will affect the final evaluation results?
Yes, you should use the 100+ GB one. We follow the prior work D3Net. The only difference between the 36 GB and 100+ GB versions is the number of points: the former samples only 50,000 points per scene, while the latter keeps the unsampled original scenes. M3DRef-CLIP uses the 100+ GB version and does point sampling in the dataloader.
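For anyone curious what "point sampling in the dataloader" means in practice, here is a minimal sketch of per-scene random sampling. The function name `sample_points`, the channel layout, and the 50,000-point budget are illustrative assumptions, not the repo's actual code:

```python
import numpy as np

def sample_points(points, num_samples=50000, rng=None):
    """Randomly sample a fixed number of points from a scene point cloud.

    `points` is an (N, C) array (e.g. xyz plus per-point features).
    Scenes smaller than `num_samples` are sampled with replacement so
    every scene yields a fixed-size tensor for batching.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = points.shape[0]
    idx = rng.choice(n, size=num_samples, replace=(n < num_samples))
    return points[idx]

# Example: a toy "scene" of 120,000 points with xyz + rgb channels
scene = np.random.rand(120000, 6).astype(np.float32)
sampled = sample_points(scene)
print(sampled.shape)  # (50000, 6)
```

Doing the sampling at load time (rather than pre-sampling the HDF5 file once) means each epoch sees a different random subset of the full-resolution scene, which is why the unsampled 100+ GB file is the one the pipeline expects.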
I see! But the download speed is very slow. Would it be convenient for you to provide a Baidu Netdisk link, or split the file into multiple parts and upload them to Google Drive for faster download? Thanks!
Sure, we are looking for an alternative place to host it, and we will also release instructions for regenerating this file.
Thanks for your kind reply, I am looking forward to it~
Hi @Xiaolong-RRL. We've updated the README and added instructions for generating enet_feats_maxpool.hdf5.
Dear author:
Thanks for your interesting work.
I have completed the entire training and inference process following the README.md, but when I run the following command with the provided ckpt:
I get unsatisfactory performance, far below the results reported in the README.md:
I wonder if this is expected? And what should I do to achieve the same results as the ones in the README.md?
Thanks!!