ZzZZCHS / Chat-Scene

Code for "Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers" (NeurIPS 2024)
MIT License

about table 3 #14

Closed iris0329 closed 7 months ago

iris0329 commented 9 months ago

Hi, thanks for providing this work.


  1. I noticed there is no discussion of Table 3 in the paper.
  2. I found that the Acc@0.25 on the ScanRefer dataset is 35.9, but I couldn't find this number on the ScanRefer official benchmark site.
  3. I wonder why the results do not seem to be compared against the ScanRefer baseline method.

Did I miss anything?

ZzZZCHS commented 9 months ago
  1. Table 3 aims to show that our method reaches SOTA among 3D LLM methods.
  2. We haven't uploaded the results to the official site, but we have released the entire training and evaluation pipeline (including the checkpoints) in this repository. You can try it out to check the performance.
  3. I think the main reason is the lack of data for 3D-LLM alignment. 2D LLMs owe much of their success to the large amounts of training data used for alignment. In addition, the intricate nature of 3D scenes calls for a more tailored design to learn spatial relationships effectively. In this work, we use a simple alignment architecture and only use data sourced from ScanNet, which is far from enough to train a robust 3D LLM. Future work is needed to address these problems.
iris0329 commented 9 months ago

Ah, I see. Thanks for your detailed reply and for open-sourcing the code; I have tried it.

I would still like to know one thing: the ScanRefer benchmark reports one set of numbers, but Table 3 shows different numbers for the same method (e.g., for ScanRefer itself).

I am confused about why this difference occurs when reading the table.

Would you share with me how to match the two tables?

ZzZZCHS commented 9 months ago

The ScanRefer benchmark is based on ScanRefer's test set, while the results in our Table 3 are on the validation set.

We use the results from ViL3DRel's Table 8. You can also find the same results (37.3/24.3) in InstanceRefer (Table 1) and MVT (Table 3).
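For readers unfamiliar with the metric discussed above: Acc@0.25 is commonly computed as the fraction of predictions whose 3D IoU with the ground-truth box is at least 0.25. Below is a minimal illustrative sketch assuming axis-aligned boxes in `(cx, cy, cz, dx, dy, dz)` format; the function names and box encoding are assumptions for illustration, not the repository's actual evaluation code.

```python
import numpy as np

def box_iou_3d(box_a, box_b):
    """Axis-aligned 3D IoU between two boxes given as (cx, cy, cz, dx, dy, dz)."""
    a_min, a_max = box_a[:3] - box_a[3:] / 2, box_a[:3] + box_a[3:] / 2
    b_min, b_max = box_b[:3] - box_b[3:] / 2, box_b[:3] + box_b[3:] / 2
    # Overlap along each axis, clipped at zero when the boxes are disjoint.
    inter = np.clip(np.minimum(a_max, b_max) - np.maximum(a_min, b_min), 0, None)
    inter_vol = inter.prod()
    vol_a, vol_b = box_a[3:].prod(), box_b[3:].prod()
    return inter_vol / (vol_a + vol_b - inter_vol)

def acc_at_iou(pred_boxes, gt_boxes, threshold=0.25):
    """Fraction of predictions whose IoU with the ground truth meets the threshold."""
    hits = [box_iou_3d(p, g) >= threshold for p, g in zip(pred_boxes, gt_boxes)]
    return float(np.mean(hits))
```

Since numbers like 37.3/24.3 (Acc@0.25/Acc@0.5) are just this accuracy at two thresholds, the same predictions are scored twice with `threshold=0.25` and `threshold=0.5`.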