Closed rolfstarke closed 3 weeks ago
Hi,
I believe the issue with the proposed setting is that the vocabulary you presented contains many vague or generic terms.
For instance, "furniture" and "board" are very generic and could refer to many different objects, which makes them hard for a 2D detector like ODISE to handle. [For this reason, many generic categories are excluded in open-world settings.] I would suggest testing with a more specific vocabulary. Since this scene is from ScanNet, testing it with 17 classes (i.e., removing other furniture) should yield reasonable performance.
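As a rough illustration of the suggestion above, here is a minimal sketch that drops generic terms from a class list and joins the rest into the "; "-separated string used by the --vocab argument. The helper name build_vocab and the GENERIC_TERMS set are hypothetical and not part of the repository; which terms count as "generic" is an assumption you would need to tune for your own scenes.

```python
# Hypothetical helper, not part of the repository.
# The set of "generic" terms below is an assumption for illustration only.
GENERIC_TERMS = {"furniture", "other furniture", "otherfurniture", "board"}


def build_vocab(classes, generic_terms=GENERIC_TERMS):
    """Drop generic or duplicate class names and join the rest with '; '."""
    kept = []
    seen = set()
    for name in classes:
        cleaned = name.strip()
        key = cleaned.lower()
        # Skip empty entries, generic terms, and case-insensitive duplicates.
        if not key or key in generic_terms or key in seen:
            continue
        seen.add(key)
        kept.append(cleaned)
    return "; ".join(kept)


vocab = build_vocab(
    ["floor", "wall", "beam", "column", "window", "door", "furniture", "board"]
)
print(vocab)
```

The resulting string could then be passed as the --vocab value of zero_shot.py, in the same format as the command posted in this issue.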
Best, Zhening
We are closing this issue for now.
Feel free to check out the newer version of the code, which is optimized to reproduce results and works for zero-shot inference.
Best, Zhening
Dear
Thank you for the interesting model! I managed to run the test examples, but once I run it on my own scenes or change the vocab, I get incorrect results. What could be the reasons for this?
This is an example where I changed only the vocab for the ScanNet example, like this:
python zero_shot.py --pcd_path 'demo/demo_scene/scannet/scannet_scene1.ply' --vocab "floor; wall; beam; column; window; door; furniture; board"
Thank you for your time.