Open3DA / LL3DA

[CVPR 2024] "LL3DA: Visual Interactive Instruction Tuning for Omni-3D Understanding, Reasoning, and Planning"; an interactive Large Language 3D Assistant.
https://ll3da.github.io/
MIT License

New object #15

kuaileqipaoshui opened this issue 2 months ago

kuaileqipaoshui commented 2 months ago

Excuse me, I have a question: is the box prediction (ov-det) invalid for objects outside the 17 defined classes? I found that objects are filtered here (captioner.py, line 394). If there are new categories, will they all be classified as "others", making them impossible to predict? [screenshot] If there is a new object, say a fruit, how can its position be predicted?

ch3cook-fdu commented 2 months ago

The last class of sem_cls_logits is the no-object class. Open-vocabulary detection is designed to extend a model's ability to localize and recognize objects beyond a closed, pre-defined category set.
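To illustrate the point about the trailing no-object class: a minimal sketch (not the repo's exact API; the array names and shapes are assumptions) of how predictions whose argmax falls on the last column can be dropped before captioning:

```python
import numpy as np

def filter_no_object(sem_cls_logits, boxes):
    """sem_cls_logits: (num_queries, num_classes + 1), last column = no-object.
    boxes: (num_queries, 6) box parameters aligned with the queries."""
    pred_cls = sem_cls_logits.argmax(axis=-1)
    keep = pred_cls != (sem_cls_logits.shape[-1] - 1)  # drop no-object hits
    return pred_cls[keep], boxes[keep]

# Toy example: query 0 predicts class 0, query 1 predicts no-object.
logits = np.array([[2.0, 0.1, 0.5],
                   [0.1, 0.2, 3.0]])
boxes = np.array([[0.0, 0.0, 0.0, 1.0, 1.0, 1.0],
                  [5.0, 5.0, 5.0, 1.0, 1.0, 1.0]])
cls_kept, boxes_kept = filter_no_object(logits, boxes)
# cls_kept -> [0]; only the first box survives
```

An unseen object (e.g. a fruit) can still produce a box here, because "not no-object" does not require the box to match one of the 17 named classes.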

kuaileqipaoshui commented 2 months ago

> The last class of sem_cls_logits is the no-object class. Open-vocabulary detection is designed to extend a model's ability to localize and recognize objects beyond a closed, pre-defined category set.

Is it possible for ov-det to localize only objects from a pre-defined set of classes, so that descriptions are generated only for those objects? Would a new object then be localized without a related description being generated? If I retrain the detector, can I add object categories, e.g. a fruit category?

ch3cook-fdu commented 2 months ago

You can just filter and re-label the categories in the generated texts.
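The suggestion above (filtering and re-labeling categories in the generated texts) could look something like this sketch; the mapping and function names are purely illustrative, not part of the repo:

```python
# Hypothetical post-processing: remap category words in a generated caption.
# E.g. when you know the unseen objects in the scene are bananas, map the
# catch-all "others" label to "banana".
RELABEL = {"others": "banana"}

def relabel_caption(caption, mapping=RELABEL):
    words = caption.split()
    return " ".join(mapping.get(w, w) for w in words)

print(relabel_caption("there is a yellow others on the table"))
# -> "there is a yellow banana on the table"
```

This only rewrites the output text; it does not require retraining the detector or changing the class count.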

kuaileqipaoshui commented 2 months ago

> You can just filter and re-label the categories in the generated texts.

I'm sorry, I didn't understand. For example, if I want to detect the location of a banana in a new scene, can the model output the banana's location like in the template? [screenshot] When I filter, the bananas will be treated as the "others" category. If I re-label the categories, e.g. `18: banana`, would that be right? Should I change `self.num_semcls = 19`? [screenshot]

ch3cook-fdu commented 2 months ago

If you are looking for a grounding model, you can design input text instructions like “locate the banana”.

kuaileqipaoshui commented 2 months ago

> If you are looking for a grounding model, you can design input text instructions like "locate the banana".

The generated answer gives the box center and the length, width, and height, so how do I visualize the box? [screenshot] How can I reconstruct the 3D box?

ch3cook-fdu commented 2 months ago

Please refer to https://github.com/ch3cook-fdu/3d-pc-box-viz for more visualization functions.

kuaileqipaoshui commented 2 months ago

> Please refer to https://github.com/ch3cook-fdu/3d-pc-box-viz for more visualization functions.

It's good to see the results of your work. Can you explain in detail how to decode the 3D box? I tried to decode it but failed. Looking forward to your reply.

ch3cook-fdu commented 2 months ago

The code for decoding box coordinates can be found in https://github.com/Open3DA/LL3DA/blob/main/eval_utils/evaluate_ovdet.py#L163-L201. Please refer to https://github.com/ch3cook-fdu/Vote2Cap-DETR/issues/11 for visualization.
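As a rough sketch of what reconstructing a box from (center, length, width, height) involves, here is the standard corner computation used in Vote2Cap-DETR-style codebases; for the exact convention (axis order, heading sign), defer to the linked `evaluate_ovdet.py` lines:

```python
import numpy as np

def box_corners(center, size, heading=0.0):
    """center: (3,) cx, cy, cz; size: (3,) l, w, h; heading: rotation about z.
    Returns the 8 box corners as an (8, 3) array."""
    l, w, h = size
    # Corner offsets in the box's local frame (half-extents).
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2.0
    y = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2.0
    z = np.array([ h,  h,  h,  h, -h, -h, -h, -h]) / 2.0
    # Rotate about the z (up) axis, then translate to the box center.
    c, s = np.cos(heading), np.sin(heading)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    corners = rot @ np.stack([x, y, z])          # (3, 8)
    return corners.T + np.asarray(center)        # (8, 3)

corners = box_corners([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
# Axis-aligned 2x2x2 cube centered at (1, 2, 3):
# corners span [0, 2] x [1, 3] x [2, 4]
```

The resulting (8, 3) corner array is the input format most box-visualization utilities (including the one linked above) expect.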