IDEA-Research / Grounding-DINO-1.5-API

Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
https://arxiv.org/abs/2405.10300
Apache License 2.0
808 stars · 26 forks

good job! #2

Open xiyangyang99 opened 6 months ago

xiyangyang99 commented 6 months ago

I read the paper. The TensorRT inference for GD 1.5 is FP32 only; is that because some operators don't support FP16 yet?

Mountchicken commented 6 months ago

Hi @xiyangyang99 Thanks for your interest in our work. Grounding DINO 1.5's TensorRT pipeline supports FP16 inference, which is faster than FP32 but comes with a slight accuracy drop. We are still optimizing this.
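
For reference, FP16 TensorRT engines are normally produced by setting a builder flag at engine-build time, with unsupported layers falling back to FP32. A minimal sketch with the TensorRT Python API, assuming a hypothetical ONNX export named `gdino15.onnx` (this repo does not ship such an export):

```python
# Hedged sketch: building an FP16 TensorRT engine from an ONNX file.
# "gdino15.onnx" is a placeholder name, not an artifact of this repo.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("gdino15.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
if builder.platform_has_fast_fp16:
    config.set_flag(trt.BuilderFlag.FP16)  # layers without FP16 kernels fall back to FP32

engine = builder.build_serialized_network(network, config)
with open("gdino15_fp16.engine", "wb") as f:
    f.write(engine)
```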

xiyangyang99 commented 6 months ago

I just ran zero-shot testing on a local dataset, and it did an excellent job. I wanted to ask: can you provide a link to your grounding dataset? I hope to fine-tune on the text and images of my own dataset in the future.

Mountchicken commented 6 months ago

Sorry but this dataset is internal and we are unable to provide a link to it. Appreciate your understanding :)

xiyangyang99 commented 6 months ago

> Sorry but this dataset is internal and we are unable to provide a link to it. Appreciate your understanding :)

Hello, I have a question: how do you fine-tune the text side, i.e. language models like BERT or CLIP? Some domain-specific descriptive text is not covered by the native CLIP and BERT pre-training corpora.

Mountchicken commented 6 months ago

We train the BERT text encoder jointly with the whole model.
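
In practice this usually just means the text encoder's parameters stay trainable in the same optimizer as the detector, often with a smaller learning rate to preserve the pretrained language knowledge. A minimal illustrative sketch (module names and learning rates are assumptions, not the repo's actual training code):

```python
# Hedged sketch: training BERT jointly with the detection model.
# GroundingDetector and the learning rates are illustrative placeholders.
import torch
from transformers import BertModel

class GroundingDetector(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.text_encoder = BertModel.from_pretrained("bert-base-uncased")
        # ... visual backbone, cross-modal fusion, detection heads omitted ...

model = GroundingDetector()

# Keep BERT unfrozen, but give it a smaller learning rate than the detector.
optimizer = torch.optim.AdamW([
    {"params": model.text_encoder.parameters(), "lr": 1e-5},
    {"params": [p for n, p in model.named_parameters()
                if not n.startswith("text_encoder")], "lr": 1e-4},
])
```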

Baboom-l commented 6 months ago

Does Grounding DINO 1.5's TensorRT support multi-batch inference? Do both Pro and Edge support TensorRT?
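
As general TensorRT practice (not specific to this repo), multi-batch inference requires a dynamic batch axis in the ONNX export plus an optimization profile; a sketch where the tensor name `images` and the shapes are assumptions:

```python
# Hedged sketch: enabling dynamic batch sizes in a TensorRT engine.
# The input name "images" and the shapes are assumptions; a text-prompted
# detector would need matching profiles for its text tensors too.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

profile = builder.create_optimization_profile()
profile.set_shape("images",
                  (1, 3, 800, 1333),   # min batch
                  (4, 3, 800, 1333),   # opt batch (engine is tuned for this)
                  (8, 3, 800, 1333))   # max batch
config.add_optimization_profile(profile)
# ... parse the ONNX network and call build_serialized_network as usual ...
```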

xiyangyang99 commented 6 months ago

Hey, want to try reproducing Grounding DINO 1.5?

Baboom-l commented 6 months ago

Do you have the compute? I've trained SwinL on 4M data, and it took 35 days on 8 A100s. To reproduce GD 1.5, even setting the data problem aside, I'd estimate you need 64-128 A100s to start. Besides, the key to GD 1.5 is the data, which will never be open-sourced, so how would you reproduce it? There aren't really any changes to the model architecture anyway.

xiyangyang99 commented 6 months ago

We also analyzed GD 1.5 as soon as it came out. Its zero-shot capability is indeed excellent; the only issue is that its inference speed on edge devices lags behind YOLO-World, while it outperforms YOLO-World on everything else. I asked the authors: the dataset is internal and they don't plan to release it.

Baboom-l commented 6 months ago

The Edge modification looks to me like the RT-DETR encoder; if you need to reproduce Edge, you could consider changing it that way.
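
For readers unfamiliar with that design: RT-DETR's hybrid encoder runs self-attention only on the lowest-resolution feature map and fuses across scales with plain convolutions, which is what makes it edge-friendly. A rough sketch of the idea (all names and sizes are illustrative, not the actual Edge architecture):

```python
# Hedged sketch of an RT-DETR-style hybrid encoder: attention on the top
# (lowest-resolution) level only, conv-based cross-scale fusion elsewhere.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridEncoderSketch(nn.Module):
    def __init__(self, dim=256, nhead=8):
        super().__init__()
        self.attn = nn.TransformerEncoderLayer(
            d_model=dim, nhead=nhead, dim_feedforward=1024, batch_first=True)
        self.fuse = nn.Conv2d(dim * 2, dim, kernel_size=1)

    def forward(self, c4, c5):
        # c5 is the smallest feature map: cheap enough for full self-attention
        b, c, h, w = c5.shape
        tokens = c5.flatten(2).transpose(1, 2)               # (B, H*W, C)
        c5 = self.attn(tokens).transpose(1, 2).reshape(b, c, h, w)
        # cross-scale fusion with convolutions instead of attention
        c5_up = F.interpolate(c5, size=c4.shape[-2:], mode="nearest")
        p4 = self.fuse(torch.cat([c4, c5_up], dim=1))
        return p4, c5
```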

xiyangyang99 commented 6 months ago

We are currently using YOLO-World as the base code and adding an image-text alignment module, but the results are not great. Also, GD uses BERT as its text encoder, which generalizes much better than CLIP.
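
For concreteness, an image-text alignment module of the kind described here typically scores each candidate region against the text embeddings with a scaled cosine similarity, in the spirit of YOLO-World's contrastive head. A hedged sketch; the shapes and the logit-scale initialization are assumptions:

```python
# Hedged sketch of a region-text contrastive alignment head.
import torch
import torch.nn.functional as F

def region_text_logits(region_feats, text_feats, logit_scale):
    # region_feats: (B, N, D) embeddings of N candidate regions
    # text_feats:   (K, D)    one embedding per class/phrase prompt
    region = F.normalize(region_feats, dim=-1)
    text = F.normalize(text_feats, dim=-1)
    return logit_scale.exp() * region @ text.t()   # (B, N, K) logits

# CLIP-style learnable temperature, initialized near ln(1/0.07)
logit_scale = torch.nn.Parameter(torch.tensor(2.659))
```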

Baboom-l commented 6 months ago

Generalization comes from the pre-training data; CLIP as a text encoder definitely generalizes better than BERT. Most likely your dataset is too small, so the vision part has not been learned well.

Baboom-l commented 6 months ago

@Mountchicken I'm curious what prompts you used when evaluating on ODinW. Could you make them public?

xiyangyang99 commented 6 months ago

Yes, the local dataset is small: only 10 classes and about 30k images.

xiyangyang99 commented 6 months ago

Then if we want zero-shot performance no worse than Edge, should we consider merging in other datasets on the data side, similar to the grounding data the GD team proposed?

Baboom-l commented 6 months ago

At the very least, add the O365 (Objects365) and OpenImages datasets.

yutao007 commented 6 months ago

Hey, do you have a WeChat group? This open-set detection direction is quite promising, and I'd love to learn from you experts.

77h2l commented 6 months ago

The performance in vertical domains is not good. Is there an entry point for fine-tuning on a specific dataset? The paper seems to say the model can be fine-tuned on downstream tasks, but it only posts the results without explaining how to fine-tune.

rentainhe commented 6 months ago

> The performance in vertical domains is not good. Is there an entry point for fine-tuning on a specific dataset? The paper seems to say the model can be fine-tuned on downstream tasks, but it only posts the results without explaining how to fine-tune.

Thanks for your attention to our work. If your dataset is not big, you can try our optimized visual prompt here: https://www.deepdataspace.com/playground/ovp, where you can upload your dataset (in COCO format) and we will fine-tune a specific model for you.

If your dataset is somewhat larger, such that the whole model needs fine-tuning for better results, please contact us by email; here is my address: rentianhe@idea.edu.cn
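
For anyone preparing an upload, a minimal COCO-format detection file looks like the following; all IDs, file names, and the category list are placeholders:

```python
# Hedged sketch: writing a minimal COCO-format annotation file.
import json

coco = {
    "images": [
        {"id": 1, "file_name": "000001.jpg", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1,
         "bbox": [100.0, 120.0, 50.0, 80.0],   # [x, y, width, height]
         "area": 4000.0, "iscrowd": 0},
    ],
    "categories": [
        {"id": 1, "name": "defect"},           # placeholder class name
    ],
}

with open("annotations.json", "w") as f:
    json.dump(coco, f)
```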

xiyangyang99 commented 6 months ago

May I ask whether you can disclose how many categories the Grounding-20M dataset contains and how many samples are available per category? Is it simply high-quality data filtered from publicly available datasets? Is Grounding-20M a long-tailed dataset? Looking forward to your reply.

xiyangyang99 commented 6 months ago

The technical report mentions the Grounding-20M dataset. Could you share your analysis of whether it is long-tailed data, or just publicly available datasets that were re-curated?

mengyu212 commented 5 months ago

> The performance in vertical domains is not good. Is there an entry point for fine-tuning on a specific dataset? The paper seems to say the model can be fine-tuned on downstream tasks, but it only posts the results without explaining how to fine-tune.

> Thanks for your attention to our work. If your dataset is not big, you can try our optimized visual prompt here: https://www.deepdataspace.com/playground/ovp, where you can upload your dataset (in COCO format) and we will fine-tune a specific model for you.
>
> If your dataset is somewhat larger, such that the whole model needs fine-tuning for better results, please contact us by email; here is my address: rentianhe@idea.edu.cn

Thank you for your great work! I have tried the 'optimized visual prompt' method at the link above, and it performs excellently. However, can it only be tested on the web? How do I export the model or the detection-box results? Looking forward to your reply.

rentainhe commented 5 months ago

> The performance in vertical domains is not good. Is there an entry point for fine-tuning on a specific dataset? The paper seems to say the model can be fine-tuned on downstream tasks, but it only posts the results without explaining how to fine-tune.

> Thanks for your attention to our work. If your dataset is not big, you can try our optimized visual prompt here: https://www.deepdataspace.com/playground/ovp, where you can upload your dataset (in COCO format) and we will fine-tune a specific model for you. If your dataset is somewhat larger, such that the whole model needs fine-tuning for better results, please contact us by email; here is my address: rentianhe@idea.edu.cn

> Thank you for your great work! I have tried the 'optimized visual prompt' method at the link above, and it performs excellently. However, can it only be tested on the web? How do I export the model or the detection-box results? Looking forward to your reply.

We will consider supporting the export of the embeddings generated by the 'optimized visual prompt' for direct use with the Grounding DINO API. This will take some time, as the current API needs to be extended to support this functionality. If your need is urgent, you can email us directly at rentianhe@idea.edu.cn and we will follow up on your request specifically. @mengyu212

jetyingjia commented 3 months ago

> We train the BERT text encoder jointly with the whole model.

@Mountchicken

Hi, about the text encoder: T-Rex2 uses CLIP_text while GDINO 1.5 uses BERT. Does using CLIP_text versus BERT make a difference in performance? Considering performance only, which one was better in your experiments?

jetyingjia commented 2 months ago

@rentainhe

Hi, about the text encoder: T-Rex2 uses CLIP_text while GDINO 1.5 uses BERT. Does using CLIP_text versus BERT make a difference in performance? Considering performance only, which one was better in your experiments?