OpenGVLab / VisionLLM

VisionLLM Series
https://arxiv.org/abs/2305.11175
Apache License 2.0

[REQUEST] Code and models please! #2

Open spacewalkingninja opened 1 year ago

spacewalkingninja commented 1 year ago

Hello! I am urgently asking for the release of the inference code + model. Training would be good too. Incredibly thankful, very interesting project!

mtjhl commented 1 year ago

When will the code be released?

hjq133 commented 1 year ago

+1. Looking forward to the code. Interesting project.

wzhings commented 1 year ago

+1. I am looking forward to the code. It is awesome work.

wojiaohumaocheng commented 1 year ago

+1

karthikyeredla commented 1 year ago

+1

spacewalkingninja commented 1 year ago

@czczup can you please enlighten us from the realms of the model and code lands <3

mpragnay commented 1 year ago

What training data was used? Is it publicly available?

autosquid commented 1 year ago

any update?

amygbAI commented 1 year ago

I think all of you are wasting your time waiting for this. Check out the original LLaVA paper: it has code, a demo, and everything you need to get started. I do, however, thank the authors of this paper for referencing it and letting us know it exists :)

amygbAI commented 1 year ago

Hi Bruno, I would say the objectives are 100% the same. So it's better to go with a Microsoft research paper that has code rather than some random copy of it. Obviously the authors don't seem to care much anymore.

On Thu, Sep 14, 2023 at 4:58 PM Bruno Ma wrote:

Hi @amygbAI, do you mean this paper is exactly the same as LLaVA?

GuangxingHan commented 1 year ago

Hi, may I know whether you were able to reproduce the results? Do you mean this work uses the same training objective as LLaVA? Thanks.

amygbAI commented 1 year ago

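Hi Mr Han, no, I haven't reproduced the results, because I would like to train this on chart / graph data exclusively and I'm preparing the datasets. Having said that, I believe the training objective is the same, i.e.

- ensure that the model can take both images and text as input
- perform analysis over both image + textual contexts
- provide results of the query in textual format

If you go through the LLaVA paper, this will be amply evident to you.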

GuangxingHan commented 1 year ago

Thanks for your reply. Yes, LLaVA works exactly in this way.

becauseofAI commented 1 year ago

@czczup Can you provide a timeline for the code release? Thx!

shaniaos commented 10 months ago

I see that this paper was accepted to NeurIPS 2023, which was held a month ago. It's Jan 2024 now. Is the code going to be released?

zzchust commented 10 months ago

+1

annopackage commented 9 months ago

waiting for code release.

spacewalkingninja commented 9 months ago

this is a direct message from the intergalactic open source allegiance: RELEASE THIS MODEL TODAY

Haiyang-W commented 6 months ago

If needed, everyone may try the GiT repository, a general end-to-end vision transformer, which fully covers the tasks included in VisionLLM and can also handle semantic segmentation. The code and pre-trained weights have been fully open-sourced.

"GiT: Towards Generalist Vision Transformer through Universal Language Interface"