X2FD / LVIS-INSTRUCT4V


How could you get past 100 RPD at this scale? #7

Closed: Etelis closed this issue 1 year ago

Etelis commented 1 year ago

This paper does not add up. Please explain the following:

  1. How did you get past the 100 RPD (requests per day) limit? 220K instructions imply about 314 accounts (220,000 / (100 × 7), assuming a seven-day window). That is a lot of SIM cards and sounds like a breach of OpenAI's rules. 🤔 Do you have any proof of actually doing this? And let's be honest, there is no way you started the day the API came out without doing some testing first, so a feasible setup would need roughly 500 accounts. (A back-of-the-envelope sketch follows below.)

  2. Each account may come with $5 of free credit, but GPT-4V (like GPT-4) requires a paid account, i.e. an investment of at least $5 per account. So you invested at least $2,500? You spent $2.5k just on the dataset? Can we see some proof of that?

  3. Each prompt would cost you about $0.025 (~800 input tokens, image included, plus ~500 output tokens), so you paid roughly $5,500 for generation alone?

So, including training, you paid approximately $7k in total?
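
To make the arithmetic above explicit, here is a minimal back-of-the-envelope sketch in Python. The seven-day window, the per-request token counts, and the $0.025 per-request figure are assumptions from this thread, not numbers confirmed by the paper:

```python
# Back-of-the-envelope check of the rate-limit and cost estimates above.
# All constants are assumptions from this thread, not figures from the paper.

TOTAL_REQUESTS = 220_000   # instructions in LVIS-INSTRUCT4V
RPD_LIMIT = 100            # assumed requests-per-day cap per account
DAYS = 7                   # assumed generation window since the API launch
COST_PER_REQUEST = 0.025   # rough estimate: ~800 input tokens (image included) + ~500 output tokens

accounts_needed = TOTAL_REQUESTS / (RPD_LIMIT * DAYS)
generation_cost = TOTAL_REQUESTS * COST_PER_REQUEST

print(f"accounts needed: {accounts_needed:.0f}")    # ~314
print(f"generation cost: ${generation_cost:,.0f}")  # ~$5,500
```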

I'm sorry for being so harsh, and maybe I'm not aware of something here. Perhaps you got higher limits from OpenAI themselves, but that seems odd given you have no connection to them.

Please explain, because it would be super annoying to find out you used LLaVA instead of GPT-4V.

wdrink commented 1 year ago

We indeed used GPT-4V to generate the visual instruction-following data. BTW, all constructive comments are welcome, but your wording is not reasonable.

Etelis commented 1 year ago

How are my words not reasonable? Which question, exactly, was unreasonable? I will gladly elaborate on any of them. Again, if I'm wrong here I will take everything back and apologize, but something isn't adding up. @wangjk666

gyupro commented 1 year ago

I believe the output from this model isn't as comprehensive as what I would expect from GPT-4. It seems to be missing intricacies, and I'm seeking a more accurate response.

gyupro commented 1 year ago

@Etelis, please take a moment to relax. Is something bothering you? Softening your choice of words wouldn't hurt!

Etelis commented 1 year ago

Yes, I will rephrase; sorry about that. It just upsets me to see such a promising claim, one that people will certainly rely on, when it is quite likely overstated at some level. @gyupro

wdrink commented 1 year ago

> I believe the output from this model isn't as comprehensive as what I would expect from GPT-4. It seems to be missing intricacies, and I'm seeking a more accurate response.

The instructions in our dataset are much more detailed than those in LLaVA-Instruct. We agree that even with our dataset, the models we trained still lag far behind GPT-4V. Replicating GPT-4V is not our intention; rather, we hope our dataset helps the research community explore better visual instruction tuning.

Etelis commented 1 year ago

Hey, I noticed that in the paper you pass "GPT4V" as the model name inside the API call, but as far as I'm aware there is no model with that ID; the call should have targeted 'gpt-4-vision-preview'. Another thing that doesn't add up. Could you elaborate on this?
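
For reference, a call against the real endpoint at the time would have used the model ID gpt-4-vision-preview. Here is a minimal sketch using the openai Python SDK (v1.x); the prompt text and image URL are placeholders, not the authors' actual pipeline:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Minimal vision request: "gpt-4-vision-preview" was the public model ID;
# there was never a model literally named "GPT4V".
response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the objects in this image in detail."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
    max_tokens=500,  # vision-preview defaults to a very small output cap without this
)
print(response.choices[0].message.content)
```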

wdrink commented 12 months ago

It is called “pseudo code” in the paper.

gyupro commented 12 months ago

@wangjk666 Hello. I have been following the issue that @Etelis raised. He used somewhat offensive wording, but he pointed out a real constraint: the GPT-4 API is rate-limited. I think you should elaborate on how you managed to generate a 220K-sample dataset. Did you create many accounts and pay a lot of money to do so? This is not mentioned in the paper; explaining it would avoid further misunderstandings.

wdrink commented 12 months ago

We want to make a clarification here for those interested in the data generation of LVIS-INSTRUCT4V: we used multiple accounts and spent some money. The goal of our work is to advance visual instruction tuning with more fine-grained instruction-following data.