Open XueJiang16 opened 3 months ago
Hi, thanks for your attention.
In response to question 1: Since we are comparing against the well-known LLaVA 158K set, our controlled variable is LLaVA 158K. In fact, we could combine MMR and DataOptim, or even stack in other data of proven validity, and model performance would improve further; but for the sake of a fair comparison, we did not add anything to the training set.
In response to question 2: We have run experiments on this before. In general, scores on other benchmarks do increase, but by no more than 1%, so we do not consider the improvement on general benchmarks significant.
Thanks for your reply.
So I wonder how much improvement you would get by combining MMR and DataOptim?
Also, why can less data (MMR vs. DataOptim) achieve similar performance on common benchmarks? I think this is a very interesting point.
Moreover, I would like to ask how to handle tuning an existing model with new data. Should the old instruction data and the new instruction data be trained together, or should only the new data be used?
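One common approach to the question above is to mix a "replay" subset of the old instruction data with the new data, rather than training on the new data alone, to reduce forgetting. Below is a minimal illustrative sketch; the function name, the `replay_ratio` default, and the assumption that each dataset is a flat list of examples are all hypothetical, not something from Bunny or MMR.

```python
import random

def mix_instruction_data(old_data, new_data, replay_ratio=0.5, seed=0):
    """Mix new instruction examples with a sampled subset of old ones.

    replay_ratio=0.5 means we keep roughly half as many old examples as
    new ones -- an illustrative default, not a tuned recommendation.
    """
    rng = random.Random(seed)
    # Cap the replay sample at the size of the old dataset.
    k = min(len(old_data), int(len(new_data) * replay_ratio))
    replayed = rng.sample(old_data, k)
    mixed = new_data + replayed
    rng.shuffle(mixed)  # interleave old and new examples
    return mixed
```

The alternative, training only on new data, is simpler but tends to degrade performance on tasks covered by the old data; mixing trades extra compute for retention.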
To be clear, which part of the data specifically are you referring to when you say DataOptim? There is a lot in there. As for how MMR data performs once it is added, and how it should be combined: if you are interested, could you send me an email so we can set up a meeting to discuss it?
Thanks for your help! Maybe I have some misunderstanding. My email is csxjiang@comp.hkbu.edu.hk.
Okay
Hi, I've sent the email.
This is great work! I notice that Bunny-MMR uses MMR data for instruction tuning, while the original Bunny uses DataOptim.
I have two questions:
Why not use both MMR and DataOptim for instruction tuning? By doing so, could Bunny perform better?
Bunny-MMR performs better than the original Bunny on the MMR benchmark, but how do they compare on common benchmarks?