RayRuiboChen / Self-Filter

GNU Affero General Public License v3.0

Nice work~ #1

Closed by BlueBlueFF 6 months ago

BlueBlueFF commented 6 months ago
  1. Have you tried LLaVA-665K, which is more robust than the 158K dataset?
  2. When will you release the pre-trained score net? Thanks~
RayRuiboChen commented 6 months ago

We plan to test our method soon on LLaVA 1.5, which uses the 665K dataset. The code and pre-trained score net will be released within the next two weeks. Thank you.

whitesockcat commented 3 months ago

Have there been any updates? We have observed that for a dataset of around 665K instances, random selection seems to be quite an effective approach.

RayRuiboChen commented 3 months ago

We have not explored the 665K dataset thoroughly, so we did not publish those results. However, in my experiments on the 665K dataset, random selection did not perform well, and our method produced better results. Maybe you could check your implementation?
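
For anyone reproducing this comparison, a minimal sketch of the two selection strategies discussed above may help rule out implementation differences. It is not taken from the Self-Filter code: the function names `random_subset` and `top_k_by_score` are hypothetical, the scores are made-up placeholders, and treating a higher score as "keep" is an assumption. Fixing the seed makes the random baseline reproducible, so both subsets can be compared at the same size.

```python
import random


def random_subset(n_items, k, seed=0):
    """Random baseline: choose k distinct indices uniformly at random.

    A fixed seed makes the subset reproducible across runs.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), k))


def top_k_by_score(scores, k):
    """Score-based selection: keep the k indices with the highest scores.

    Assumption: a higher score means the sample should be kept.
    """
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


if __name__ == "__main__":
    # Placeholder scores for six samples (hypothetical values).
    scores = [0.12, 0.87, 0.45, 0.66, 0.05, 0.91]
    k = 3

    baseline = random_subset(len(scores), k, seed=42)
    selected = top_k_by_score(scores, k)

    print("random baseline indices:", baseline)
    print("score-based indices:", selected)
```

When comparing the two, it is worth confirming that both subsets have the same size and that the random baseline is averaged over several seeds, since a single random draw can be unusually good or bad.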