Lackel / LOOP

ACL 2024 Findings paper "Generalized Category Discovery with Large Language Models in the Loop"

Effectiveness of LLM Feedback #1

Open HenryPengZou opened 1 week ago

HenryPengZou commented 1 week ago

Hi Authors,

I tried to run your code and reproduce the results with and without LLM feedback. The first table shows the result of a run without an OpenAI API key, i.e., no LLM feedback is provided. The second table shows the result of a run with a valid OpenAI API key, i.e., LLM feedback is provided. However, comparing the results across the 50 epochs, LLM feedback doesn't seem to boost performance much in many epochs. Did you observe something similar in your experiments? Could you provide some insights? Thanks a lot!

image

HenryPengZou commented 1 week ago

Here are the results from another two runs. It seems the LLM feedback used in this paper doesn't help. These results were reproduced without changing any code in the repo, and I have ensured that all required packages match the required versions.

image
Lackel commented 1 week ago

Hi, thanks for your interest in our work. Does the 'random feedback' in your figure mean randomly selecting the neighbor from the two candidate neighbors? If so, we did not run these experiments, but the results are predictable: selecting the true neighbor from two candidates is easy (and in our implementation, when the LLM is not available, we always select from the cluster that the most similar neighbor candidate belongs to, which further improves the probability of selecting the true neighbor), and even if the wrong neighbor is selected, it can still provide useful information for contrastive learning because of the high similarity between the data points.

In my opinion, LLMs become more helpful as the number of candidates increases, as reported in our paper, or when measuring similarity between data is difficult on challenging datasets; random feedback may fail in these situations, since it will be hard to select true neighbors at random.

By the way, the performance of our original implementation on Banking with seeds 0, 1, 2 and the average is listed below.

image
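To make that selection rule concrete, here is a minimal sketch of the neighbor-selection step described above, using hypothetical names (`select_neighbor`, `query_llm`) rather than the exact functions in this repo:

```python
# Minimal sketch of the neighbor-selection step described above.
# Names and signatures are hypothetical, not the repo's actual API.
import random


def select_neighbor(anchor_text, candidates, similarities, cluster_ids,
                    cluster_members, query_llm=None):
    """Pick a positive neighbor for `anchor_text` from two candidate neighbors.

    candidates:      the two candidate texts.
    similarities:    embedding similarity of each candidate to the anchor.
    cluster_ids:     cluster assignment of each candidate.
    cluster_members: mapping cluster_id -> indices of samples in that cluster.
    query_llm:       optional callable returning 0 or 1, the candidate the LLM
                     judges to share the anchor's category (None = no API key).
    """
    if query_llm is not None:
        # LLM feedback: let the model choose between the two candidates.
        choice = query_llm(anchor_text, candidates)
    else:
        # Fallback when the LLM is unavailable: keep the most similar candidate.
        choice = max(range(len(candidates)), key=lambda i: similarities[i])
    # Either way, the final neighbor is drawn from the cluster that the chosen
    # candidate belongs to, as described in the comment above.
    return random.choice(cluster_members[cluster_ids[choice]])
```

Under this sketch, "random feedback" would replace the `if/else` above with a uniform choice between the two candidates, which already picks the true neighbor about half the time when there are only two candidates.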

HenryPengZou commented 1 week ago

Hi @Lackel,

Thanks for your response!

In Table 2 and Section 5.2.1, it is shown that removing LLM feedback leads to a severe performance decline. Could you clarify this? How did you perform the experiment without LLM feedback, i.e., how did you validate the effectiveness of LLM feedback in your paper?

I thought randomly selecting the neighbor from the two candidate neighbors without querying the LLM was this approach, but it turns out that the performance is roughly the same with and without the LLM queries.

image

image

HenryPengZou commented 5 days ago

Hi @Lackel, when you have time, could you provide some insights on the question above? We plan to include your paper as one of our baselines, and your answers would greatly help us. Thanks a lot!

Lackel commented 4 days ago

Sorry for the confusion: 'w/o LLMs' in the paper means removing the entire active learning process and simply performing traditional neighborhood contrastive learning (MTP-CLNN).
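For readers comparing baselines, a minimal sketch of what that ablation amounts to, under the assumption that positives come directly from embedding-space k-NN as in MTP-CLNN (hypothetical names, not the exact code in this repo):

```python
# Sketch of the "w/o LLMs" baseline as described above: neighbors are taken
# directly from embedding-space k-NN, with no candidate re-ranking, no
# cluster-based selection, and no LLM queries. Names are hypothetical.
import numpy as np


def knn_neighbors(embeddings: np.ndarray, k: int) -> np.ndarray:
    """Return, for each sample, the indices of its k nearest neighbors
    by cosine similarity (excluding the sample itself)."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    np.fill_diagonal(sims, -np.inf)          # never pick a sample as its own neighbor
    return np.argsort(-sims, axis=1)[:, :k]  # top-k most similar samples


# These fixed k-NN positives then feed the standard neighborhood contrastive
# loss (as in MTP-CLNN), with no active-learning loop on top.
```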