Chat GPT as a model to reformulate queries.

yogeswarl commented 1 year ago

This idea involves asking ChatGPT to generate relevant queries based on the documents we feed it.

We will sample about 10,000 documents from the refined queries, grouped by the following criteria:

Highly relevant (MAP score 0.75 - 1)
Relevant (0.5 - 0.74)
Somewhat relevant (0.25 - 0.49)
Irrelevant (0 - 0.24)

Based on this data, we will compare how well ChatGPT can perform suggesting documents for these queries.

hosseinfani commented 1 year ago

@yogeswarl thanks for creating this issue.

Please note that we need ChatGPT to generate the query reformulations. So the last sentence, "ChatGPT can perform suggesting documents for these queries", is not correct, to my understanding. Right?

Basically, we ask ChatGPT in these ways:

1- Here is the query q; please give us 10 reformulations/paraphrases of it.
2- Here is the query q and all its relevant documents; give us 10 reformulations/paraphrases of the query.

It's like using T5 after it has been trained, when we ask it for predictions.

yogeswarl commented 1 year ago

Point 2 is correct. This is what we will be doing! Our T5, once trained, will be fed relevant documents and will generate queries. That is what we will ask ChatGPT to infer as well.

hosseinfani commented 1 year ago

For the comparison, please do these variations:

1- [like an expander] Here is the query q; please give us 10 reformulations/paraphrases of it.
2- [like pretrained T5] Here are all relevant documents of the query q; give us 10 reformulations/paraphrases of the query.
3- [like fine-tuned T5] Here is the query and all relevant documents of the query q; give us 10 reformulations/paraphrases of the query.
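A minimal sketch of how the three variations above could be turned into prompt strings. The function name `build_prompt`, the variation labels, and the exact wording are assumptions for illustration, not the prompts actually used in RePair.

```python
def build_prompt(variation, query=None, documents=None, n=10):
    """Build a prompt for one of the three variations (hypothetical helper)."""
    docs = "\n".join(documents or [])
    ask = f"give us {n} reformulations/paraphrases of the query"

    if variation == "expander":
        # Variation 1: only the query is shown.
        return f"Here is the query: {query}\nPlease {ask}."
    if variation == "pretrained_t5":
        # Variation 2: only the relevant documents are shown, never the query.
        return f"Here are all relevant documents of a query:\n{docs}\nPlease {ask}."
    if variation == "finetuned_t5":
        # Variation 3: both the query and its relevant documents are shown.
        return (
            f"Here is the query: {query}\n"
            f"Here are all relevant documents of the query:\n{docs}\n"
            f"Please {ask}."
        )
    raise ValueError(f"unknown variation: {variation}")
```

For example, `build_prompt("pretrained_t5", documents=["passage text"])` yields a prompt that contains the passage but not the query.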

Thank you.

yogeswarl commented 1 year ago

Understood.

yogeswarl commented 1 year ago

[Screenshot 2023-08-24 at 9 01 59 PM]

Hello Dr. @hosseinfani, I have written a function to test ChatGPT's capabilities. I got a paid edition and I am still getting server errors. This needs to be handled gracefully, because it occurs about every 15 minutes due to server overload. Do you have any suggestions on how to handle this issue?

yogeswarl commented 1 year ago

Update: the above issue has been solved by using the retrying package.
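For reference, a rough sketch of the retry-with-backoff idea. This is not the `retrying` package itself: it is a standard-library stand-in, and the decorator name `retry` and all of its parameters here are assumptions for illustration.

```python
import functools
import random
import time


def retry(max_attempts=5, base_delay=1.0, max_delay=60.0,
          exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped call with exponential backoff (illustrative sketch)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # give up and surface the last error
                    # Double the wait after each failure, capped at max_delay,
                    # with a little jitter so retries do not line up.
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    sleep(delay + random.uniform(0, 0.1 * delay))
        return wrapper
    return decorator
```

Decorating the function that calls the API with `@retry(max_attempts=5)` would let a transient server error be retried a few times before the run fails.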

yogeswarl commented 1 year ago

Hello @hosseinfani, here are the stats and graphical representation of the GPT runs on msmarco.passage. I can only run one more dataset because the predictions take too long to complete. Some stats:

query_category                                  query_length  mean_map
paraphrase_poor_gpt_query_mean_length           40.421        0.1294494
paraphrase_poor_refined_query_mean_length       43.193        0.4513846
paraphrase_poor_original_query_mean_length      37.004        0.0359629
paraphrase_somewhat_gpt_query_mean_length       40.398        0.4748991
paraphrase_somewhat_refined_query_mean_length   42.762        0.7004481
paraphrase_somewhat_original_query_mean_length  34.964        0.2995379
paraphrase_relevant_gpt_query_mean_length       41.969        0.7646921
paraphrase_relevant_refined_query_mean_length   42.971        0.8539836
paraphrase_relevant_original_query_mean_length  39.078        0.8028525
finetune_poor_gpt_query_mean_length             60.017        0.5468902
finetune_poor_refined_query_mean_length         43.193        0.4513846
finetune_poor_original_query_mean_length        37.004        0.0359629
finetune_somewhat_gpt_query_mean_length         58.303        0.8195252
finetune_somewhat_refined_query_mean_length     42.762        0.7004481
finetune_somewhat_original_query_mean_length    34.964        0.2995379
finetune_relevant_gpt_query_mean_length         56.367        0.8946335
finetune_relevant_refined_query_mean_length     42.971        0.8539836
finetune_relevant_original_query_mean_length    39.078        0.8028525
infer_poor_gpt_query_mean_length                55.03         0.6592458
infer_poor_refined_query_mean_length            43.193        0.4513846
infer_poor_original_query_mean_length           37.004        0.0359629
infer_somewhat_gpt_query_mean_length            53.818        0.8387605
infer_somewhat_refined_query_mean_length        42.762        0.7004481
infer_somewhat_original_query_mean_length       34.964        0.2995379
infer_relevant_gpt_query_mean_length            51.396        0.9012704
infer_relevant_refined_query_mean_length        42.971        0.8539836
infer_relevant_original_query_mean_length       39.078        0.8028525

Some graphical representations: [finetune_plot] [infer_plot] [paraphrase_plot]

yogeswarl commented 1 year ago

I am running another set of poor, somewhat, and relevant categories for user reformulation on aol title url.

hosseinfani commented 1 year ago

@yogeswarl can you please explain what the categories are and put a paragraph of analysis here?

yogeswarl commented 1 year ago

We had 3 thresholds based on the MAP of the original queries: "poor" = 0 to 0.24, "somewhat" = 0.25 to 0.49, and "relevant" = 0.5 to 1.0. I went with the 3 categories you asked for:

ChatGPT as an inference model -> pass only the documents
ChatGPT as a paraphrase model -> pass only the queries
ChatGPT as a finetuning model -> pass docs and query, tab-separated
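A small sketch of the thresholding described above. The function name `map_category` is hypothetical, and treating the cut-offs as half-open intervals (so that a score such as 0.245 still lands in "poor") is an assumption, since the ranges as stated leave small gaps.

```python
def map_category(map_score):
    """Map an original query's MAP score to one of the three categories."""
    if not 0.0 <= map_score <= 1.0:
        raise ValueError(f"MAP must be in [0, 1], got {map_score}")
    if map_score < 0.25:
        return "poor"       # stated range: 0 to 0.24
    if map_score < 0.5:
        return "somewhat"   # stated range: 0.25 to 0.49
    return "relevant"       # stated range: 0.5 to 1.0
```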

The inference and finetuned models performed better than T5 and the original queries. Two issues arise with ChatGPT. One is the time to run the model: it runs through an inference API, so it is painstakingly slow. One prediction takes approximately 2-3 s according to the tqdm library, whereas T5 does one prediction in under a second.

The stats, with the mean query length and mean MAP, are also posted in the comment above.
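To make the comparison concrete, here is a rough sketch that computes the MAP difference between each GPT variant and a baseline, using the mean MAP values from the stats comment (rounded to seven decimals). The variable and function names are assumptions, as is reading the "refined" rows as the T5-refined queries.

```python
# Mean MAP of the GPT-generated queries, per mode and category.
GPT_MAP = {
    "paraphrase": {"poor": 0.1294494, "somewhat": 0.4748991, "relevant": 0.7646921},
    "finetune": {"poor": 0.5468902, "somewhat": 0.8195252, "relevant": 0.8946335},
    "infer": {"poor": 0.6592458, "somewhat": 0.8387605, "relevant": 0.9012704},
}
# Baselines: identical across the three modes in the posted stats.
REFINED_MAP = {"poor": 0.4513846, "somewhat": 0.7004481, "relevant": 0.8539836}
ORIGINAL_MAP = {"poor": 0.0359629, "somewhat": 0.2995379, "relevant": 0.8028525}


def map_gains(baseline):
    """Return GPT mean MAP minus the baseline mean MAP for every mode/category."""
    return {
        mode: {cat: round(score - baseline[cat], 4) for cat, score in cats.items()}
        for mode, cats in GPT_MAP.items()
    }


if __name__ == "__main__":
    for mode, cats in map_gains(REFINED_MAP).items():
        print(mode, cats)
```

With the posted numbers, the infer and finetune differences against the refined queries come out positive in every category, while the paraphrase differences come out negative.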

yogeswarl commented 1 year ago

There should be only one barplot for this. I am going to optimize it.

hosseinfani commented 1 year ago

@yogeswarl thank you. Can you find the research paper for ChatGPT? We need to see why it's better than T5. Is it due to architecture or training dataset or ...

yogeswarl commented 1 year ago

@hosseinfani here is the paper: https://arxiv.org/pdf/2005.14165.pdf I will give a quick summary of it by tomorrow evening.

yogeswarl commented 1 year ago

From the paper, I gathered the following reasons why ChatGPT is better, along with a few things I did in the ChatGPT runs that differ from the T5 setup:

Unlike T5, for which we set a max input length of 512, I fed the whole document collection to ChatGPT.
Unlike T5, ChatGPT is a conversational large language model. The GPT-3 model described in the paper has 175B parameters, compared to only 220 million for T5-base.
ChatGPT is specifically designed for generating human-like text in a conversational context. It is fine-tuned on dialogue data to provide contextually relevant responses. T5, on the other hand, is a more general-purpose model that approaches various NLP tasks as text-to-text tasks.

One option I can think of here is to cap the number of words ChatGPT can see (i.e., 512 words) and then compare it with T5.
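A minimal sketch of that cap, assuming a simple whitespace split. The helper name `truncate_words` is hypothetical, and note that T5's limit of 512 counts tokenizer tokens rather than words, so this is only an approximation of equal input budgets.

```python
def truncate_words(text, max_words=512):
    """Keep at most max_words whitespace-separated words of text."""
    words = text.split()
    if len(words) <= max_words:
        return text  # short inputs are returned unchanged
    return " ".join(words[:max_words])
```

Each document (or the concatenated documents) would be passed through `truncate_words` before being placed in the prompt.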

One problem with ChatGPT is that its output length cannot be limited as precisely as T5's. The maximum output length is always much greater than the mean length of both the T5 and original queries. @hosseinfani, should I redo this for fine-tune and inference?

yogeswarl commented 1 year ago

[plot] [plot]

Made the plots much smaller and cleaner.

fani-lab / RePair

Chat GPT as a model to reformulate queries. #37