WEIYanbin1999 / KICGPT

[EMNLP 2023] KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion
MIT License

Why does the training dataset include the validation dataset? #4

Closed zhiweihu1103 closed 5 months ago

zhiweihu1103 commented 6 months ago

Great work, some questions about the paper:

  1. Why is the validation set added to the training set?
  2. Can you share the code you use to compute the MRR and Hits metrics from the ChatGPT outputs? This is critical.
  3. I saw that your OpenReview responses mentioned LLaMA results. Can you integrate this part as well? Thank you.
     https://github.com/WEIYanbin1999/KICGPT/blob/63d8163be1c7608ddaf594103943d5137eef5171/get_demonstrations.py#L42

WEIYanbin1999 commented 6 months ago

Dear zhiweihu1103,

We would be happy to answer your questions below:

  1. KICGPT is a training-free framework. In real-world scenarios, we can assume that the training and validation sets are given while the test set remains unseen. Typically, models use the validation set to tune their hyperparameters; KICGPT, however, only needs 200 validation instances and does not train GPT at all. Therefore, as outlined in the paper, we merge the training and validation sets when generating demonstrations, which enhances the diversity of knowledge in the pools (a minimal sketch is given at the end of this reply). If further clarification is needed, think of K-fold cross-validation, where the entire validation set is also used to train the model; similarly, KICGPT has train+valid in hand, the difference being that KICGPT needs no training.

  2. That makes sense. However, I'm incredibly busy for the next month; I'll make time to add this when available.

  3. In our published (camera-ready) paper, we deliberately omitted the LLaMA results, intending to include them in a separate work. Through subsequent iterations and extensions for new experiments, the code for that part has undergone multiple revisions and now forms the foundation of an ongoing project. Consequently, we do not currently intend to release it, as it is integral to that new work.

Thanks for your attention, have a good day.
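
To make point 1 concrete, here is a minimal sketch of building a relation-indexed demonstration pool from the union of the training and validation triples. The file paths, the tab-separated triple format, and the helper names are assumptions for illustration, not the actual logic of get_demonstrations.py.

```python
# Minimal sketch (not the actual get_demonstrations.py): build a
# relation-indexed demonstration pool from the union of train and valid triples.
# File paths and the tab-separated triple format are assumptions.
from collections import defaultdict

def load_triples(path):
    """Read (head, relation, tail) triples, one per line, tab-separated."""
    with open(path, encoding="utf-8") as f:
        return [tuple(line.rstrip("\n").split("\t")) for line in f if line.strip()]

def build_demo_pool(train_path, valid_path):
    """Merging train + valid enlarges and diversifies the demonstrations
    available for each query relation."""
    pool = defaultdict(list)
    for h, r, t in load_triples(train_path) + load_triples(valid_path):
        pool[r].append((h, r, t))
    return pool

# Example usage (placeholder paths):
# pool = build_demo_pool("data/train.txt", "data/valid.txt")
```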

zhiweihu1103 commented 6 months ago

A further question, regarding the calculation of the evaluation metrics: the content output by GPT is hard to normalize. How do you extract the relevant content to compute the metrics? Following your ICL prompts to have GPT re-rank RotatE's top-k results, I ran into the following two problems with the output:

  1. Assuming k is 10, GPT may output fewer than 10 entities;
  2. The output format cannot be parsed according to the rules.

How did you solve the above problems?

zhiweihu1103 commented 6 months ago

Have you evaluated not using the validation set? Obviously, including it makes your two knowledge pools more information-rich.

WEIYanbin1999 commented 6 months ago

Regarding your two problems: the unexpected cases, and how we handle them, are:

Outputting entities beyond the candidate set: In this case, we directly filter out the entities that are not in the candidate set.

Losing some candidate entities: In this case, we append the missing entities to the end of the LLM output, in their initial ordering given by the retriever. Note that this case typically occurs when the LLM is very confident about the top answers and therefore omits the rest.

Damaged output: In this case, we cannot identify a valid ordering from the LLM output. To handle this, KICGPT directly uses the initial ordering of the retriever as the final decision (i.e., it degenerates to the retriever).
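
Putting the three rules together, a minimal post-processing sketch might look like the following (assumed helper and variable names, not the released code); the repaired ranking can then be scored with standard MRR / Hits@k:

```python
# Minimal sketch of repairing the LLM's re-ranked list before scoring.
# All names here are illustrative assumptions, not the released code.

def repair_ranking(llm_entities, candidates):
    """llm_entities: entity list parsed from the LLM output, or None when the
    output is damaged; candidates: top-k candidates in the retriever's order."""
    if llm_entities is None:
        # Case 3 (damaged output): degenerate to the retriever's ordering.
        return list(candidates)
    candidate_set = set(candidates)
    ranking = []
    for e in llm_entities:
        # Case 1: drop entities outside the candidate set (and duplicates).
        if e in candidate_set and e not in ranking:
            ranking.append(e)
    # Case 2: append omitted candidates, keeping the retriever's order.
    ranking += [e for e in candidates if e not in ranking]
    return ranking

def mrr_and_hits(examples, ks=(1, 3, 10)):
    """examples: iterable of (ranking, gold_entity) pairs. If the gold entity
    is outside the re-ranked top-k, its true rank should come from the
    retriever's full ranking; this sketch simplifies by ranking it last."""
    ranks = [r.index(g) + 1 if g in r else len(r) + 1 for r, g in examples]
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    hits = {k: sum(r <= k for r in ranks) / len(ranks) for k in ks}
    return mrr, hits
```

With this repair, every test query still yields a complete ordering over the top-k candidates, so malformed responses degrade gracefully to the retriever's ranking instead of being dropped from the evaluation.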

zhiweihu1103 commented 6 months ago

Sounds like good advice. Thx.

WEIYanbin1999 commented 6 months ago

We did not run experiments with this setting. As you may know, a single run over the whole dataset is costly, so we did not perform extra experiments beyond those included in the paper.

zhiweihu1103 commented 6 months ago

Thanks for your quick reply, and looking forward to your new paper on open-source LLMs for KGC.