Open QuangTQV opened 3 months ago
Please help me, thanks
Hi,
The corpus is a JSONL file. Each row is a JSON object representing one document, which usually consists of:
The `key_template` is used to group the textual fields in each document into a single piece of text. For example, the default `key_template` (`{title} {text}`) concatenates the title and the text of each document with a space.
In the tool learning corpus there is no title, so the `key_template` is `{text}`, meaning that only the "text" field is used.
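To make the template idea concrete, here is a minimal sketch of how such a template could be applied to corpus rows. The helper `build_key` and the two sample rows are made up for illustration; this is not the repository's actual code, which may handle missing fields differently.

```python
import json


def build_key(doc: dict, key_template: str = "{title} {text}") -> str:
    """Fill the template placeholders with the fields of one corpus row.

    Hypothetical helper for illustration; not part of FlagEmbedding.
    """
    return key_template.format(**doc)


# Two made-up corpus rows, written the way a JSONL file stores them
# (one JSON object per line).
jsonl_lines = [
    '{"title": "Paris", "text": "Paris is the capital of France."}',
    '{"text": "get_weather(city): returns the current weather for a city."}',
]
docs = [json.loads(line) for line in jsonl_lines]

# Default template: title and text joined by a space.
default_key = build_key(docs[0])

# Tool learning style corpus: no title, so only the text field is used.
tool_key = build_key(docs[1], key_template="{text}")

print(default_key)
print(tool_key)
```

Note that the default template would raise a `KeyError` on the second row, since it has no "title" field. That is why a corpus without titles needs `{text}` as its template.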
The `key` field is a list of texts retrieved for the query from the corpus. The `key_index` field holds the indices of the retrieved texts. Both fields are also generated automatically if you specify `--metrics collate_key`. See https://github.com/FlagOpen/FlagEmbedding/blob/fec9058948215924b924104d48cbf01e2ab90865/FlagEmbedding/llm_embedder/src/retrieval/metrics.py#L246
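The relation between the two fields can be sketched as follows. This is only an illustration under my own assumptions: the function name `collate_record`, the sample corpus, and the exact record layout are hypothetical and do not reproduce what `collate_key` does internally.

```python
def collate_record(query: str, corpus: list[str], retrieved_indices: list[int]) -> dict:
    """Attach retrieved texts and their corpus row indices to a query.

    Hypothetical helper for illustration; not the repository's implementation.
    """
    return {
        "query": query,
        # key_index: row indices into the corpus, best match first.
        "key_index": list(retrieved_indices),
        # key: the texts found at those rows, in the same order.
        "key": [corpus[i] for i in retrieved_indices],
    }


# Made-up corpus where each entry is the text already built from key_template.
corpus = [
    "Berlin is the capital of Germany.",
    "The Seine flows through Paris.",
    "Paris is the capital of France.",
]

record = collate_record("capital of France", corpus, retrieved_indices=[2, 1])
print(record["key_index"])
print(record["key"])
```

So `key` carries the same information as `key_index`, just resolved to text, which is convenient when the retrieved documents are consumed later without access to the corpus.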
The `pos_index` field is usually the label. It contains the indices of the positive documents w.r.t. the corpus (i.e. the row indices of all positive documents). On NQ, there is no pre-defined `pos_index`. In that case, the evaluation is based on whether the retrieved documents contain the corresponding `answer`. See https://github.com/FlagOpen/FlagEmbedding/blob/fec9058948215924b924104d48cbf01e2ab90865/FlagEmbedding/llm_embedder/src/retrieval/metrics.py#L234
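The two evaluation modes can be sketched roughly like this. Both functions are simplified illustrations with hypothetical names; the linked `metrics.py` is the reference, and it may normalize text and define cutoffs differently.

```python
def recall_from_pos_index(key_index: list[int], pos_index: list[int], cutoff: int) -> float:
    """Fraction of labelled positive rows found among the top-`cutoff` retrieved rows."""
    if not pos_index:
        return 0.0
    retrieved = set(key_index[:cutoff])
    hits = sum(1 for p in pos_index if p in retrieved)
    return hits / len(pos_index)


def hit_from_answers(keys: list[str], answers: list[str], cutoff: int) -> bool:
    """True if any answer string occurs in any of the top-`cutoff` retrieved texts.

    Uses a plain lowercase substring match, which is an assumption made here
    for simplicity.
    """
    top = [k.lower() for k in keys[:cutoff]]
    return any(a.lower() in k for a in answers for k in top)


# Label-based evaluation: rows 0 and 3 are positives, row 0 was retrieved.
recall = recall_from_pos_index(key_index=[2, 0, 5], pos_index=[0, 3], cutoff=3)

# Answer-based evaluation (the NQ case): no pos_index, only answer strings.
hit = hit_from_answers(
    keys=["The Seine flows through Paris.", "Paris is the capital of France."],
    answers=["capital of France"],
    cutoff=2,
)

print(recall, hit)
```

In the first mode the label is a set of corpus row indices; in the second, the answer strings play the role of the label, so no `pos_index` is needed.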
Dear author, I find that the documentation explaining fine-tuning is limited. Could you explain the following things to me? ![image](https://github.com/FlagOpen/FlagEmbedding/assets/80111554/6e6b8728-cd83-4c61-bac6-3095b7fb1218)
Corpus Data Format: Could you please elaborate on the format of the corpus data? I am having difficulty grasping this concept. Could you provide an example to illustrate how the corpus data should be structured?
Key and Key Index in Evaluation File: Within the evaluation file, what specifically do the terms "key" and "key index" refer to?
Understanding "Answer" in Evaluation File: In the evaluation file, what does the term "answer" represent? Where is this data sourced from?
Evaluation label: According to the picture, the evaluation data does not contain neg and pos; it only has indices for neg and pos, and these are optional. How does the model perform evaluation without labels?
Key_Template for Retrieve Tool in Fine-Tuning: Regarding the finetune process for the retrieve tool, what exactly is meant by "key_template"? I encountered a reference in the documentation mentioning "How to concatenate columns in the corpus to form one key," but I'm struggling to comprehend this aspect.
I would be immensely grateful if you could shed light on these matters or direct me to resources that could offer further clarity. Your expertise in this field would undoubtedly be invaluable to my understanding.
Thank you very much for considering my questions. I eagerly await your response and look forward to enhancing my comprehension of these crucial details.
Warm regards,