Closed — stydxm closed this issue 5 months ago
If contexts longer than the context window are not expected usage, maybe the error could be raised by this package before sending the data to the LLMs.
I think this makes the most sense: if we split the context, it might have some unintended effect on the scores. In this situation a warning should be raised, and the score for that particular row should be NaN.
@stydxm What do you think? Let me know if you'd like to work on this.
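The "warn and score NaN" behaviour suggested above could be sketched roughly as follows. This is not ragas code: the context-window table, the whitespace token count, and `compute_score` are all illustrative stand-ins (a real implementation would use the model's actual tokenizer, e.g. tiktoken for OpenAI models).

```python
import math
import warnings

# Illustrative context-window sizes, not an authoritative table.
MODEL_CONTEXT_WINDOWS = {
    "gpt-3.5-turbo": 4096,
    "gpt-3.5-turbo-16k": 16384,
}

def compute_score(context: str) -> float:
    """Stand-in for an actual ragas metric call."""
    return 1.0

def score_row(context: str, model: str = "gpt-3.5-turbo") -> float:
    """Pre-flight length check: warn and return NaN instead of crashing."""
    limit = MODEL_CONTEXT_WINDOWS.get(model, 4096)
    n_tokens = len(context.split())  # rough approximation of token count
    if n_tokens > limit:
        warnings.warn(
            f"context has ~{n_tokens} tokens, exceeding the {limit}-token "
            f"window of {model}; scoring this row as NaN"
        )
        return math.nan
    return compute_score(context)
```

The key point is that the length check happens before any request is sent, so one oversized row degrades to a NaN score rather than aborting the whole evaluation run.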
Hi @shahules786 ,
I agree this is a problem, but honestly I have no idea how to handle it properly; maybe it should be discussed further.
Automatically splitting contexts that are too long may affect the scores. On the other hand, refusing to evaluate these long contexts would restrict the usage of this package. So we need more people's opinions to make a choice.
Or there is another possibility: `contexts` is not meant to hold long texts and I did not truly understand it due to my limited English. As for my dataset, I put the whole passage into `contexts`, the questions into `question`, their answers into `ground_truths`, and the responses from my RAG program into `answer`. I split the passages manually in my RAG program, so it doesn't raise such errors there. If my understanding is wrong, I would appreciate it if you could tell me the correct usage.
@stydxm IMO, this error pops up mainly for users on gpt-3.5-turbo, which has a 4k context length. 99% of users solve it by using the 16k version. Can you try that out?
> @stydxm IMO, this error pops up mainly for users on gpt-3.5-turbo, which has a 4k context length. 99% of users solve it by using the 16k version. Can you try that out?
I tested it and found that most of the data could be evaluated, but a very small number of entries still couldn't fit in the context window.
I use Google's Natural Questions as my dataset, in which each entry is a Wikipedia page, so it's normal for some contexts to be very long.
Given these results, I think raising an error, or setting the score to NaN and raising a warning at the same time, is the better choice.
Either way, in the current version this error interrupts the evaluation process, which I don't think is appropriate.
I also get context-length errors occasionally due to some outlier documents that are simply too long. It would be nice to have an option to control what happens when the context is too long, before the prompt is sent to the LLM.
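One workaround for such outlier documents, until the library handles this itself, is to filter the dataset before calling `evaluate()`. The sketch below is an assumption about the row shape (a `contexts` list of strings per row) and uses a crude whitespace token count; the 4096 limit is illustrative.

```python
def filter_long_rows(rows, limit=4096):
    """Split rows into (kept, dropped) by approximate context token count.

    `rows` is assumed to be a list of dicts with a "contexts" key holding
    a list of strings, mirroring the ragas dataset layout.
    """
    kept, dropped = [], []
    for row in rows:
        n_tokens = sum(len(c.split()) for c in row["contexts"])
        (kept if n_tokens <= limit else dropped).append(row)
    return kept, dropped
```

The dropped rows can then be logged or scored NaN manually, so a handful of oversized Wikipedia pages no longer aborts the whole run.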
@shahules786 Is the behaviour you are describing above implemented by setting `raise_exceptions=False` in `evaluate()`?
Describe the bug
When evaluating with a long context, an error was raised like this:
Ragas version: 0.1.0
Python version: 3.10
Code to Reproduce
Error trace
It won't help, I think.
Expected behavior
I think the `evaluate` function should split contexts longer than the LLM could process. For custom models, there should also be a parameter for specifying the context window. If contexts longer than the context window are not expected usage, maybe the error could be raised by this package before sending the data to the LLMs.