explodinggradients / ragas

Evaluation framework for your Retrieval Augmented Generation (RAG) pipelines
https://docs.ragas.io
Apache License 2.0
6.78k stars 668 forks source link

Error: Exception raised in Job[0]: APIConnectionError(Connection error.) when splitting one big batch into several small batches to submit #1423

Open francescofan opened 16 hours ago

francescofan commented 16 hours ago

[ ] I have checked the documentation and related resources and couldn't resolve my bug.

Describe the bug I set up a function to split the input Dataset into several batches before submitting. For example, for a large dataset with 200 question/answer/ground_truth/contexts entries, I split it into 20 batches of 10 questions each. When the result of the first batch is returned and the next batch is submitted, errors like the following are raised:

Exception raised in Job[0]: APIConnectionError(Connection error.)
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[1]: TimeoutError()

I don't know why this happens; I expected it to behave like a simple loop of submissions.

Ragas version: 0.1.20
Python version: 3.11.9
LLM: DeepSeek v2.0
Embedding: customized embedding, something like embeddings = MyCustomEmbeddings("http://customizedEmbbding")

Code to Reproduce

from datasets import Dataset
from ragas import evaluate

dataset = {......}
result = {}
result['response'] = []
batch_size = 10
for i in range(0, len(dataset.get('question')), batch_size):
    # Slice each column of the dataset into a batch of `batch_size` rows
    dataset_batch = {
        'question': dataset.get('question')[i:i + batch_size],
        'answer': dataset.get('answer')[i:i + batch_size],
        'contexts': dataset.get('contexts')[i:i + batch_size],
        'ground_truth': dataset.get('ground_truth')[i:i + batch_size]
    }
    dataset_batch = Dataset.from_dict(dataset_batch)
    try:
        score = evaluate(dataset_batch, metrics=metrics, llm=deepseek_llm, embeddings=embeddings)
        # score = evaluate(dataset_batch, metrics=metrics)
        score_df = score.to_pandas()
        score_dict = score_df.to_dict('records')
        result['response'].extend(score_dict)  # collect per-batch scores
    except Exception as e:
        print(f'Error: {e}')
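As a workaround sketch (not part of the original code, and no guarantee it fixes the root cause): each batch call could be wrapped in a simple retry loop with a pause between attempts, so a transient APIConnectionError does not lose the whole batch. The helper below is generic; `fn` stands for any callable that may raise transiently, such as `evaluate`:

```python
import time


def call_with_retries(fn, *args, retries=3, backoff=5.0, **kwargs):
    """Call fn, retrying on any exception with a linearly growing pause."""
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            last_exc = exc
            print(f"Attempt {attempt} failed: {exc}")
            if attempt < retries:
                time.sleep(backoff * attempt)  # pause before retrying
    raise last_exc
```

In the loop above, the call would then become `score = call_with_retries(evaluate, dataset_batch, metrics=metrics, llm=deepseek_llm, embeddings=embeddings)`, possibly with an extra `time.sleep(...)` between batches to give the endpoint time to recover.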

Error trace

Evaluating:   0%|          | 0/4 [00:00<?, ?it/s]
Exception raised in Job[0]: APIConnectionError(Connection error.)
Exception raised in Job[3]: TimeoutError()
Exception raised in Job[2]: TimeoutError()
Exception raised in Job[1]: TimeoutError()
Evaluating: 100%|██████████| 4/4 [03:00<00:00, 45.00s/it]

All metrics are NaN: 'faithfulness': nan, 'context_recall': nan, 'context_precision': nan, 'answer_correctness': nan

Expected behavior The evaluation finishes successfully without raising APIConnectionError(Connection error.).


dosubot[bot] commented 16 hours ago

Hey there, @francescofan! I'm here to help you with any bugs, questions, or contributions you have. Let's squash that bug together!

The APIConnectionError and TimeoutError you're encountering could be due to several factors related to network stability, server response times, or configuration settings in the Ragas framework. Here are some steps you can take to mitigate these issues:

  1. Increase Timeout Settings: The default timeout for API calls in Ragas is 180 seconds. You can try increasing this timeout to see if it helps with the TimeoutError. You can adjust the timeout by passing a custom RunConfig:

    from ragas.run_config import RunConfig
    
    my_run_config = RunConfig(timeout=300)  # Increase timeout to 300 seconds
  2. Configure Retry Mechanism: Ragas uses the tenacity library to handle retries for failed API calls. You can configure the retry mechanism to increase the number of retries or adjust the wait time between retries:

    my_run_config = RunConfig(max_retries=15, max_wait=120)  # Increase retries and wait time
    score = evaluate(dataset_batch, metrics=metrics, run_config=my_run_config)  # pass it to evaluate
  3. Check Network Stability: Ensure that your network connection is stable and has sufficient bandwidth to handle multiple API requests. Unstable connections can lead to APIConnectionError.

  4. Batch Size: Although you are already splitting the dataset into smaller batches, you might want to experiment with different batch sizes to see if it affects the error occurrence. Sometimes, smaller batches can help reduce the load on the server.

  5. Logging and Monitoring: Enable detailed logging to capture more information about the errors. This can help in diagnosing the root cause of the connection issues.

  6. Server Load: If the server you are connecting to is under heavy load, it might cause connection errors. Check if there are any known issues with the server or if there are any rate limits being hit.

By adjusting these configurations and checking the network conditions, you should be able to reduce the occurrence of these errors.
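For step 5, a minimal way to turn on verbose logging from inside the script (assuming Ragas modules log under the standard `ragas` logger namespace, which may vary by version; the `httpx` logger often surfaces the underlying connection errors from the OpenAI-compatible client):

```python
import logging

# Send all log records, including DEBUG, to the console.
logging.basicConfig(
    format="%(asctime)s %(name)s %(levelname)s %(message)s",
    level=logging.DEBUG,
)

# Assumption: ragas modules log under the "ragas" namespace.
logging.getLogger("ragas").setLevel(logging.DEBUG)
# The HTTP client's logs often reveal the real cause of connection errors.
logging.getLogger("httpx").setLevel(logging.DEBUG)
```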

To continue talking to Dosu, mention @dosu.

francescofan commented 15 hours ago

@dosu I have tried methods 1/2/3/4/6, but the problem is still there. Also, how do I get more detailed logging?

dosubot[bot] commented 15 hours ago

To enable more detailed logging in your Ragas setup, you can set the environment variable __RAGAS_DEBUG_TRACKING to True. This will enable debug logging for tracking events, which includes logging the payload and any errors encountered during tracking. You can do this by running the following command in your terminal:

export __RAGAS_DEBUG_TRACKING=True

This should provide more detailed logs that can help diagnose the issues you're facing with APIConnectionError and TimeoutError [1].

To continue talking to Dosu, mention @dosu.

francescofan commented 15 hours ago

@dosu I am in a Windows environment, so how do I set export __RAGAS_DEBUG_TRACKING=True there?

dosubot[bot] commented 15 hours ago

In a Windows environment, you can set the environment variable using the set command in the Command Prompt. Here's how you can do it:

set __RAGAS_DEBUG_TRACKING=True

Run this command in the Command Prompt before executing your Python script. This will enable detailed logging for your current session. If you need this setting to persist across sessions, you can add it to your system environment variables through the System Properties.
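Alternatively (a sketch that works the same on Windows and Linux), the variable can be set from inside the Python script itself via `os.environ`, before Ragas is imported (assumption: Ragas reads the flag when its tracking module loads):

```python
import os

# Set the debug flag before importing ragas so it is seen at import time.
os.environ["__RAGAS_DEBUG_TRACKING"] = "True"

# import ragas  # import only after the variable is set
```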

To continue talking to Dosu, mention @dosu.

francescofan commented 15 hours ago

@dosu Have you ever run into a situation where you submit the exact same code twice, and the first run is fine but the second always hits APIConnectionError(Connection error.)? For example: score = evaluate(dataset_batch,metrics=metrics,llm=deepseek_llm,embeddings=embeddings)

dosubot[bot] commented 15 hours ago

I haven't encountered that specific issue directly, but it sounds like it could be related to how the Ragas framework handles API connections or server-side rate limiting. Since you've already tried various methods to address the issue, consider checking if there are any server-side restrictions or rate limits that might be causing the second request to fail. Additionally, ensure that your network connection is stable and that there are no intermittent issues affecting the connection. If the problem persists, enabling detailed logging as previously discussed might help identify any underlying issues.

To continue talking to Dosu, mention @dosu.