Open bsowell opened 1 year ago
Still happening: https://github.com/aryn-ai/sycamore/actions/runs/6293759018/job/17084907000. It is infrequent enough, I suspect it may be dependent on the result result returned by OpenAI.
We traced this to throttling from OpenAI. Looks like we are already doing basic retries, but we should see if there is anything better we can do and at the very least surface the error more clearly
Hello @bsowell,
I've been investigating the non-deterministic failures you've mentioned. I noticed that test_pdf_to_opensearch
does not pass llm_kwargs
here, and OpenAI is ultimately set here. I believe you're referring to the retry mechanism here?
So, my queries are:
_generate_using_guidance
?To enhance our retry mechanism, I'm considering using This guide
I would appreciate your thoughts on this. Please let me know if I am heading in the right direction.
Also, i think this is related to https://github.com/aryn-ai/sycamore/issues/36
Hey @ChillOrb! Thanks so much for taking a look.
So guidance, which is another library for interacting with LLMs, does have some retries enabled by default (e.g. see the default parameter here). Even with that, we still sometimes see timeouts. I do think eventually we want to move off of guidance and just use openai directly -- we aren't getting too much value, and I think we will want to customize the behavior, for example by using some of the techniques you linked. One place to start might be to see if we are doing the right thing in terms of retries for the non-guidance based access.
I do think this is slightly different than #36. That issue is about OpenSearch. When writing to OpenSearch we can sometimes overwhelm the cluster and need to add mechanisms to backoff. Here we are hitting issues access OpenAI. Kind of unfortunate in this project that everything starts with Open :).
Hey @bsowell ,
Thanks for clarifying,
Code here looks fine to me. Still a few things i think could be addressed and would love to get your opinion,
Let me know if this is something that needs to be worked on right now.
Describe the bug I have seen some non-deterministic failures in the
test_pdf_to_opensearch
test. The issue appears to be related to how guidance is returning results. In some cases, the returned map does not contain an answer key:From https://github.com/aryn-ai/sycamore/actions/runs/6176295926/job/16765129778#step:10:312
This looks to be non-deterministic, because the very next run of the exact same workflow succeeded:
The code is failing from here: https://github.com/aryn-ai/sycamore/blob/a076d86644375a60f0c527b83e96d5d3e12345a5/sycamore/execution/transforms/entity/entity_extractor.py#L41 We will likely need to add some additional logging to get to the bottom of this.