glibsonoran / Plush-for-ComfyUI

Custom node for ComfyUI/Stable Diffusion
GNU General Public License v3.0
154 stars 15 forks

Feature Request: Allow retries for failed LLM queries #135

Open Jonseed opened 1 month ago

Jonseed commented 1 month ago

When using OpenRouter, sometimes queries will fail with this error shown in troubleshooting:

ERROR: Server was unable to process the response. Error: {'message': '{\n "error": {\n "code": 429,\n "message": "Resource has been exhausted (e.g. check quota).",\n "status": "RESOURCE_EXHAUSTED"\n }\n}\n', 'code': 429}

I think this happens because the particular model/provider is overwhelmed at the moment. For example, Google's Gemini Pro 1.5 Experimental (0827) often responds like this on OpenRouter. I think there are just too many requests going through OpenRouter to Google Vertex or AI Studio at that moment for them all to be handled. When the request fails with an error like this, the node's output is empty. If I queue the workflow again, the request will often go through just fine.
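To illustrate what that failure looks like to code, here is a minimal Python sketch that pulls the code and status out of the error shown above. The function name `extract_error_info` is hypothetical and not part of Plush or OpenRouter; the only thing taken from the log is the payload shape, a dict whose 'message' field is itself a JSON string.

```python
import json

# Error payload copied from the troubleshooting output above.
error = {
    'message': '{\n "error": {\n "code": 429,\n "message": "Resource has been exhausted (e.g. check quota).",\n "status": "RESOURCE_EXHAUSTED"\n }\n}\n',
    'code': 429,
}


def extract_error_info(err):
    """Return (code, status, message) from an error dict shaped like the one above.

    Hypothetical helper: falls back to the outer fields when the inner
    'message' is not JSON or does not contain an 'error' object.
    """
    code = err.get("code")
    status = None
    message = err.get("message", "")
    try:
        inner = json.loads(message).get("error", {})
    except (json.JSONDecodeError, AttributeError, TypeError):
        inner = {}
    if isinstance(inner, dict):
        code = inner.get("code", code)
        status = inner.get("status")
        message = inner.get("message", message)
    return code, status, message


print(extract_error_info(error))
```

Something like this would let a node tell a rate-limit failure (429 / RESOURCE_EXHAUSTED) apart from other errors before deciding what to do with the empty output.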

I wonder if the Advanced Prompt Enhancer node could have a retry function, where if it receives an "error" response like this it could retry the request a specified number of times before giving up. Currently, when it gets an error and outputs nothing, it messes up the rest of my workflow. If it could just retry the request a few times, it would often go through and get a proper response from the service. Perhaps the user could even specify which errors should trigger a retry, such as code 429 or any other string of text in the error message.
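As a rough sketch of the behavior I'm describing (not how the node is actually implemented), something like the following Python would do it. All names here (`call_with_retries`, `is_retryable`, `tries`, `retry_on`, `base_delay`) are made up for illustration, and the exponential wait between attempts is my own assumption rather than anything the service requires.

```python
import time


def is_retryable(error_text, retry_on=("429", "RESOURCE_EXHAUSTED")):
    """True if any user-specified marker appears in the error text."""
    return any(marker in error_text for marker in retry_on)


def call_with_retries(request_fn, tries=3, retry_on=("429", "RESOURCE_EXHAUSTED"),
                      base_delay=1.0, sleep=time.sleep):
    """Call request_fn up to `tries` times in total.

    Retries only when the raised error's text matches one of `retry_on`;
    any other error, or the final failed attempt, is re-raised to the caller.
    The wait doubles after each failed attempt (assumed backoff policy).
    """
    for attempt in range(1, tries + 1):
        try:
            return request_fn()
        except Exception as exc:
            if attempt == tries or not is_retryable(str(exc), retry_on):
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Usage example: a stand-in request that fails twice with a 429, then succeeds.
calls = {"count": 0}

def fake_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("{'code': 429, 'status': 'RESOURCE_EXHAUSTED'}")
    return "enhanced prompt text"

print(call_with_retries(fake_request, tries=3, sleep=lambda seconds: None))
```

With `tries=1` this behaves like a single attempt with no retry, and errors that don't match `retry_on` fail immediately instead of wasting attempts.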

glibsonoran commented 1 month ago

This is a more difficult issue to address. I'll have to look at what's involved.

glibsonoran commented 1 week ago

@Jonseed I've got a refactor of my API calls that should retry the Completions() call if a "retryable" error occurs. I made an account on OpenRouter, but I've not been able to trigger an error to test my retry code. Is there a model (it's been a month so I'm assuming some model other than Gemini Pro 1.5 is the hot ticket now) I should use that gets tied up a lot, or a time of day that's better for getting overload errors?

Jonseed commented 1 week ago

I think any of Google's Gemini "Experimental" models are "heavily rate-limited" and often error out with code 429. The latest and greatest is version 1114.

glibsonoran commented 1 week ago

OK so: google/gemini-exp-1114?

Jonseed commented 1 week ago

yes

glibsonoran commented 3 days ago

@Jonseed @enragedAntelope @alessandroperilli I've updated the Advanced Prompt Enhancer (and also the Dall_e and Style Prompt nodes) to allow users to select the number of Tries (retries if more than 1) the node will make to connect and generate. Please let me know if this works and/or you have any issues with it.

Also, I changed some of the item names in the Advanced Prompt Enhancer AI_services drop-down menu. Originally these items were meant to apply to connecting to local applications like LM Studio. Since people are increasingly using them for remote applications, I thought the old names were misleading.

If you have published workflows using this node, they will generate an error if the menu item is set to a name that no longer exists in the list (the old name). If so, you might want to refresh your workflow using the updated node and one of the new name selections.

EnragedAntelope commented 3 days ago

You're awesome with the great maintenance and constant enhancements! Thanks. I've updated and set the retries so I'm looking forward to fewer gens with "server could not process the request" as my prompt :)

alessandroperilli commented 3 days ago

Good timing, @glibsonoran, and thanks for tagging me on this. I'll update the upcoming AP Workflow 12.0 to align with the changes. Great work!

glibsonoran commented 3 days ago

Thanks for the encouraging words and thumbs upses.

One more thing: Since it's apparent that more and more users are using APE to connect to remote and paid remote services like OpenRouter, and since this setup is a little on the complicated side, I've produced a short primer on how to set APE up for these connections in my ReadMe.

Just in case you want to reference it or use it in any way for your workflow users.