blib-la / runpod-worker-comfy

ComfyUI as a serverless API on RunPod
GNU Affero General Public License v3.0

[BUG]: When testing locally and looping runsync, it eventually stalls #40

Open vesper8 opened 2 months ago

vesper8 commented 2 months ago

Describe the bug

I'm testing the API locally, on a 4070 Super, before deploying to RunPod. When I make a single call to /runsync, it completes without fail every time and does a really nice job of it. But if I loop, let's say, 10 requests, it always eventually stalls: there is no more output in the terminal and the fans keep spinning. It just gets stuck. One would guess there might be some kind of memory leak, or maybe it tries to load the same models again and again until the memory runs out. It's rather hard to debug, I guess. This doesn't seem to happen when I'm generating small images, but with larger images that take longer to generate it happens without fail.

I should add that I'm using the same checkpoint, and doing the same operations in my loop, so it's not like I'm requesting it to load a different model repeatedly.

Is there a way to force a memory cleanup between generations, or maybe to run with a higher level of verbosity?
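On the memory question, here is a minimal sketch, not a confirmed fix. It assumes the ComfyUI build inside the container exposes a `POST /free` route that accepts `unload_models` and `free_memory` flags; whether the bundled version has it is an assumption. The address is ComfyUI's usual default, and `build_free_request` and `free_comfy_memory` are hypothetical helper names.

```python
import json
import urllib.request

COMFY_HOST = "127.0.0.1:8188"  # assumption: ComfyUI's default address


def build_free_request(host: str = COMFY_HOST) -> urllib.request.Request:
    """Build a POST /free request asking ComfyUI to unload models and free memory.

    Assumption: the running ComfyUI version provides the /free route.
    """
    body = json.dumps({"unload_models": True, "free_memory": True}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/free",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def free_comfy_memory(host: str = COMFY_HOST, timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_free_request(host), timeout=timeout) as resp:
        return resp.status
```

Calling `free_comfy_memory()` between generations would show whether releasing memory changes when the stall occurs. If the route is missing, the call raises an HTTP error rather than failing silently.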

vesper8 commented 2 months ago

It seems to be related to the /upload/image endpoint somehow.

I'm passing the same image, converted to base64, for each iteration of my loop. This isn't a particularly big image, only about 300 KB.

It works fine the first couple of times but then I start seeing this error over and over again:

comfyui-worker | DEBUG  | test-09828fb9-5b04-46cb-bae2-205ca2680559 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd5cead31f0>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/

No idea why it works at first and then doesn't..

TimPietrusky commented 2 months ago

@vesper8 thanks for reporting this.

Do you maybe have a repo / script that we can use to simulate exactly what you are doing? It would help us a lot to just get into testing.

At first glance, it sounds like a problem in ComfyUI itself, but to be sure, we will also test this.

vesper8 commented 2 months ago

I don't have a repo that I can share.. but let me explain in greater detail what I'm doing and maybe that will help.

I have a Windows machine on my private home network that has a powerful GPU. I set up runpod-worker-comfy there following the setup instructions and forwarded the ports so that I can access the UI, and the API, from any other machine on my home network.

Then, from my MacBook, I have a very basic Laravel command that sends the workflow and the base64 image to the API running on my Windows machine.

This works great for a few images, but if I repeatedly send more images it always ends up stalling with the error message above.

It's as if the API is not in a ready-state at some point and craps out. This isn't a problem when generating images that don't have an input image.

I think overall the input image logic introduced in 2.0 could maybe be improved so that we could pass an absolute URL, such as an S3 URL, or pass an image file directly instead of having to base64-encode it. Or maybe there could be a way to say "use this one image for all of these generations". I'm not sure, just throwing out ideas. Maybe the first step is to understand why exactly the image upload works initially and then stops working when the load is too heavy.

TimPietrusky commented 2 months ago

@vesper8 thanks for the detailed explanation.

Do you wait between requests until the previous request has been handled? Or do you send multiple requests at once?

vesper8 commented 2 months ago

I use the /runsync endpoint and I don't send another HTTP request to the API until the previous one has completed and I've gotten the image back from it. I even added a 1-second wait between requests.
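For anyone trying to reproduce this, the sequential loop described above can be sketched roughly as follows. This is a Python stand-in for the Laravel command, not the actual script. The API address `http://localhost:8000/runsync` and the request shape (`input.workflow` plus `input.images` with `name` and `image` fields) are assumptions, and `build_payload`, `post_json` and `run_loop` are hypothetical names.

```python
import base64
import json
import time
import urllib.request

API_URL = "http://localhost:8000/runsync"  # assumption: local worker API address


def build_payload(workflow: dict, image_bytes: bytes, name: str = "input.png") -> dict:
    """Wrap the workflow and one base64-encoded input image (assumed request shape)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {"input": {"workflow": workflow, "images": [{"name": name, "image": encoded}]}}


def post_json(url: str, payload: dict, timeout: float = 600.0) -> dict:
    """POST a JSON body and block until the decoded JSON response is available."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_loop(payload: dict, count: int = 10, pause: float = 1.0,
             send=post_json, url: str = API_URL) -> list:
    """Send `count` requests strictly one after another, pausing between them."""
    results = []
    for i in range(count):
        try:
            results.append(send(url, payload))
        except OSError as exc:  # URLError, refused connections and timeouts are OSErrors
            results.append({"error": str(exc)})
        if i < count - 1:
            time.sleep(pause)
    return results
```

Because `send` is injectable, the loop can be exercised without a GPU by passing a stub. Against a real worker, the index of the first `error` entry in the returned list shows at which iteration the stall begins.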

TimPietrusky commented 2 months ago

@vesper8 ok thank you, this is enough information to actually start debugging.

vesper8 commented 2 months ago

thank you! I hope you can at least reproduce it easily. I've been working with it today and it continues to happen a lot.. I'm never able to do more than 3 images at a time. And when it stalls.. the UI at http://192.168.2.179:8188/ becomes unreachable and it seems the only thing to do is CTRL-C the docker instance and bring it back up.

I tried enabling REFRESH_WORKER in the docker-compose.yml but that doesn't seem to have any effect. Does it only apply when running on RunPod, with no effect locally?

It would be nice to have a similar flag for local testing. A way to start with a clean slate before processing the next image.

Right now it's so unclear whether it's my setup running out of memory or what. The log doesn't say much. Isn't there a way to enable more logging?
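One way to get a little more visibility is to poll ComfyUI separately and log when it stops answering. The sketch below is only an illustration and is not the worker's own check. The URL is an assumed address for ComfyUI inside the container, and `http_probe` and `wait_until_reachable` are hypothetical names.

```python
import http.client
import time
import urllib.request

COMFY_URL = "http://127.0.0.1:8188/"  # assumption: ComfyUI address inside the container


def http_probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if the URL answers an HTTP GET with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Covers refused or reset connections, timeouts and malformed replies.
        return False


def wait_until_reachable(url: str = COMFY_URL, retries: int = 10,
                         delay: float = 0.5, probe=http_probe) -> bool:
    """Probe the server up to `retries` times, logging every failed attempt."""
    for attempt in range(1, retries + 1):
        if probe(url):
            print(f"reachable after {attempt} attempt(s)")
            return True
        print(f"attempt {attempt}/{retries}: not reachable")
        time.sleep(delay)
    return False
```

Running `wait_until_reachable()` before each request, with timestamps added to the output, would show whether the ComfyUI process has already gone away before the next upload is attempted.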

here are some more logs that just happened:


comfyui-worker | INFO   | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | Started.
comfyui-worker | runpod-worker-comfy - API is reachable
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | runpod-worker-comfy - image(s) upload complete
comfyui-worker | got prompt
comfyui-worker | runpod-worker-comfy - queued workflow with ID da9468ab-e4bc-4513-9cb1-6cb2ad9b398c
comfyui-worker | runpod-worker-comfy - wait until image generation is complete
comfyui-worker | Requested to load SDXLClipModel
comfyui-worker | Loading 1 new model
comfyui-worker | /usr/local/lib/python3.10/dist-packages/insightface/utils/transform.py:68: FutureWarning: `rcond` parameter will change to the default of machine precision times ``max(M, N)`` where M and N are the input matrix dimensions.
comfyui-worker | To use the future default and silence this warning we advise to pass `rcond=None`, to keep using the old, explicitly pass `rcond=-1`.
comfyui-worker |   P = np.linalg.lstsq(X_homo, Y)[0].T # Affine matrix. 3 x 4
comfyui-worker | /usr/local/lib/python3.10/dist-packages/torchvision/transforms/v2/_deprecated.py:41: UserWarning: The transform `ToTensor()` is deprecated and will be removed in a future release. Instead, please use `transforms.Compose([transforms.ToImageTensor(), transforms.ConvertImageDtype()])`.
comfyui-worker |   warnings.warn(
comfyui-worker | Requested to load SDXL
comfyui-worker | Loading 1 new model
100%|██████████| 20/20 [00:07<00:00,  2.72it/s]
comfyui-worker | Prompt executed in 16.02 seconds
comfyui-worker | runpod-worker-comfy - image generation is done
comfyui-worker | runpod-worker-comfy - /comfyui/output/ComfyUI/test_00053_.png
comfyui-worker | runpod-worker-comfy - the image was generated and converted to base64
comfyui-worker | DEBUG  | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | Handler output: {'status': 'success', 'message': 'iVBORw0KGgoAAAANSUhEUgAAAvgAAAL4CAIAAACBQBS0AAEAAElEQVR4Xoz97XLkOrIlCq7lDpIRkjJz711V3T028zjzIPPiM2Yz93afU1/7IzOlCBJwX/eHgyFlnWqzQaYUJOBw+DccABni//v/9f/8RKaSUJIGImUAAEkASUICIBCEwLomCwgCANTdvJnXBAEBBPWxoUrheYAI4AQSgLMvAHFC/+9LkYqJZ16/U1bk/0vhHFmPawKSAaRIQCJASBAIzvbqSkAkU5z0AYCsBitmHkOKLJZIKQlKUxw8wXT+TExZPw/Ec1icbJbUH2j4qJkdoBqytFX1AiiBOhVDMIspFs7Srqo7AUgqrQMljTzRA5RkczwWZAnq5EMo0QGApgEUJqJopjIBFB0AjUpBJHIyNsmDBGUSSTIj7Rxxyj6kBAREUkpl1tgjlchUjlSmhD4iIyNjCEfkGDmUh7SnurJLR+oQjswgDqALQ5mc9K6Lr0sDFFKX7j26cQf2zA4M4B4aUAcG0VNDCCKIkUggUmkckoTIFJmCABojlERKghIsOZ/qxRQuEZpWQEKSkwYaaaQBRhIwUkKrG8mMkBa3ZWmLt21ZFvOleXNz0pyQooc1pgQgM0SMVIQiIqUgMtSPXtcSRg+RWUqQoKKcDyPU/DnVX5yQAFFmwgeHJECSRDFS3EHwYiHTjUiZ0QADmhOCAQYZSZKSEQYagJR7CYSUzMwIigY5QdJhRjnVSE+Y1Mhm5mAzNGVLrMAKLMwVWMwWw+axGS5Md2xNW8NiWhrcRmMakpQhHQkBhBMzigJGIAAgSEkEJKQIICSCScAIMslw66sf1/bdl9fL5dv68v3pz39sP31rL9/t86/5/Lc7//Z9//Xb/sevR3+l99ze/viJ40kjxiHrr9/+oNk9xhF3WCSQJigryCtVqklSLDckyg7BqaWpLUIEyYqRRgAESataY1VWOIG7Tc0+tD3bSLOqF0QaAAjkx+HmLSpGFCUTDvUJ1ugAxIKvHyOBGqHwNIKCmRmNRpM500kTnEbCjEa46KQDC5sDq3EBF3IBNuNGXMmr4Wq2URt1Ma1MRzpzMZkFDQYQMgKAQyTMUFomRINSVkwYytxNMy5O1knV9fyYTRRFETyBP8gEoGYPgCV0kvqAARLJChUosCqF5ryrHgLeNVHy/YjqpG5efSwq7VfjWQdMrKWtE/B/X/4LihqfgIqYB0oJj0QBqgkCEKgHClrIK4oGPczvm7UXiSk3F+QSdMpPMvKc+DAHrSkRmJ4xr3/k4EFvDT5rTpjTbucdcbYRJ3LM24J4r5rlga1YRF0+bvjet4rq50ciAUxaWH2JkhpJmVhWUmDExC2JxZMA1Nw91UcCsql0AO/iedAKqeyW/EBNzQHnLT+o+4P1V3m34oefPIgB+chgCnL2L4AHfr7fEcAZHN5ra7RSXOUZQNkagYrdgihMT6tOxQNF6cxp3oX1YTygcEuoIOqQZrIDnRn2SWelm4BEiURFZKch9c59qkgRQCMEpFFQBABIczAwlQL0uM4UlUBKMCiUUBKAaPxBgUUrITIkNwJISc6ZxJIgJNGnBPUuMwKiQYL81BcFQ0oyFkklWQJ5Ok0KsBJt0fSQYFnmKX0DqDOgCKAoQDXpl2TMaaQb3WlOOqyxEqBZTClFpqBQZmhIEZnSSI3MEVmpYiYyM4EUlKWclCRhMlKMC2XY0xaAEtNDIhCmnYCcUx9UfYpqo5Rz5VBBCQBB46jVAiAiKQPMICIkA6whZw5UASVZWRFhhIOQqHTASC/TAjxoAJUNauAiuHIhVnIBF9PWcDFeiIW5mC6NT43bgs25mq1kc4FGyioDRNZKpQzEDACdSFApEiYIcrFkVL+MBkV2ueGy
EeMQ3pK/h8TLCIvPC/np+Xp9+unzpz9+bm83P+49337u99vv+w3jvrD7p+e433ncWjSijx7SkJIKg5JZgjYiSYCay4qy3BIzUHPktCtWrlOXoh7+W+AEaN
comfyui-worker | ...TRUNCATED 923404 CHARACTERS...
comfyui-worker | 3vVJgp2/K0XY3pMadMVc+eqMbA7r0Zpqq72yoxqXhhCyS7u7tG7utpICRps9tesYA8xWoxl8nEWtxOWiOva0tWYnxbWxj2ThzYwhgBpZKCaVnrsidJLijlIkiMrlWlkk1VjSQ8XvOH8RgSMVwHTWVOtpT5vhaNIM8e4+Qxujar5myO/Ke7pTG7jZG6e29Nt/K+V7Bx05arKCWaGbHusBnWptzYjpqShlw9ZdRtWarSGFahkeetahsJEapCIntFUo0hRDhU29QQKjVztjw1tti6ZDQo3NFIvJYVRs2rjhw5cxgohLBMWABCUglbldxTgFTAkiKRsSAdDIoEJbFSflGShI2U4VcpSbKtOA1AzCLw3WC7cbB3+MK5gGVpDvWpJjWnXex7zVNN17RuO13iNOY+eow5Rm8buEz1Xm5UjA1p1HoP4L637XbXkLr3fbetE7g1ynNqs0Qh3MqUUmFLlJDAUexKDA+lgQuBpx0JSdiOBwA5laM3bNcYxtO4XaWeXaJNVWGzlSTN7u7axlZVnhs+v/tw+fDN+4/faLu0xu5xnTrVSVZpV7cLz5qmKFTFtk9Rpzm3a2/t86jLw/ZOp0trwDifRpKUURpVzJiUS8uS2hYawmZUtZ2501KE0TZyjWWIlRmzanYPqQazjXsgeRaMnid067lPX7Clfc7ZtHDp1r0nFqF97tOY+GMfMzZek7csjmoh7oewpBybji8gZGMfegOa6JHEk1i43RNQhSE4dAr2Qi8fXSr0xFpivMR3BFbJdi47owohbss+HAbZagrmnExKeLQ8ZaY1LoVb3rcvl3rYoSfTs5AGo4JKbe2MWaeqFmpjkGiHXwmMss6A3H9ZUSci9Ii4qRPDw4MBeXpFLlcBxlEzq94cTu1D4hl3iZV4wtEHkI+kzYtayIvrAGDHdpVHBN6iQbTd3QbbitHZkmwt22XRk5NDrWZp3+CDUZT0y2vURX40LJk2dK8wu8GZ+uHhu4/bh5Paj4/XX356/PE/fv7xfz3++NPT5/35i7l9/PyLfnmYj+P9Pi57a0dzXsvP6tn79VSc2Jh9Gae9jakqu13MVkm7PbKyd9vJWtTGY1g9oeWGW3GTOvcwIss0koZqmlFVGts2aiu2Yitv6k17IWHFve2i6UHvzIEKBsIUtS7BlrL3KEpGAiGvmIWWJRzRLvqMsOXGCZO2kQ1xBpVZttmSMtdIQDZLJPae1DD01FRLlfm7hVAP3C3UVZyU9C6q7UNqlhq3IO8/BLdDfdsOXaUY0mworfgSk0e2VSX7yImLEW7aJDaabNvGzISgilpBE0mqwlSmU7CcJzzEiB0mwam2pJXJxBva2OCk2RmjlnBB1KovbJ22dt/2BosuLZZVY6LEyCrNtqFRu7u99yER0WA6u0jtaBEbW2hiTXsiVe4fw3aNKioxtxpVeaXBWcVJUizGFqojyCzBtUVVtkmS0wBINUTbCAnJCRkcNeHfICwEHLmFozMbI+X+JzsCMz3b3WNsko2iD2kpXZKke8RQScigJDSGNVTcBiFWX0vChPEVDIWzIBR4WSTQBq0lX4xwGqva3abXKxa0T+8mV7Ju3dOeZkq7mdYExsCzzkM3V9dE47SpNHFfdzXS6H2HYkhVdaoiRAONNKCqjKutKoEwUhVjbCyRUFVEmiypR5oloXWRV4kPpRiuQGrHSabbrjpyGkC5t0PAytpHdVvM07lqG3Sr56ePH7759sM333x/fveutotrdOk6W1sZBsCklOcsa3ps7LfJOI0qqPZoCtWoDY1tOwmNGmuDcVKikcKRhI0yH9lQVUhltmaaghrq7rCoUttSYati/xorkmCp3VWyyfNuOo0abrt7LzOx3e0eaArT3W48YTkcdlaD2DGYJI0WyDE1cObS+06ke3mbES6y7gOhxGXHD9eGhN3BERuNlu+B5rBkLxoAuw44aHldxXXgCyMrT3ooIcQx+nSyF524u4c929frXtIQ0La7x
/V6u5zm/xfUgUqwB3pIhQAAAABJRU5ErkJggg==', 'refresh_worker': True}
comfyui-worker | DEBUG  | test-978b2b1c-ce09-4849-8b7d-3fd318f016c2 | run_job return: {'output': {'status': 'success', 'message': 'iVBORw0KGgoAAAANSUhEUgAAAvgAAAL4CAIAAACBQBS0AAEAAElEQVR4Xoz97XLkOrIlCq7lDpIRkjJz711V3T028zjzIPPiM2Yz93afU1/7IzOlCBJwX/eHgyFlnWqzQaYUJOBw+DccABni//v/9f/8RKaSUJIGImUAAEkASUICIBCEwLomCwgCANTdvJnXBAEBBPWxoUrheYAI4AQSgLMvAHFC/+9LkYqJZ16/U1bk/0vhHFmPawKSAaRIQCJASBAIzvbqSkAkU5z0AYCsBitmHkOKLJZIKQlKUxw8wXT+TExZPw/Ec1icbJbUH2j4qJkdoBqytFX1AiiBOhVDMIspFs7Srqo7AUgqrQMljTzRA5RkczwWZAnq5EMo0QGApgEUJqJopjIBFB0AjUpBJHIyNsmDBGUSSTIj7Rxxyj6kBAREUkpl1tgjlchUjlSmhD4iIyNjCEfkGDmUh7SnurJLR+oQjswgDqALQ5mc9K6Lr0sDFFKX7j26cQf2zA4M4B4aUAcG0VNDCCKIkUggUmkckoTIFJmCABojlERKghIsOZ/qxRQuEZpWQEKSkwYaaaQBRhIwUkKrG8mMkBa3ZWmLt21ZFvOleXNz0pyQooc1pgQgM0SMVIQiIqUgMtSPXtcSRg+RWUqQoKKcDyPU/DnVX5yQAFFmwgeHJECSRDFS3EHwYiHTjUiZ0QADmhOCAQYZSZKSEQYagJR7CYSUzMwIigY5QdJhRjnVSE+Y1Mhm5mAzNGVLrMAKLMwVWMwWw+axGS5Md2xNW8NiWhrcRmMakpQhHQkBhBMzigJGIAAgSEkEJKQIICSCScAIMslw66sf1/bdl9fL5dv68v3pz39sP31rL9/t86/5/Lc7//Z9//Xb/sevR3+l99ze/viJ40kjxiHrr9/+oNk9xhF3WCSQJigryCtVqklSLDckyg7BqaWpLUIEyYqRRgAESataY1VWOIG7Tc0+tD3bSLOqF0QaAAjkx+HmLSpGFCUTDvUJ1ugAxIKvHyOBGqHwNIKCmRmNRpM500kTnEbCjEa46KQDC5sDq3EBF3IBNuNGXMmr4Wq2URt1Ma1MRzpzMZkFDQYQMgKAQyTMUFomRINSVkwYytxNMy5O1knV9fyYTRRFETyBP8gEoGYPgCV0kvqAARLJChUosCqF5ryrHgLeNVHy/YjqpG5efSwq7VfjWQdMrKWtE/B/X/4LihqfgIqYB0oJj0QBqgkCEKgHClrIK4oGPczvm7UXiSk3F+QSdMpPMvKc+DAHrSkRmJ4xr3/k4EFvDT5rTpjTbucdcbYRJ3LM24J4r5rlga1YRF0+bvjet4rq50ciAUxaWH2JkhpJmVhWUmDExC2JxZMA1Nw91UcCsql0AO/iedAKqeyW/EBNzQHnLT+o+4P1V3m34oefPIgB+chgCnL2L4AHfr7fEcAZHN5ra7RSXOUZQNkagYrdgihMT6tOxQNF6cxp3oX1YTygcEuoIOqQZrIDnRn2SWelm4BEiURFZKch9c59qkgRQCMEpFFQBABIczAwlQL0uM4UlUBKMCiUUBKAaPxBgUUrITIkNwJISc6ZxJIgJNGnBPUuMwKiQYL81BcFQ0oyFkklWQJ5Ok0KsBJt0fSQYFnmKX0DqDOgCKAoQDXpl2TMaaQb3WlOOqyxEqBZTClFpqBQZmhIEZnSSI3MEVmpYiYyM4EUlKWclCRhMlKMC2XY0xaAEtNDIhCmnYCcUx9UfYpqo5Rz5VBBCQBB46jVAiAiKQPMICIkA6whZw5UASVZWRFhhIOQqHTASC/TAjxoAJUNauAiuHIhVnIBF9PWcDFeiIW5mC6NT43bgs25mq1kc4FGyioDRNZKpQzEDACdSFApEiYIcrFkV
L+MBkV2ueGyEeMQ3pK/h8TLCIvPC/np+Xp9+unzpz9+bm83P+49337u99vv+w3jvrD7p+e433ncWjSijx7SkJIKg5JZgjYiSYCay4qy3BIzUHPktCtWrlO
comfyui-worker | ...TRUNCATED 923409 CHARACTERS...
comfyui-worker | N2ZbeC3vVJgp2/K0XY3pMadMVc+eqMbA7r0Zpqq72yoxqXhhCyS7u7tG7utpICRps9tesYA8xWoxl8nEWtxOWiOva0tWYnxbWxj2ThzYwhgBpZKCaVnrsidJLijlIkiMrlWlkk1VjSQ8XvOH8RgSMVwHTWVOtpT5vhaNIM8e4+Qxujar5myO/Ke7pTG7jZG6e29Nt/K+V7Bx05arKCWaGbHusBnWptzYjpqShlw9ZdRtWarSGFahkeetahsJEapCIntFUo0hRDhU29QQKjVztjw1tti6ZDQo3NFIvJYVRs2rjhw5cxgohLBMWABCUglbldxTgFTAkiKRsSAdDIoEJbFSflGShI2U4VcpSbKtOA1AzCLw3WC7cbB3+MK5gGVpDvWpJjWnXex7zVNN17RuO13iNOY+eow5Rm8buEz1Xm5UjA1p1HoP4L637XbXkLr3fbetE7g1ynNqs0Qh3MqUUmFLlJDAUexKDA+lgQuBpx0JSdiOBwA5laM3bNcYxtO4XaWeXaJNVWGzlSTN7u7axlZVnhs+v/tw+fDN+4/faLu0xu5xnTrVSVZpV7cLz5qmKFTFtk9Rpzm3a2/t86jLw/ZOp0trwDifRpKUURpVzJiUS8uS2hYawmZUtZ2501KE0TZyjWWIlRmzanYPqQazjXsgeRaMnid067lPX7Clfc7ZtHDp1r0nFqF97tOY+GMfMzZek7csjmoh7oewpBybji8gZGMfegOa6JHEk1i43RNQhSE4dAr2Qi8fXSr0xFpivMR3BFbJdi47owohbss+HAbZagrmnExKeLQ8ZaY1LoVb3rcvl3rYoSfTs5AGo4JKbe2MWaeqFmpjkGiHXwmMss6A3H9ZUSci9Ii4qRPDw4MBeXpFLlcBxlEzq94cTu1D4hl3iZV4wtEHkI+kzYtayIvrAGDHdpVHBN6iQbTd3QbbitHZkmwt22XRk5NDrWZp3+CDUZT0y2vURX40LJk2dK8wu8GZ+uHhu4/bh5Paj4/XX356/PE/fv7xfz3++NPT5/35i7l9/PyLfnmYj+P9Pi57a0dzXsvP6tn79VSc2Jh9Gae9jakqu13MVkm7PbKyd9vJWtTGY1g9oeWGW3GTOvcwIss0koZqmlFVGts2aiu2Yitv6k17IWHFve2i6UHvzIEKBsIUtS7BlrL3KEpGAiGvmIWWJRzRLvqMsOXGCZO2kQ1xBpVZttmSMtdIQDZLJPae1DD01FRLlfm7hVAP3C3UVZyU9C6q7UNqlhq3IO8/BLdDfdsOXaUY0mworfgSk0e2VSX7yImLEW7aJDaabNvGzISgilpBE0mqwlSmU7CcJzzEiB0mwam2pJXJxBva2OCk2RmjlnBB1KovbJ22dt/2BosuLZZVY6LEyCrNtqFRu7u99yER0WA6u0jtaBEbW2hiTXsiVe4fw3aNKioxtxpVeaXBWcVJUizGFqojyCzBtUVVtkmS0wBINUTbCAnJCRkcNeHfICwEHLmFozMbI+X+JzsCMz3b3WNsko2iD2kpXZKke8RQScigJDSGNVTcBiFWX0vChPEVDIWzIBR4WSTQBq0lX4xwGqva3abXKxa0T+8mV7Ju3dOeZkq7mdYExsCzzkM3V9dE47SpNHFfdzXS6H2HYkhVdaoiRAONNKCqjKutKoEwUhVjbCyRUFVEmiypR5oloXWRV4kPpRiuQGrHSabbrjpyGkC5t0PAytpHdVvM07lqG3Sr56ePH7759sM333x/fveutotrdOk6W1sZBsCklOcsa3ps7LfJOI0qqPZoCtWoDY1tOwmNGmuDcVKikcKRhI0yH9lQVUhltmaaghrq7rCoUttSYati/xorkmCp3VWyyfNuOo0abrt7LzOx3e0eaArT3W48YTkcdlaD2DGYJI0WyDE1cObS+06ke3mbES6y7gOhxGXHD9eGhN3BERuNlu+B5rBkLxoAuw44aHldxXXgCyMrT3ooIcQx+nSyF524u4c929frXtI
Q0La7x/V6u5zm/xfUgUqwB3pIhQAAAABJRU5ErkJggg=='}, 'stopPod': True}
comfyui-worker | INFO   | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | Started.
comfyui-worker | runpod-worker-comfy - API is reachable
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | runpod-worker-comfy - image(s) upload complete
comfyui-worker | got prompt
comfyui-worker | runpod-worker-comfy - queued workflow with ID 22e95fe4-daae-4bf0-99dc-cb713b62306d
comfyui-worker | runpod-worker-comfy - wait until image generation is complete
comfyui-worker | Requested to load SDXLClipModel
comfyui-worker | Loading 1 new model
comfyui-worker | DEBUG  | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | Handler output: {'error': 'Error waiting for image generation: [Errno 104] Connection reset by peer'}
comfyui-worker | DEBUG  | test-cc0930c0-8dcd-450d-b963-863fa5b290ba | run_job return: {'error': 'Error waiting for image generation: [Errno 104] Connection reset by peer'}
comfyui-worker | INFO   | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | Started.
comfyui-worker | runpod-worker-comfy - Failed to connect to server at http://127.0.0.1:8188 after 500 attempts.
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | ERROR  | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | Captured Handler Exception
comfyui-worker | ERROR  | {
comfyui-worker |     "error_type": "<class 'requests.exceptions.ConnectionError'>",
comfyui-worker |     "error_message": "HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))",
comfyui-worker |     "error_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nConnectionRefusedError: [Errno 111] Connection refused\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 715, in urlopen\n    httplib_response = self._make_request(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 416, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 244, in request\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\n  File \"/usr/lib/python3.10/http/client.py\", line 1283, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1329, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1278, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1038, in _send_output\n    self.send(msg)\n  File \"/usr/lib/python3.10/http/client.py\", line 976, in send\n    self.connect()\
comfyui-worker | ...TRUNCATED 783 CHARACTERS...
comfyui-worker | t(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py\", line 594, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n    handler_return = handler(job)\n  File \"/rp_handler.py\", line 308, in handler\n    upload_result = upload_images(images)\n  File \"/rp_handler.py\", line 134, in upload_images\n    response = requests.post(f\"http://{COMFY_HOST}/upload/image\", files=files)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 115, in post\n    return request(\"post\", url, data=data, json=json, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\", line 700, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused'))\n",
comfyui-worker |     "hostname": "unknown",
comfyui-worker |     "worker_id": "unknown",
comfyui-worker |     "runpod_version": "1.6.2"
comfyui-worker | }
comfyui-worker | DEBUG  | test-8e0087b3-08b0-4b94-a012-f7b9aa3f3964 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/
comfyui-worker | ...TRUNCATED 917 CHARACTERS...
comfyui-worker | t-packages/urllib3/util/retry.py\\", line 594, in increment\\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\", line 134, in run_job\\n    handler_return = handler(job)\\n  File \\"/rp_handler.py\\", line 308, in handler\\n    upload_result = upload_images(images)\\n  File \\"/rp_handler.py\\", line 134, in upload_images\\n    response = requests.post(f\\"http://{COMFY_HOST}/upload/image\\", files=files)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 115, in post\\n    return request(\\"post\\", url, data=data, json=json, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 59, in request\\n    return session.request(method=method, url=url, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 589, in request\\n    resp = self.send(prep, **send_kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 703, in send\\n    r = adapter.send(request, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\\", line 700, in send\\n    raise ConnectionError(e, request=request)\\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3df1b3a60>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n", 
"hostname": "unknown", "worker_id": "unknown", "runpod_version": "1.6.2"}'}
comfyui-worker | INFO   | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | Started.
comfyui-worker | runpod-worker-comfy - Failed to connect to server at http://127.0.0.1:8188 after 500 attempts.
comfyui-worker | runpod-worker-comfy - image(s) upload
comfyui-worker | ERROR  | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | Captured Handler Exception
comfyui-worker | ERROR  | {
comfyui-worker |     "error_type": "<class 'requests.exceptions.ConnectionError'>",
comfyui-worker |     "error_message": "HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))",
comfyui-worker |     "error_traceback": "Traceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 174, in _new_conn\n    conn = connection.create_connection(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 95, in create_connection\n    raise err\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\", line 85, in create_connection\n    sock.connect(sa)\nConnectionRefusedError: [Errno 111] Connection refused\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 715, in urlopen\n    httplib_response = self._make_request(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\", line 416, in _make_request\n    conn.request(method, url, **httplib_request_kw)\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\", line 244, in request\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\n  File \"/usr/lib/python3.10/http/client.py\", line 1283, in request\n    self._send_request(method, url, body, headers, encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1329, in _send_request\n    self.endheaders(body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1278, in endheaders\n    self._send_output(message_body, encode_chunked=encode_chunked)\n  File \"/usr/lib/python3.10/http/client.py\", line 1038, in _send_output\n    self.send(msg)\n  File \"/usr/lib/python3.10/http/client.py\", line 976, in send\n    self.connect()\
comfyui-worker | ...TRUNCATED 783 CHARACTERS...
comfyui-worker | t(\n  File \"/usr/local/lib/python3.10/dist-packages/urllib3/util/retry.py\", line 594, in increment\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))\n\nDuring handling of the above exception, another exception occurred:\n\nTraceback (most recent call last):\n  File \"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\", line 134, in run_job\n    handler_return = handler(job)\n  File \"/rp_handler.py\", line 308, in handler\n    upload_result = upload_images(images)\n  File \"/rp_handler.py\", line 134, in upload_images\n    response = requests.post(f\"http://{COMFY_HOST}/upload/image\", files=files)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 115, in post\n    return request(\"post\", url, data=data, json=json, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/api.py\", line 59, in request\n    return session.request(method=method, url=url, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 589, in request\n    resp = self.send(prep, **send_kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\", line 703, in send\n    r = adapter.send(request, **kwargs)\n  File \"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\", line 700, in send\n    raise ConnectionError(e, request=request)\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host='127.0.0.1', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused'))\n",
comfyui-worker |     "hostname": "unknown",
comfyui-worker |     "worker_id": "unknown",
comfyui-worker |     "runpod_version": "1.6.2"
comfyui-worker | }
comfyui-worker | DEBUG  | test-d75801dc-f1ca-45e4-91bc-60fd399c1c61 | run_job return: {'error': '{"error_type": "<class \'requests.exceptions.ConnectionError\'>", "error_message": "HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))", "error_traceback": "Traceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 174, in _new_conn\\n    conn = connection.create_connection(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 95, in create_connection\\n    raise err\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py\\", line 85, in create_connection\\n    sock.connect(sa)\\nConnectionRefusedError: [Errno 111] Connection refused\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 715, in urlopen\\n    httplib_response = self._make_request(\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connectionpool.py\\", line 416, in _make_request\\n    conn.request(method, url, **httplib_request_kw)\\n  File \\"/usr/local/lib/python3.10/dist-packages/urllib3/connection.py\\", line 244, in request\\n    super(HTTPConnection, self).request(method, url, body=body, headers=headers)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1283, in request\\n    self._send_request(method, url, body, headers, encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1329, in _send_request\\n    self.endheaders(body, encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1278, in endheaders\\n    self._send_output(message_body, 
encode_chunked=encode_chunked)\\n  File \\"/usr/lib/python3.10/http/client.py\\", line 1038, in _send_output\\n    self.send(msg)\\n  File \\"/usr/lib/
comfyui-worker | ...TRUNCATED 917 CHARACTERS...
comfyui-worker | t-packages/urllib3/util/retry.py\\", line 594, in increment\\n    raise MaxRetryError(_pool, url, error or ResponseError(cause))\\nurllib3.exceptions.MaxRetryError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n\\nDuring handling of the above exception, another exception occurred:\\n\\nTraceback (most recent call last):\\n  File \\"/usr/local/lib/python3.10/dist-packages/runpod/serverless/modules/rp_job.py\\", line 134, in run_job\\n    handler_return = handler(job)\\n  File \\"/rp_handler.py\\", line 308, in handler\\n    upload_result = upload_images(images)\\n  File \\"/rp_handler.py\\", line 134, in upload_images\\n    response = requests.post(f\\"http://{COMFY_HOST}/upload/image\\", files=files)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 115, in post\\n    return request(\\"post\\", url, data=data, json=json, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/api.py\\", line 59, in request\\n    return session.request(method=method, url=url, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 589, in request\\n    resp = self.send(prep, **send_kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/sessions.py\\", line 703, in send\\n    r = adapter.send(request, **kwargs)\\n  File \\"/usr/local/lib/python3.10/dist-packages/requests/adapters.py\\", line 700, in send\\n    raise ConnectionError(e, request=request)\\nrequests.exceptions.ConnectionError: HTTPConnectionPool(host=\'127.0.0.1\', port=8188): Max retries exceeded with url: /upload/image (Caused by NewConnectionError(\'<urllib3.connection.HTTPConnection object at 0x7fd3dedc6d40>: Failed to establish a new connection: [Errno 111] Connection refused\'))\\n", 
"hostname": "unknown", "worker_id": "unknown", "runpod_version": "1.6.2"}'}
comfyui-worker | INFO   | test-065b19f8-752e-45ec-8d7f-56a9794683d2 | Started.
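
The traceback above ends in `upload_images` (`/rp_handler.py`, line 134), where `requests.post` gets `Connection refused` from `127.0.0.1:8188`. That suggests nothing was listening on the ComfyUI port when the upload was attempted. Below is a minimal sketch of a reachability check that could run before the upload. The function name `wait_for_port` and its defaults are hypothetical and not part of the worker; only the host and port come from the log. It does not explain why the server stopped listening, it only makes that condition fail quickly and visibly.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; not part of runpod-worker-comfy.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers ConnectionRefusedError and connect timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling `wait_for_port("127.0.0.1", 8188)` before the upload and raising a clear error when it returns `False` would show whether the ComfyUI process has gone away between jobs, instead of surfacing it as a long `requests` traceback.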
vesper8 commented 2 months ago

It seems like it's constantly loading models in and out of memory, even though I'm using the same nodes and the same models for the entire batch. Is it possible to keep the same models loaded for the whole batch? Is that something we can control?