fofr / cog-comfyui

Run ComfyUI with an API
https://replicate.com/fofr/any-comfyui-workflow
MIT License
462 stars 106 forks

Issues with the training script #143

Closed drommerkiller closed 1 week ago

drommerkiller commented 1 month ago

Downloading from Civitai seems to fail with an API key error no matter what. Using only HF models, the script works until it hits this error: cog.server.runner.FileUploadError: Got error trying to upload output files.

Complete error `urllib3.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:2426)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/adapters.py", line 667, in send
    resp = conn.urlopen(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 873, in urlopen
    return self.urlopen(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/connectionpool.py", line 843, in urlopen
    retries = retries.increment(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/urllib3/util/retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='api.svc.internal.us.c.replicate.net', port=443): Max retries exceeded with url: /_internal/file-upload/weights.tar (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2426)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 315, in _upload_files
    return self._file_uploader(output)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 224, in file_uploader
    return upload_files(output, upload_file=upload_file)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/json.py", line 51, in upload_files
    return {key: upload_files(value, upload_file) for key, value in obj.items()}
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/json.py", line 51, in <dictcomp>
    return {key: upload_files(value, upload_file) for key, value in obj.items()}
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/json.py", line 56, in upload_files
    return upload_file(f)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 220, in upload_file
    return put_file_to_signed_endpoint(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/files.py", line 61, in put_file_to_signed_endpoint
    resp = client.put(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/sessions.py", line 649, in put
    return self.request("PUT", url, data=data, **kwargs)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/requests/adapters.py", line 698, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='api.svc.internal.us.c.replicate.net', port=443): Max retries exceeded with url: /_internal/file-upload/weights.tar (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2426)')))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 371, in predict
    return _predict(
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 445, in _predict
    event_handler.set_output(event.payload)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 258, in set_output
    self.p.output = self._upload_files(output)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/site-packages/cog/server/runner.py", line 320, in _upload_files
    raise FileUploadError("Got error trying to upload output files") from error
cog.server.runner.FileUploadError: Got error trying to upload output files`

drommerkiller commented 1 month ago

OK, the error might be that the files are over 10 GB, but the trainer site is now broken: when you click 'train', it goes to a "Page not found" page.

drommerkiller commented 1 month ago

Got it working by keeping the model under 10 GB. The "page not found" was fixed after one day.

But one problem remains: motion loras do not work. I have added my custom motion loras as well as the standard ones, and this is what happens: the log says that my_custom_lora.safetensor is loaded (shown in green) and it is in the right folder. But when the workflow runs, it gives this error:

!!! Exception during processing!!! expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/src/ComfyUI/execution.py", line 151, in recursive_execute
    output_data, output_ui = get_output_data(obj, input_data_all)
  File "/src/ComfyUI/execution.py", line 81, in get_output_data
    return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
  File "/src/ComfyUI/execution.py", line 74, in map_node_over_list
    results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
  File "/src/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/animatediff/nodes_lora.py", line 36, in load_motion_lora
    if not Path(lora_path).is_file():
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/pathlib.py", line 960, in __new__
    self = cls._from_parts(args)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/pathlib.py", line 594, in _from_parts
    drv, root, parts = self._parse_args(args)
  File "/root/.pyenv/versions/3.10.14/lib/python3.10/pathlib.py", line 578, in _parse_args
    a = os.fspath(a)
TypeError: expected str, bytes or os.PathLike object, not NoneType
Prompt executed in 3.75 seconds
outputs: {}

It treats motion_lora as empty (NoneType) when the workflow runs, even though the log shows the file as loaded at startup. If I change the motion lora in the workflow to v2_lora_ZoomOut.ckpt, two things happen: the log first says v2_lora_ZoomOut.ckpt is loaded (in green), and then it starts downloading v2_lora_ZoomOut.ckpt again. I have v2_lora_ZoomOut.ckpt included in the model, so it should not be re-downloaded. After that, the workflow runs without errors.

So my guess is that while we can include AnimateDiff motion loras in the model, the script does not know how to use them and can only use the ones listed in the weights_manifest. All other files in the model seem to work as they should.
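For anyone reading the traceback above: the TypeError is what `Path(None)` raises, so the path lookup returned nothing before the file check ran. Here is a minimal sketch of a lookup that fails with a readable message instead. The helper names `find_motion_lora` and `require_motion_lora` and the idea of a list of search directories are assumptions for illustration, not the actual code in ComfyUI-AnimateDiff-Evolved or cog-comfyui.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_motion_lora(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, else None.

    Hypothetical helper: the real loader resolves paths differently.
    """
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def require_motion_lora(name: str, search_dirs: Iterable[Path]) -> Path:
    """Like find_motion_lora, but raise a descriptive error when nothing is found."""
    dirs = [Path(d) for d in search_dirs]  # materialise so we can report them
    path = find_motion_lora(name, dirs)
    if path is None:
        # Without this guard, a later Path(None) call raises the opaque
        # TypeError seen in the traceback.
        searched = ", ".join(str(d) for d in dirs) or "(no directories)"
        raise FileNotFoundError(f"motion lora {name!r} not found in: {searched}")
    return path
```

With a guard like this, a file sitting in a folder the loader does not search would show up as "not found in: ..." with the searched directories listed, which makes a wrong directory much easier to spot.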

Thanks anyways for building the tool! It's nice to be able to use our own checkpoints.

fofr commented 1 month ago

I think I've fixed the civitai issue in https://github.com/fofr/cog-comfyui/commit/cbfee6d2c36559a3bd4625bfd2ac6c924692cfdb (if a URL already contained a ? it'd lead to a 400 bad request)
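To illustrate the failure mode described here (this is a sketch, not the code from the linked commit): if a key is appended as `?name=value` to a URL that already has a query string, the result contains two `?` characters and the server may reject it. The function name `add_query_param`, the `token` parameter name, and the example URLs below are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_param(url: str, key: str, value: str) -> str:
    """Return `url` with key=value appended to its query string.

    Works whether or not the URL already contains a '?'.
    """
    parts = urlsplit(url)
    # Keep existing parameters (including blank ones) in their original order.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Naive concatenation produces a second '?' when a query string is present.
url = "https://example.com/api/download/models/1?type=Model"
naive = url + "?token=abc"
fixed = add_query_param(url, "token", "abc")
```

Here `naive` ends in `?type=Model?token=abc`, while `fixed` ends in `?type=Model&token=abc`.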

fofr commented 3 weeks ago

The >10gb issue is a known problem without any fixes at the moment
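Until that is resolved, one workaround is to check the total size of the files going into the model before starting a training run. A minimal sketch, assuming the limit is 10 GB counted as 10 × 1000³ bytes (the exact threshold and unit are not stated in this thread) and using hypothetical helper names:

```python
from pathlib import Path

# Assumption: "10gb" means 10 * 1000**3 bytes; the real cutoff may differ.
SIZE_LIMIT_BYTES = 10 * 1000**3


def total_size_bytes(root: Path) -> int:
    """Sum the sizes of all regular files under `root`, recursively."""
    return sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())


def fits_under_limit(root: Path, limit_bytes: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if everything under `root` is strictly below the limit."""
    return total_size_bytes(root) < limit_bytes
```

This measures uncompressed file sizes, so the final weights.tar may differ slightly. Treat it as a rough pre-flight check rather than a guarantee.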

fofr commented 1 week ago

Closing this for now, as the motion lora directory has been fixed.