Closed berkelmas closed 9 months ago
Can you show me exactly what you have in your template settings? I don't need the AWS stuff so please exclude any API keys.
Are the runners correctly pulling the Docker image, and do you have SERVERLESS=true?
Yeah, sure. The above are my template settings and env variables (the AWS credentials are further down, so I left them out of the screenshot). @robballantyne, this is my forked repo:
https://github.com/berkorg/comfyui
and this is the outputted image that I am using:
and now it has come up. Maybe there was an outage, but I am now getting the error below from the endpoint:
{
"delayTime": 5561,
"error": "{'14': {'errors': [{'type': 'value_not_in_list', 'message': 'Value not in list', 'details': \"ckpt_name: 'sd_xl_base_1.0.safetensors' not in []\", 'extra_info': {'input_name': 'ckpt_name', 'input_config': [[]], 'received_value': 'sd_xl_base_1.0.safetensors'}}], 'dependent_outputs': ['9'], 'class_type': 'CheckpointLoaderSimple'}}",
"executionTime": 1155,
"id": "sync-95682cbd-9d22-41e1-bc8e-26366a4ec961-e1",
"status": "FAILED"
}
It says sd_xl_base_1.0.safetensors is not in the list, but I have committed that change in my PR and added the model to the CHECKPOINT_MODELS tuple: https://github.com/berkorg/comfyui/commit/94c0f0560f9f2f337ed937d2353e3297737a0444
I have checked your fork and I noticed you have trailing commas in the bash arrays inside https://github.com/berkorg/comfyui/blob/main/build/COPY_ROOT_EXTRA/opt/ai-dock/bin/build/layer1/init.sh
Bash arrays are delimited by whitespace rather than a comma. Please remove the commas and re-build.
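To illustrate the point about commas, here is a minimal sketch. The array names and URLs are hypothetical and are not taken from the fork's init.sh. It shows that a trailing comma does not separate elements in Bash; it becomes part of the element's value, so anything that later consumes the entry (for example a download step) would presumably receive a string ending in a comma.

```bash
#!/usr/bin/env bash
# Hypothetical example arrays; the URLs are placeholders, not real model links.

# Wrong: the comma is glued onto the first element.
WITH_COMMAS=(
    "https://example.com/models/a.safetensors",
    "https://example.com/models/b.safetensors"
)

# Right: elements are separated by whitespace (spaces or newlines) only.
WITHOUT_COMMAS=(
    "https://example.com/models/a.safetensors"
    "https://example.com/models/b.safetensors"
)

# Prints the first element with the stray comma still attached:
# https://example.com/models/a.safetensors,
printf '%s\n' "${WITH_COMMAS[0]}"

# Prints the clean value:
# https://example.com/models/a.safetensors
printf '%s\n' "${WITHOUT_COMMAS[0]}"
```

Both arrays still contain two elements, which is why the mistake is easy to miss: the script parses without error, and only the element values are wrong.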
Ah I see. Now fixed that and deployed a new endpoint on RunPod. But do you think it is normal that workers are trying to fetch the checkpoint? Don't they need to already exist in the docker image?
The worker is only fetching the image to create a container. It only happens when the worker is created and not on each request.
Although my Docker images fetch models when running in normal mode, this never happens in serverless mode. If the model is not present in either the container or attached storage, the request will simply fail, as you saw above.
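Since a missing model only shows up as a failed request in serverless mode, it may help to check for the file before deploying. The following is a rough sketch, not part of the image's tooling: `has_checkpoint` is a hypothetical helper, and the directory is whatever checkpoint folder your build actually uses.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the named checkpoint file exists
# in the given directory. The caller supplies the directory; no path
# inside the real image is assumed here.
has_checkpoint() {
    local models_dir="$1"
    local ckpt_name="$2"
    [ -f "${models_dir}/${ckpt_name}" ]
}
```

For example, running `has_checkpoint "$CKPT_DIR" "sd_xl_base_1.0.safetensors" || echo "checkpoint missing"` inside the built container, with `CKPT_DIR` set to your checkpoint directory, would flag the problem before the endpoint receives a request.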
Ah okay. So this is just a one-time thing for the initial container creation on serverless endpoints 👍
After following the guidance in the issue below, I forked the repository and changed the IMAGE_BASE to
ghcr.io/ai-dock/jupyter-pytorch:2.1.1-py3.10-cuda-11.8.0-base-22.04
I then added my own models/custom nodes to COPY_ROOT_EXTRA and triggered the GitHub pipeline to build the Docker images. I used the https://github.com/berkorg/comfy-docker/pkgs/container/comfy-docker/156936256?tag=pytorch-2.0.1-py3.10-cuda-11.8.0-base-22.04
image and created a new template in RunPod. However, in RunPod the endpoint cannot initialize itself and does not log anything either. It stays in the state below:
Can you please help me out?