Open judeleonard opened 1 day ago
@judeleonard please provide the dstack fleet list output.
One of the reasons can be that the fleet instance has GPUs while the dev environment doesn't request any.
Here is the output
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED
model-dev-fleet 0 ssh (remote) 2xCPU, 0GB, 100.0GB (disk) $0.0 terminated 3 hours ago
Yes, I tried to attach a GPU before, but I got the same error: 'Not having enough capacity'. My remote server actually has both a GPU and Docker preinstalled.
Here is the output
It's dstack ps, not dstack fleet list.
By default, dstack offers only instances whose resources exactly match those of the fleet.
This is the output. Not many details.
NAME BACKEND REGION RESOURCES SPOT PRICE STATUS SUBMITTED
model-dev-env failed 31 mins ago
Means the fleet creation wasn't successful.
This will help us understand why the fleet couldn't be created.
This is the dstack web server after the fleet was created. But I will try again and post the entire output.
dstack apply -f fleet.dstack.yml
/usr/lib/python3/dist-packages/paramiko/transport.py:237: CryptographyDeprecationWarning: Blowfish has been deprecated
"class": algorithms.Blowfish,
Project main
User admin
Configuration fleet.dstack.yml
Type fleet
Fleet type ssh
Nodes 1
Placement any
Found fleet model-dev-fleet. Configuration changes detected.
Re-create the fleet? [y/n]: y
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED ERROR
model-dev-fleet 0 ssh (remote) $0.0 pending 20 sec ago
Could this fingerprint be an issue with my ssh user?
'RSAKey' object has no attribute 'fingerprint'
[16:00:13] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[16:00:14] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:19] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:24] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[16:00:25] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:29] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:34] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[16:00:35] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:39] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:43] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:48] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:52] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:00:57] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:01:03] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:01:08] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
[16:01:13] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:
'RSAKey' object has no attribute 'fingerprint'
Oh, that's a known issue. It will be fixed in the next release, but for now please run pip install paramiko -U, then restart the server and try again. The issue should be gone.
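For anyone hitting the same error: the message suggests the installed paramiko predates the fingerprint property on key objects (added, as far as I know, in paramiko 3.2). Below is a minimal sketch for checking this before restarting the server. The helper names supports_fingerprint and check_paramiko are hypothetical, and the check only means something if it runs in the same Python environment as the dstack server (the warning above shows paramiko being loaded from /usr/lib/python3/dist-packages, which may not be the copy that pip upgrades).

```python
import importlib


def supports_fingerprint(key_class) -> bool:
    """Return True if the key class exposes a `fingerprint` attribute."""
    return hasattr(key_class, "fingerprint")


def check_paramiko() -> str:
    """Report whether the installed paramiko has RSAKey.fingerprint."""
    try:
        paramiko = importlib.import_module("paramiko")
    except ImportError:
        return "paramiko is not installed in this environment"
    if supports_fingerprint(paramiko.RSAKey):
        status = "ok"
    else:
        status = "too old, upgrade needed"
    return f"paramiko {paramiko.__version__}: {status}"


if __name__ == "__main__":
    print(check_paramiko())
```

If this still reports an old version after the upgrade, the server is probably importing a different paramiko installation than the one pip updated.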
Thank you. Will do that
Hi @peterschmidt85, sorry for only updating you now. Our remote server was undergoing some updates.
I later tried again after upgrading paramiko as you suggested, and the dstack server logs did change from what I had before. This is my log now. I also updated dstack to the latest version, v0.18.25.
@judeleonard Now that you've updated paramiko, please show the full output of creating the fleet (both the dstack apply and dstack server outputs).
dstack apply -f fleet.dstack.yml
Project main
User admin
Configuration fleet.dstack.yml
Type fleet
Fleet type ssh
Nodes 1
Placement any
Found fleet model-dev-fleet. Configuration changes detected.
Re-create the fleet? [y/n]: y
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED ERROR
model-dev-fleet 0 ssh (remote) $0.0 pending 16 sec ago
dstack apply -f dev_environment.yml
Project main
User admin
Configuration dev_environment.yml
Type dev-environment
Resources 2..xCPU, 8GB.., 1xGPU (10GB), 100GB.. (disk)
Max price -
Max duration 6h
Spot policy auto
Retry policy no
Creation policy reuse-or-create
Termination policy destroy-after-idle
Termination idle time 5m
Finished run model-dev-env already exists.
Override the run? [y/n]: y
model-dev-env provisioning completed (terminating)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
@judeleonard
FLEET INSTANCE BACKEND RESOURCES PRICE STATUS CREATED ERROR
model-dev-fleet 0 ssh (remote) $0.0 pending 16 sec ago
But what did it show after that? Was it successful?
Running anything before the fleet is created doesn't make sense.
Let's try to understand why the fleet isn't created. We need logs for that.
dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:227 Failed to start instance model-dev-fleet-0 in 600 seconds. Terminating...
[11:50:02] INFO dstack._internal.server.services.fleets:363 Deleting fleets: ['model-dev-fleet']
[11:50:09] INFO dstack._internal.server.background.tasks.process_fleets:72 Automatic cleanup of an empty fleet model-dev-fleet
INFO dstack._internal.server.background.tasks.process_fleets:78 Fleet model-dev-fleet deleted
[11:50:11] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/riBFx8B321nQ3v0rEhwYqXJBM'] was unsuccessful
[11:50:16] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBF21nQ3v0rEhwYqXJBM'] was unsuccessful
[11:50:22] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtxQ3v0rEhwYqXJBM'] was unsuccessful
[11:50:27] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rEYqXJBM'] was unsuccessful
[11:50:31] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:50:32] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0XJBM'] was unsuccessful
[11:50:37] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful
[11:50:42] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:50:43] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful
[11:50:48] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3vXJBM'] was unsuccessful
[11:50:50] INFO dstack._internal.server.services.backends:404 Requesting instance offers from backends: []
[11:50:54] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQYqXJBM'] was unsuccessful
[11:50:58] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8BwYqXJBM'] was unsuccessful
[11:51:04] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rEBM'] was unsuccessful
[11:51:10] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rJBM'] was unsuccessful
[11:51:12] INFO dstack._internal.server.services.backends:404 Requesting instance offers from backends: []
INFO dstack._internal.server.background.tasks.process_runs:330 run(d0951d)model-dev-env: run status has changed SUBMITTED -> TERMINATING
[11:51:14] INFO dstack._internal.server.services.jobs:283 job(4f4d88)model-dev-env-0-0: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY
[11:51:15] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0qXJBM'] was unsuccessful
[11:51:16] INFO dstack._internal.server.services.runs:739 run(d0951d)model-dev-env: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED
[11:51:21] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3XJBM'] was unsuccessful
[11:51:26] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nhwYqXJBM'] was unsuccessful
[11:55:31] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B3YqXJBM'] was unsuccessful
[11:55:36] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:55:37] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321XJBM'] was unsuccessful
[11:55:41] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGv0rEhwYqXJBM'] was unsuccessful
[11:55:45] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGthwYqXJBM'] was unsuccessful
[11:55:50] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B32YqXJBM'] was unsuccessful
[11:55:55] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:55:56] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8qXJBM'] was unsuccessful
[11:56:00] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B32qXJBM'] was unsuccessful
[11:56:04] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtxYqXJBM'] was unsuccessful
[11:56:09] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful
[11:56:14] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:56:15] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiwYqXJBM'] was unsuccessful
[11:56:20] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
[11:56:21] WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8qXJBM'] was unsuccessful
[11:56:26] INFO dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...
WARNING dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtv0rEhwYqXJBM'] was unsuccessful
This clearly shows that dstack cannot connect to the instance using the provided key.
@judeleonard Can you connect to the same host using the provided key via ssh -i <key path> jude@1.1.4.1?
@un-def Any ideas what could be wrong?
Yes, I can connect to the same server via ssh from my terminal.
The current user I am using actually requires a password to successfully connect to the server. Could this be why?
@judeleonard Yes! This certainly can be a reason.
Okay, let me work on this and try it again.
@judeleonard Also, please ensure the SSH key is added to ~/.ssh/authorized_keys on the host.
Basically, dstack works only if ssh works without a password.
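One way to verify this from the machine running the dstack server is to run ssh in batch mode, which fails instead of prompting for a password. The following is a minimal sketch, not part of dstack: the helper names are hypothetical, and the key path, user, and host are placeholders to replace with the values from fleet.dstack.yml.

```python
import subprocess
from typing import List


def build_ssh_check(key_path: str, user: str, host: str, port: int = 22) -> List[str]:
    """Build an ssh command that fails instead of prompting for a password."""
    return [
        "ssh",
        "-i", key_path,
        "-p", str(port),
        "-o", "BatchMode=yes",      # never prompt for a password or passphrase
        "-o", "ConnectTimeout=10",  # give up quickly if the host is unreachable
        f"{user}@{host}",
        "true",                     # run a no-op command on the host
    ]


def can_connect_without_password(key_path: str, user: str, host: str, port: int = 22) -> bool:
    """Return True if key-based login succeeds with no interactive prompt."""
    cmd = build_ssh_check(key_path, user, host, port)
    result = subprocess.run(cmd, capture_output=True)
    return result.returncode == 0
```

If can_connect_without_password returns False while an interactive ssh session works, the login is likely falling back to a password (or the host key is not yet trusted), which would be consistent with the failures in the server logs above.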
Steps to reproduce
Create a fleet.dstack.yml to provision a remote backend from my on-prem server. This was created successfully.
dstack apply -f fleet.dstack.yml
Create another YAML config to provision a dev environment with the provisioned fleet as the backend.
dstack apply -f dev_environment.yml
Actual behaviour
I got the error below:
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
I then tried to get more details about the error with the command below:
dstack ps --verbose
output below
Expected behaviour
Instance provisioning should complete successfully, with a VS Code link to my workspace.
dstack version
0.18.22
Server logs
Additional information
No response