dstackai / dstack

dstack is a lightweight, open-source alternative to Kubernetes & Slurm, simplifying AI container orchestration with multi-cloud & on-prem support. It natively supports NVIDIA, AMD, & TPU.
https://dstack.ai/docs
Mozilla Public License 2.0

[Bug]: FAILED_TO_START_DUE_TO_NO_CAPACITY #1994

Open judeleonard opened 1 day ago

judeleonard commented 1 day ago

Steps to reproduce

Create a fleet.dstack.yml to provision a remote backend from my on-prem server. This was created successfully.

dstack apply -f fleet.dstack.yml

type: fleet
name: model-dev-fleet

placement: any

# The user, private SSH key, and hostnames of the on-prem servers
ssh_config:
  user: my_user
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 33.33.48.1

Create another YAML config to provision a development server with the provisioned fleet as the backend.

dstack apply -f dev_environment.yml


type: dev-environment
name: model-dev-env

#python: "3.11"  
image: dstackai/base:py3.13-0.6-cuda-12.1

ide: vscode

spot_policy: auto

Actual behaviour

I got the error below:

All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.

I then tried to see extra details about the error with the command below:

dstack ps --verbose

Output:

NAME           BACKEND  REGION  INSTANCE  RESOURCES  SPOT  PRICE  STATUS  SUBMITTED   ERROR                                
 model-dev-env                                                     failed  54 sec ago  JOB_FAILED                           
                                                                                       (FAILED_TO_START_DUE_TO_NO_CAPACITY) 

Expected behaviour

Instance provisioning should be completed successfully with a vscode link to my workspace.

dstack version

0.18.22

Server logs

[12:45:55] INFO     dstack._internal.server.services.backends:404 Requesting instance offers from backends: []                                                                                              
[12:45:56] INFO     dstack._internal.server.background.tasks.process_runs:330 run(110058)model-dev-env: run status has changed SUBMITTED -> TERMINATING                                                     
           INFO     dstack._internal.server.services.jobs:283 job(0d49c8)model-dev-env-0-0: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY                                                
[12:45:58] INFO     dstack._internal.server.services.runs:739 run(110058)model-dev-env: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED
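
The two relevant facts in these log lines (which backends were asked for offers, and why the job failed) can be pulled out mechanically. A minimal sketch in Python, assuming the log format quoted in this issue; the regular expressions and the helper name `summarize_log` are illustrative and not part of dstack:

```python
import re

# Both patterns are assumptions derived from the log lines quoted in this issue.
BACKENDS_RE = re.compile(r"Requesting instance offers from backends: \[(.*?)\]")
JOB_FAILED_RE = re.compile(r"job status is FAILED, reason: (\w+)")


def summarize_log(text: str) -> dict:
    """Return the backends that were queried and any job failure reasons."""
    backends = []
    for match in BACKENDS_RE.finditer(text):
        # An empty pair of brackets yields no names at all.
        names = [name.strip() for name in match.group(1).split(",") if name.strip()]
        backends.extend(names)
    reasons = JOB_FAILED_RE.findall(text)
    return {"backends": backends, "failure_reasons": reasons}


SAMPLE = """\
[12:45:55] INFO     dstack._internal.server.services.backends:404 Requesting instance offers from backends: []
           INFO     dstack._internal.server.services.jobs:283 job(0d49c8)model-dev-env-0-0: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY
"""

if __name__ == "__main__":
    print(summarize_log(SAMPLE))
```

Here the backend list is empty, which suggests that no cloud backend was consulted and the run could only have been placed on an existing fleet instance.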

Additional information

No response

peterschmidt85 commented 1 day ago

@judeleonard please provide dstack fleet list output.

One possible reason is that the fleet instance has GPUs while the dev environment doesn't request any.

judeleonard commented 1 day ago

Here is the output

 FLEET            INSTANCE  BACKEND       RESOURCES                   PRICE  STATUS      CREATED     
 model-dev-fleet  0         ssh (remote)  2xCPU, 0GB, 100.0GB (disk)  $0.0   terminated  3 hours ago 
judeleonard commented 1 day ago

> @judeleonard please provide dstack fleet list output.
>
> One of the reasons can be that the fleet instance have GPUs while the dev environment doesn't request any.

Yes, I tried to attach a GPU before, but I got the same 'not having enough capacity' error. My remote server actually has both a GPU and Docker preinstalled.

peterschmidt85 commented 1 day ago

> Here is the output

It's dstack ps, not dstack fleet list

dstack by default offers only instances that match exactly the resources of the fleet

judeleonard commented 1 day ago

This is the output. Not many details.

NAME           BACKEND  REGION  RESOURCES  SPOT  PRICE  STATUS  SUBMITTED   
 model-dev-env                                           failed  31 mins ago 
peterschmidt85 commented 1 day ago

This means the fleet creation wasn't successful.

  1. Please try again to create the fleet, and post the entire output here.
  2. After that, please also post the output of dstack server here.

This will help us understand why the fleet couldn't be created.

judeleonard commented 1 day ago

[image: dstack-server]

This is the dstack web server after the fleet was created. But I will try again and post the entire output.

judeleonard commented 1 day ago

I just recreated the fleet now


 dstack apply -f fleet.dstack.yml 

/usr/lib/python3/dist-packages/paramiko/transport.py:237: CryptographyDeprecationWarning: Blowfish has been deprecated
  "class": algorithms.Blowfish,
 Project        main             
 User           admin            
 Configuration  fleet.dstack.yml 
 Type           fleet            
 Fleet type     ssh              
 Nodes          1                
 Placement      any              

Found fleet model-dev-fleet. Configuration changes detected.
Re-create the fleet? [y/n]: y

 FLEET            INSTANCE  BACKEND       RESOURCES  PRICE  STATUS   CREATED     ERROR 
 model-dev-fleet  0         ssh (remote)             $0.0   pending  20 sec ago        

Then my dstack server log:

Could this fingerprint be an issue with my ssh user?

                  'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:13] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[16:00:14] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:19] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:24] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[16:00:25] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:29] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:34] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[16:00:35] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:39] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:43] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:48] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:52] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:00:57] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:01:03] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:01:08] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                                                                          
[16:01:13] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error:    
                    'RSAKey' object has no attribute 'fingerprint'                                                                                  
peterschmidt85 commented 1 day ago

Oh, that's a known issue. It will be fixed in the next release, but for now please run pip install paramiko -U, restart the server, and try again. The issue should then be gone.
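
Whether the installed Paramiko is affected can be checked without starting the server. A minimal sketch, assuming only what the log above says (the error comes from a key class that lacks a `fingerprint` attribute); the helper names `supports_fingerprint` and `check_paramiko` are hypothetical:

```python
import importlib


def supports_fingerprint(key_cls: type) -> bool:
    """Return True if the key class exposes a `fingerprint` attribute."""
    return hasattr(key_cls, "fingerprint")


def check_paramiko() -> str:
    """Describe the installed Paramiko, or report that it is missing."""
    try:
        paramiko = importlib.import_module("paramiko")
    except ImportError:
        return "paramiko is not installed in this environment"
    version = getattr(paramiko, "__version__", "unknown")
    status = "has" if supports_fingerprint(paramiko.RSAKey) else "lacks"
    return f"paramiko {version}: RSAKey {status} the fingerprint attribute"


if __name__ == "__main__":
    print(check_paramiko())
```

Run it with the same Python interpreter that runs the dstack server. The deprecation warning earlier in this thread points at /usr/lib/python3/dist-packages, so a system-wide Paramiko may be the one that gets imported.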

judeleonard commented 1 day ago

Thank you. Will do that

judeleonard commented 12 hours ago

Hi @peterschmidt85, sorry for the late update. Our remote server was undergoing some updates.

I tried again after upgrading paramiko as you suggested, and the dstack server logs did change from what I had before. This is my log now. I also updated dstack to the latest version, v0.18.25.

[image: rename3]

peterschmidt85 commented 12 hours ago

@judeleonard Now that you've updated paramiko, please show the full output of creating the fleet (both dstack apply and dstack server outputs).

judeleonard commented 12 hours ago

dstack apply -f fleet.dstack.yml


 Project        main             
 User           admin            
 Configuration  fleet.dstack.yml 
 Type           fleet            
 Fleet type     ssh              
 Nodes          1                
 Placement      any              

Found fleet model-dev-fleet. Configuration changes detected.
Re-create the fleet? [y/n]: y

 FLEET            INSTANCE  BACKEND       RESOURCES  PRICE  STATUS   CREATED     ERROR 
 model-dev-fleet  0         ssh (remote)             $0.0   pending  16 sec ago    

dstack apply -f dev_environment.yml

Project                main                                         
 User                   admin                                        
 Configuration          dev_environment.yml                          
 Type                   dev-environment                              
 Resources              2..xCPU, 8GB.., 1xGPU (10GB), 100GB.. (disk) 
 Max price              -                                            
 Max duration           6h                                           
 Spot policy            auto                                         
 Retry policy           no                                           
 Creation policy        reuse-or-create                              
 Termination policy     destroy-after-idle                           
 Termination idle time  5m                                           

Finished run model-dev-env already exists.
Override the run? [y/n]: y
model-dev-env provisioning completed (terminating)
All provisioning attempts failed. This is likely due to cloud providers not having enough capacity. Check CLI and server logs for more details.
peterschmidt85 commented 12 hours ago

@judeleonard

> FLEET            INSTANCE  BACKEND       RESOURCES  PRICE  STATUS   CREATED     ERROR
> model-dev-fleet  0         ssh (remote)             $0.0   pending  16 sec ago

But what did it show after that? Was it successful?

Running anything before the fleet is created doesn't make sense.

Let's try to understand why the fleet isn't being created. We need logs for that.

judeleonard commented 12 hours ago
 dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:227 Failed to start instance model-dev-fleet-0 in 600 seconds. Terminating...                                                
[11:50:02] INFO     dstack._internal.server.services.fleets:363 Deleting fleets: ['model-dev-fleet']                                                                                                        
[11:50:09] INFO     dstack._internal.server.background.tasks.process_fleets:72 Automatic cleanup of an empty fleet model-dev-fleet                                                                          
           INFO     dstack._internal.server.background.tasks.process_fleets:78 Fleet model-dev-fleet deleted                                                                                                
[11:50:11] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/riBFx8B321nQ3v0rEhwYqXJBM'] was unsuccessful                                                                  
[11:50:16] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBF21nQ3v0rEhwYqXJBM'] was unsuccessful                                                                  
[11:50:22] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtxQ3v0rEhwYqXJBM'] was unsuccessful                                                                  
[11:50:27] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rEYqXJBM'] was unsuccessful                                                                  
[11:50:31] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:50:32] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0XJBM'] was unsuccessful                                                                  
[11:50:37] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful                                                                  
[11:50:42] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:50:43] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful                                                                  
[11:50:48] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3vXJBM'] was unsuccessful                                                                  
[11:50:50] INFO     dstack._internal.server.services.backends:404 Requesting instance offers from backends: []                                                                                              
[11:50:54] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQYqXJBM'] was unsuccessful                                                                  
[11:50:58] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8BwYqXJBM'] was unsuccessful                                                                  
[11:51:04] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rEBM'] was unsuccessful                                                                  
[11:51:10] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0rJBM'] was unsuccessful                                                                  
[11:51:12] INFO     dstack._internal.server.services.backends:404 Requesting instance offers from backends: []                                                                                              
           INFO     dstack._internal.server.background.tasks.process_runs:330 run(d0951d)model-dev-env: run status has changed SUBMITTED -> TERMINATING                                                     
[11:51:14] INFO     dstack._internal.server.services.jobs:283 job(4f4d88)model-dev-env-0-0: job status is FAILED, reason: FAILED_TO_START_DUE_TO_NO_CAPACITY                                                
[11:51:15] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3v0qXJBM'] was unsuccessful                                                                  
[11:51:16] INFO     dstack._internal.server.services.runs:739 run(d0951d)model-dev-env: run status has changed TERMINATING -> FAILED, reason: JOB_FAILED                                                    
[11:51:21] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nQ3XJBM'] was unsuccessful                                                                  
[11:51:26] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321nhwYqXJBM'] was unsuccessful                                                                                                                                                           
[11:55:31] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B3YqXJBM'] was unsuccessful                                                                  
[11:55:36] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:55:37] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321XJBM'] was unsuccessful                                                                  
[11:55:41] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGv0rEhwYqXJBM'] was unsuccessful                                                                  
[11:55:45] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGthwYqXJBM'] was unsuccessful                                                                  
[11:55:50] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B32YqXJBM'] was unsuccessful                                                                  
[11:55:55] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:55:56] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8qXJBM'] was unsuccessful                                                                  
[11:56:00] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B32qXJBM'] was unsuccessful                                                                  
[11:56:04] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtxYqXJBM'] was unsuccessful                                                                  
[11:56:09] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8B321wYqXJBM'] was unsuccessful                                                                  
[11:56:14] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:56:15] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiwYqXJBM'] was unsuccessful                                                                  
[11:56:20] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
[11:56:21] WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtx8qXJBM'] was unsuccessful                                                                  
[11:56:26] INFO     dstack._internal.server.background.tasks.process_instances:217 Adding ssh instance model-dev-fleet-0...                                                                                 
           WARNING  dstack._internal.server.background.tasks.process_instances:281 Provisioning instance model-dev-fleet-0 could not be completed because of the error: Deploy instance raised an error: SSH
                    connection to the jude@1.1.4.1:22 with keys ['SHA256:T59TCqbDm+dzO/rigiBFGtv0rEhwYqXJBM'] was unsuccessful                                                                  
peterschmidt85 commented 12 hours ago

This clearly shows that dstack cannot connect to the instance using the provided key

@judeleonard Can you connect to the same host using the provided key via ssh -i <key path> jude@1.1.4.1?

@un-def Any ideas what could be wrong?

judeleonard commented 11 hours ago

Yes, I can connect to the same server via ssh from my terminal.

judeleonard commented 11 hours ago

> This clearly shows that dstack cannot connect to the instance using the provided key
>
> @judeleonard Can you connect to the same host using the provided key via ssh -i <key path> jude@1.1.4.1?
>
> @un-def Any ideas what could be wrong?

The current user I am using actually requires a password to successfully connect to the server. Could this be why?

peterschmidt85 commented 11 hours ago

@judeleonard Yes! This certainly can be a reason.

[image: Screenshot 2024-11-15 at 14 23 55]
judeleonard commented 11 hours ago

Okay, let me work on this and try it again.

peterschmidt85 commented 11 hours ago

@judeleonard Also, please ensure the SSH public key is added to ~/.ssh/authorized_keys on the host.
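
A quick way to confirm this is to compare the key blob from the local .pub file with the entries in the host's authorized_keys. A minimal sketch, assuming both files have already been read into strings; the helper names `key_blob` and `is_authorized` and the key material in the example are made up for illustration:

```python
from typing import Optional

# Prefixes of common OpenSSH public-key types (ssh-rsa, ssh-ed25519, ecdsa-sha2-*, sk-*).
KEY_TYPE_PREFIXES = ("ssh-", "ecdsa-", "sk-")


def key_blob(line: str) -> Optional[str]:
    """Extract the base64 key blob from a public-key or authorized_keys line.

    Limitation: an options field containing spaces (for example a quoted
    command="...") is not handled by this simple whitespace split.
    """
    fields = line.strip().split()
    for index, field in enumerate(fields[:-1]):
        # The blob is the field that directly follows the key type.
        if field.startswith(KEY_TYPE_PREFIXES):
            return fields[index + 1]
    return None


def is_authorized(public_key: str, authorized_keys: str) -> bool:
    """Return True if the public key's blob appears in the authorized_keys text."""
    wanted = key_blob(public_key)
    if wanted is None:
        return False
    for line in authorized_keys.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        if key_blob(stripped) == wanted:
            return True
    return False


if __name__ == "__main__":
    pub = "ssh-ed25519 AAAAC3Nza_example user@laptop"
    auth = "# team keys\nno-pty ssh-ed25519 AAAAC3Nza_example user@laptop\n"
    print(is_authorized(pub, auth))
```

Only the blob is compared, so a differing trailing comment or leading options on the authorized_keys line does not affect the result.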

peterschmidt85 commented 11 hours ago

Basically, dstack works only if ssh works without a password.
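
This condition can be tested from the machine that runs the dstack server by making ssh refuse to prompt. A minimal sketch, assuming the OpenSSH client is on PATH; the function names are hypothetical, and the user, host, and key path are the placeholders used earlier in this thread:

```python
import subprocess
from typing import List


def build_ssh_check(user: str, host: str, key_path: str, port: int = 22) -> List[str]:
    """Build an ssh command that fails instead of prompting for input."""
    return [
        "ssh",
        "-i", key_path,
        "-p", str(port),
        "-o", "BatchMode=yes",        # disable password and passphrase prompts
        "-o", "IdentitiesOnly=yes",   # offer only the key given with -i
        "-o", "ConnectTimeout=10",    # give up after 10 seconds
        f"{user}@{host}",
        "true",                       # run a no-op command and exit
    ]


def can_connect_without_password(user: str, host: str, key_path: str, port: int = 22) -> bool:
    """Return True if key-only, non-interactive SSH login succeeds."""
    result = subprocess.run(
        build_ssh_check(user, host, key_path, port),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Print the command only; call can_connect_without_password() to run it.
    print(" ".join(build_ssh_check("jude", "1.1.4.1", "~/.ssh/id_rsa")))
```

With BatchMode=yes, a login that would normally fall back to a password prompt exits with a non-zero status instead. Since BatchMode disables all interaction, a host key that has not been accepted yet can also make the check fail.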