amzn / amazon-ray

Staging area for ongoing enhancements to Ray focused on improving integration with AWS and other Amazon technologies.
Apache License 2.0

[autoscaler] Improve documentation for spinning up a Ray cluster with a non-public Docker image #2

Open jennakwon06 opened 3 years ago

jennakwon06 commented 3 years ago

Hello-

Just a suggestion: please add documentation for spinning up a Ray cluster with a non-public Docker image.

You have to add the lines below to the cluster config to avoid the "no basic auth credentials" error from the docker pull step of ray up. This wasn't clear from any of the existing docs under https://docs.ray.io/en/latest/cluster/cloud.html.

initialization_commands:
    - aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 048211272910.dkr.ecr.us-west-2.amazonaws.com

igorgad commented 3 years ago

Thanks. I also needed to set IamInstanceProfile in node_config, as shown here:

node_config:
  IamInstanceProfile:
    Arn: arn:aws:iam::XXXXXXXXXX:instance-profile/ray-autoscaler-v1 
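
For anyone assembling this from the thread, here is a rough sketch of how the two snippets above might sit together in one cluster YAML. It is a fragment, not a complete config. The image name (my-ray-image), the container name, the node type name (ray.worker.default), and the placement of node_config under available_node_types are assumptions made for illustration. It also assumes the role behind the instance profile is allowed to pull from ECR.

```yaml
# Fragment of a Ray cluster config (sketch only; placeholder values throughout).
provider:
  type: aws
  region: us-west-2

docker:
  # Hypothetical private image hosted in the same ECR registry as the login below.
  image: 048211272910.dkr.ecr.us-west-2.amazonaws.com/my-ray-image:latest
  container_name: ray_container  # hypothetical name

# Log Docker in to ECR so that the docker pull step of ray up has credentials.
initialization_commands:
  - aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 048211272910.dkr.ecr.us-west-2.amazonaws.com

available_node_types:
  ray.worker.default:  # hypothetical node type name
    node_config:
      # Instance profile that gives the node AWS credentials for the ECR login.
      IamInstanceProfile:
        Arn: arn:aws:iam::XXXXXXXXXX:instance-profile/ray-autoscaler-v1
```

The region and registry host in the login command need to match the registry in docker.image, otherwise the pull can still fail with the same credentials error.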
vputz commented 5 months ago

As a minor addendum: I hadn't used my AWS cluster in a while, and when I brought it back up it reported:

Error response from daemon: pull access denied for [your ecr repo], repository does not exist or may require 'docker login': denied: Your authorization token has expired. Reauthenticate and try again.

Running ray down and then ray up tried to reuse the existing node, but the initialization command above only seems to run on the first cluster initialization, so the Docker login apparently was not repeated. I had to terminate the head instance and restart the cluster from scratch, after which the pull worked fine. I'm not sure whether there is something like initialization_commands that runs on every ray up, or another way to reauthenticate, but it was a (fairly minor) annoyance.