alievk / avatarify-python

Avatars for Zoom, Skype and other video-conferencing apps.

Provisioning a Cloud-based GPU instance #169

Open tgmerritt opened 4 years ago

tgmerritt commented 4 years ago

If you would like to run the remote worker in the cloud, you can follow this guide.

This guide uses AWS, but other clouds that support GPU resources should follow a broadly similar flow.

  1. Log in to your AWS account (if you don't have an account, make one; you will be asked for a credit card). You will start in the N. Virginia region (us-east-1), which should be fine.

  2. In the top left, select Services, then search for EC2 and click it

  3. Select the orange Launch Instance button and then "Launch Instance" from the drop-down

  4. Locate the search field in the new page and search for 'ubuntu'

  5. You should see search results like the following: image

  6. Click the blue Select button on the right side of the Ubuntu 18.04 image

  7. Find the All Instance Types drop down and click it - then choose GPU Instances to filter the list

  8. Click the box next to the g4dn.xlarge option. You must have vCPU capacity in order to use this instance type. If you get to the end of this process and an error message from Amazon says you only have 0 vCPU available, you will have to open a support request to have them increase this limit. It might take 12 hours for them to respond fully. Just follow their link to open a support case, say "I want 4 vCPU", and wait.

  9. Click Next: Configure Instance Details

  10. Leave everything on this screen at defaults unless you know what you want to change and why.

  11. Click Next: Add Storage

  12. Change the size of the volume to at least 16 GB: image

  13. Click Review and Launch

  14. You will see a page about your instance not being eligible for free tier - this is fine. Click Launch (Using this instance type on AWS will cost you money - know that right now if you want to do this 100% free, you cannot. Stop here and go get a coffee)

  15. You will see a window about SSH keys - select new keypair, give the key a name (any name is fine) and click the accept box. The SSH pem key file will download to your machine. image

  16. Click Launch

  17. It will take less than a minute for your instance to be up. You'll see a Public DNS address with the hostname of your instance: image

  18. You can copy the hostname by clicking the two little squares that appear at the end when you mouse over it (you can highlight the whole thing and cmd & c it also, but the icon is convenient!)

  19. You can now SSH into that machine (do not forget to chmod 400 your_pem_file.pem first, because the SSH client will refuse to use a key file with open permissions)

  20. Here is a list of commands to run against the machine to get it going - you will be prompted at certain points (Continue? [Y/n] type stuff). Make sure you run the commands one at a time and agree to the prompts when necessary. The Amazon Guide for installing NVIDIA drivers provides a good explanation.

  21. You'll need an IAM user whose credentials you can use on this EC2 instance in order to copy the NVIDIA driver files from AWS S3 to your machine. For that you'll also need the aws cli installed. These commands are in the GIST above, but what they don't cover is actually creating the IAM user. Once you have the aws cli installed, it's fairly straightforward to run aws configure and add the access key and secret key for your IAM user.

  22. Lastly - you'll need to allow TCP and UDP from your IP Address to your EC2 instance: image

  23. Click the Launch Wizard link in your instance, then click the Security Group listed, with its long random ID.

  24. Finally click image

  25. You'll need to add a rule for TCP 5556 where the source is My IP; Amazon will automatically find your IP address if you select this. Personally I opened all TCP and UDP from my IP to keep it simple, but you only need the single port above.
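Steps 19 and 25 can also be done from a terminal. The following is a minimal sketch, not part of the original guide: the function names `ssh_to_instance` and `allow_my_ip` are made up for illustration, the `ubuntu` login assumes a stock Ubuntu AMI, and the security group ID and IP address are values you would substitute yourself. The second function assumes the aws cli is already installed and configured on the machine you run it from.

```shell
# ssh_to_instance KEY_FILE HOST
# Tighten the key file permissions (step 19), then connect as the
# "ubuntu" user. HOST is the Public DNS name from step 17.
ssh_to_instance() {
  chmod 400 "$1" || return 1
  ssh -i "$1" "ubuntu@$2"
}

# allow_my_ip GROUP_ID MY_IP
# Add an inbound rule for TCP 5556 from a single address (step 25).
# GROUP_ID is the security group ID from step 23; "/32" limits the
# rule to exactly one IPv4 address.
allow_my_ip() {
  aws ec2 authorize-security-group-ingress \
    --group-id "$1" \
    --protocol tcp \
    --port 5556 \
    --cidr "$2/32"
}
```

For example, `ssh_to_instance your_pem_file.pem <Public DNS>` covers step 19, and `allow_my_ip <security group ID> <your IP>` should be roughly equivalent to choosing My IP in the console.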

E3V3A commented 4 years ago

Thanks for taking the time to write this. Really great write-up. I will try this, and will most likely have more questions about it.

E3V3A commented 4 years ago

Of course I was too trigger happy and could not wait for 4 hours, so after I got presented with the following message:

launch_fail_region_validation


I had to try it again and with a different region, only to now get a 2nd error message:


launch_fail_vcpu_capacity

So this sucked, because I was hoping this would just work after having spent 2 hours on this 15 min task... :(

E3V3A commented 4 years ago

BTW. @tgmerritt
I'm writing an installation script based on your gist...

tgmerritt commented 4 years ago

I did totally call out in the write-up that this exact thing would happen :P

It is lame - I'm not sure why they have this manual control in place.

-- Tyler Merritt

Furuikeya, Kaeru tobikomu. Mizu no oto

E3V3A commented 4 years ago

Can you post the full screen shots of the steps 16-18 (Public DNS). I deleted some "security groups" that seem to have been created each time I tried to run something, but now I can't find that PublicDNS thing...

tgmerritt commented 4 years ago

There is no step to complete here - once the instance launches, AWS will automatically assign a Public DNS value. You shouldn't have to click / interact / wait / do anything at all. Definitely don't delete security groups that AWS adds.

E3V3A commented 4 years ago

@tgmerritt Can you take a screenshot of your Dashboard? This is getting absurd, I have like 20 support requests... And they get verified, and accepted, but then the instances never show up.

E3V3A commented 4 years ago

@tgmerritt Can you explain where to obtain the input needed from:

aws configure

# Asking for: 
# AWS Access Key ID [None]:
# AWS Secret Access Key [None]:
# Default region name [None]:
# Default output format [None]:
E3V3A commented 4 years ago

I get an error when trying to get the nvidia via aws-cli:

(avatarify) ubuntu@ip-xxxxxx:~$ aws s3 cp --recursive s3://nvidia-gaming/linux/latest/ .

fatal error: Could not connect to the endpoint URL: "https://nvidia-gaming.s3.eu-north-1a.amazonaws.com/?list-type=2&prefix=linux%2Flatest%2F&encoding-type=url"
tgmerritt commented 4 years ago

https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users_create.html

I linked to this article from the write-up I thought.
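For reference, the four `aws configure` prompts expect: the access key ID and secret access key of the IAM user created per the article above, a region name such as us-east-1 (the region the guide starts in), and an output format such as json, text or table. A minimal sketch that sets the same values without the prompts is below. The function name `configure_aws_cli` is hypothetical, and the us-east-1 and json defaults are assumptions, not something the write-up prescribes.

```shell
# configure_aws_cli ACCESS_KEY_ID SECRET_ACCESS_KEY [REGION] [OUTPUT]
# Writes the same four values that `aws configure` asks for.
# Note: keys passed as arguments end up in your shell history.
configure_aws_cli() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "usage: configure_aws_cli ACCESS_KEY_ID SECRET_ACCESS_KEY [REGION] [OUTPUT]" >&2
    return 1
  fi
  aws configure set aws_access_key_id "$1" &&
  aws configure set aws_secret_access_key "$2" &&
  aws configure set region "${3:-us-east-1}" &&   # assumed default region
  aws configure set output "${4:-json}"           # assumed default format
}
```

Running plain `aws configure` and typing the same values at the prompts has the same effect.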

tgmerritt commented 4 years ago

It's because aws isn't authenticated I would bet. You need the credentials.
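One other thing that may be worth checking: the hostname in the error contains `eu-north-1a`, which looks like an availability zone name rather than a region name. Regions are normally written without the trailing letter (`eu-north-1`), and a region value the CLI cannot resolve to a real endpoint tends to produce this kind of "Could not connect to the endpoint URL" message. The sketch below is only an illustration of that idea; `az_to_region` is a made-up helper and it only handles the common "region plus one letter" naming.

```shell
# az_to_region NAME
# If NAME looks like an availability zone (ends in a digit followed by
# one lowercase letter, e.g. eu-north-1a), drop the letter. Anything
# else is printed unchanged.
az_to_region() {
  case "$1" in
    *-[0-9][a-z]) printf '%s\n' "${1%?}" ;;
    *)            printf '%s\n' "$1" ;;
  esac
}
```

With the aws cli installed, something like `aws configure set region "$(az_to_region "$(aws configure get region)")"` would rewrite the stored value, assuming the stored region is the problem.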

E3V3A commented 4 years ago

I got the IAM user creds ok. The only funny thing was that error above. (Running with --debug was giving a bunch of Python errors.)

But now I think it may have something to do with my machine not being supported for CUDA? I was using Canonical, Ubuntu, 20.04 LTS, amd64 focal image build on 2020-04-23...

Or maybe it was because I was running under avatarify conda active env?

tgmerritt commented 4 years ago

That's not the AMI that I used, and I see "amd64", which could mean the GPU is non-NVIDIA. I'm not entirely sure about the contents of the image, but search for "Ubuntu" and grab the 18.04 x64 version. That's what I used and it worked just fine.
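A note on the naming: in Debian and Ubuntu image names, "amd64" is the label for the 64-bit x86 CPU architecture (the same thing `uname -m` reports as x86_64), so it describes the CPU and not the GPU vendor. A small sketch of that mapping follows; the helper name `deb_arch_to_machine` is hypothetical and only two architectures are covered.

```shell
# deb_arch_to_machine ARCH
# Map a Debian/Ubuntu architecture label to the machine name that
# `uname -m` prints on such a system. Unknown labels pass through.
deb_arch_to_machine() {
  case "$1" in
    amd64) echo x86_64 ;;
    arm64) echo aarch64 ;;
    *)     echo "$1" ;;
  esac
}
```

On the instance, `deb_arch_to_machine "$(dpkg --print-architecture)"` should print the same value as `uname -m`.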

E3V3A commented 4 years ago

It should be ok... NVIDIA Corporation TU104GL [Tesla T4] (rev a1). But anyway, I deleted the instance and will try to install from scratch again, hopefully without errors...

tgmerritt commented 4 years ago

Yep that looks like it should be ok...
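A quick way to repeat that check on a fresh instance is to look for an NVIDIA entry in the PCI device list. This is a rough sketch, assuming `lspci` is available on the image; `has_nvidia_gpu` is a made-up helper name, and it only tells you the hardware is visible, not that the driver is installed.

```shell
# has_nvidia_gpu
# Reads `lspci` output on stdin and succeeds if any line mentions
# NVIDIA (case-insensitive).
has_nvidia_gpu() {
  grep -qi 'nvidia'
}
```

For example: `lspci | has_nvidia_gpu && echo "NVIDIA GPU visible"`. Once the driver is installed, `nvidia-smi` should list the same card.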
