Closed jgammerman closed 1 year ago
When you launch the notebook instance, make sure to specify a machine type with a GPU. You can also stop the instance, add a GPU, and then restart it.
thanks, Lak
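Once the instance is back up, one quick way to confirm that TensorFlow can actually see the GPU is to list the physical devices from a notebook cell. This is a minimal sketch, not taken from the book's notebook: the helper name `visible_gpus` is made up for illustration, and it assumes a TensorFlow 2.x kernel.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # list_physical_devices("GPU") returns an empty list when no GPU is usable.
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this kernel")
elif gpus:
    print("TensorFlow sees:", gpus)
else:
    print("No GPU visible; training will run on the CPU")
```

If the list comes back empty even though the instance has a GPU attached, the kernel or driver setup is the next thing to look at, rather than the model code.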
On Fri, Feb 10, 2023, 8:41 AM James @.***> wrote:
@lakshmanok https://github.com/lakshmanok - regarding my earlier issue (#164, https://github.com/GoogleCloudPlatform/data-science-on-gcp/issues/164), I've ended up manually exporting the data from BQ to cloud storage using the GUI.
The rest of the notebook is working fine, but now I'm training the deep neural network it's awfully slow (I'm still on the first of the 10 epochs and it's not even half way through it after 10 minutes!).
I'm guessing that the problem is that I'm using CPUs rather than a GPU...on p.322 of the book you state "Making sure that the Vertex AI Workbench notebook that I’m working on has a GPU attached to it, I can now launch off the training job..." but if I'm not mistaken it's not covered in the textbook or notebook how to do this?
The GC docs refer to creating a separate CustomJob https://cloud.google.com/vertex-ai/docs/training/configure-compute#create_custom_job_gpus-console to achieve this - is that what you did or is there a quicker way?
Still no quicker I'm afraid - one epoch is taking about 20 minutes...
My notebook instance now comes with a GPU:
And when starting the notebook I've selected Tensorflow 2 (Local) as my kernel; previously it was Python (local):
I can't see any other options for specifying that my GPU should be used...
Also - I have a really dumb question but it's come up before in this book so I may as well ask it now...
I'd like to see the CPU/GPU usage of the VM that my notebook is running on. In other cloud platforms (eg. Azure) you have to connect the notebook to a VM manually every time, which makes this easy to do.
But in GCP, everything seems to happen in the background and it's not clear how to inspect your VM....if I go to the VM Instances API in the console, it looks like I don't have any:
Please could you advise? Sorry if this is a stupid question but I'm guessing it's not just me who is confused!
(1) If you didn't change the line that says DEVELOP=True in the notebook, each epoch should take less than a minute. By any chance, are the notebook's compute and the bucket in different regions?
(2) You can look at the notebook's GPU/CPU usage etc. by clicking on the notebook name (in the Vertex Workbench area) and selecting the "Monitoring" tab.
thanks Lak
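For the region question in (1), a rough check is to compare the bucket's location with the zone the notebook runs in. The sketch below is only an illustration: `same_region` and `bucket_location` are hypothetical helper names, the example locations are placeholders, and the lookup assumes the `google-cloud-storage` client library and working credentials.

```python
def same_region(bucket_location: str, notebook_zone: str) -> bool:
    """Compare a bucket location such as 'US-CENTRAL1' with a zone such as 'us-central1-a'."""
    # A zone name is the region name plus a one-letter suffix, e.g. us-central1-a.
    notebook_region = notebook_zone.lower().rsplit("-", 1)[0]
    return bucket_location.lower() == notebook_region


def bucket_location(bucket_name: str) -> str:
    """Look up a bucket's location; needs google-cloud-storage and credentials."""
    from google.cloud import storage

    return storage.Client().get_bucket(bucket_name).location


# Placeholder values for illustration only.
print(same_region("US-CENTRAL1", "us-central1-a"))   # True
print(same_region("EUROPE-WEST2", "us-central1-a"))  # False
```

Note that a multi-region bucket (location `US` or `EU`) will not match any single region with this simple comparison, so treat a `False` there as "check by hand" rather than as a definite mismatch.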
1) Oh I changed DEVELOP=True to DEVELOP=False after successfully running 2 epochs very quickly, under a minute as you said. The flow of the Jupyter notebook is somewhat different to the textbook chapter so I thought that was what I was supposed to do - maybe not!
2) Unfortunately I can't see any Monitoring tab, only Logs:
Thanks for these quick responses by the way...I'll be sure to mention them in the glowing Amazon review I give of the book once I'm done with it!
Yeah, you don't need to run on the full dataset. You can just try it out on a small sample. In later chapters, I'll have you copy over my model that was trained over the whole thing.
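To illustrate the idea of a development switch, here is a hypothetical sketch of how a DEVELOP flag might shrink a training run. It is not the notebook's actual code: the helper `training_settings` and the example counts are placeholders, and only the 2-epoch and 10-epoch figures come from this thread.

```python
DEVELOP = True  # leave as True to try things out on a small sample


def training_settings(develop: bool, full_examples: int, full_epochs: int = 10) -> dict:
    """Hypothetical helper: shrink the run when develop is True."""
    if develop:
        # Placeholder sample size chosen for illustration, not the notebook's own value.
        return {"examples": min(full_examples, 10_000), "epochs": 2}
    return {"examples": full_examples, "epochs": full_epochs}


# 5_000_000 is a made-up dataset size used only to show the difference.
print(training_settings(DEVELOP, full_examples=5_000_000))
print(training_settings(False, full_examples=5_000_000))
```

The point of a switch like this is that flipping it to False multiplies both the data and the epoch count, which is why the run time jumps from under a minute to many minutes per epoch.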
Re: monitoring: you are using managed notebooks rather than the user-managed notebooks that I was using: https://cloud.google.com/vertex-ai/docs/workbench/managed/introduction
Part of the control you give up when you ask Vertex AI to manage the notebook lifecycle is that it runs the notebook in a tenant project, so your ability to monitor is limited. Think of managed notebooks as being like Google Colab.
Lak
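When the console offers no Monitoring tab, one possible workaround is to ask the GPU driver directly from a notebook cell. The sketch below assumes the `nvidia-smi` tool is on the kernel's PATH, which may not hold on every managed image; `gpu_usage` is a hypothetical helper name.

```python
import shutil
import subprocess


def gpu_usage():
    """Return nvidia-smi's utilization report as text, or None if it is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,utilization.gpu,memory.used,memory.total",
            "--format=csv",
        ],
        capture_output=True,
        text=True,
        check=False,
    )
    # A non-zero exit code usually means the driver is missing or not loaded.
    return result.stdout if result.returncode == 0 else None


report = gpu_usage()
print(report or "nvidia-smi not available on this machine")
```

Running this in a second cell while training is in progress gives a rough picture of whether the GPU is doing any work at all.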
I see! Thank you Lak.
@lakshmanok - regarding my earlier issue (#164), I've ended up manually exporting the data from BQ to cloud storage using the GUI.
The rest of the notebook is working fine, but now I'm training the deep neural network it's awfully slow (I'm still on the first of the 10 epochs and it's not even half way through it after 10 minutes!).
I'm guessing that the problem is that I'm using CPUs rather than a GPU...on p.322 of the book you state "Making sure that the Vertex AI Workbench notebook that I’m working on has a GPU attached to it, I can now launch off the training job..." but if I'm not mistaken it's not covered in the textbook or notebook how to do this?
I've already set up my fully-managed notebook to enable an NVIDIA T4 GPU, but I believe that it won't be attached automatically without me doing something else.
The GC docs refer to creating a separate CustomJob to achieve this - is that what you did or is there a quicker way?