2i2c-org / infrastructure

Infrastructure for configuring and deploying our community JupyterHubs.
https://infrastructure.2i2c.org

[Support] catalystproject-latam, plnc: Add GPU profile (1 GPU + 4 GPU option) #4162

Open consideRatio opened 4 weeks ago

consideRatio commented 4 weeks ago

The Freshdesk ticket link

https://2i2c.freshdesk.com/a/tickets/1591

Ticket request type

Feature Request

Ticket impact

🟩 Low

Short ticket description

We should enable the community to start servers with either 1 GPU or 4 GPUs.

Related documentation

Additional notes

Practically, I propose a dedicated profile_list entry for GPU, where the machine type and GPU count are determined by an option under the profile.

The image choices etc. for this profile haven't been specified by the community or me yet, so I propose we make a call and ask for feedback.

Below are examples of how I think we need to declare kubespawner_override for the options that choose the hardware within the GPU profile:

          kubespawner_override:
            # cpu/memory requests and limits should also be set here, sized to fill up the chosen node
            environment:
              NVIDIA_DRIVER_CAPABILITIES: compute,utility
            node_selector:
              node.kubernetes.io/instance-type: n1-highmem-4
            extra_resource_limits:
              nvidia.com/gpu: "1"

          # .....

          kubespawner_override:
            # cpu/memory requests and limits should also be set here, sized to fill up the chosen node
            environment:
              NVIDIA_DRIVER_CAPABILITIES: compute,utility
            node_selector:
              node.kubernetes.io/instance-type: n1-highmem-16
            extra_resource_limits:
              nvidia.com/gpu: "4"
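
Put together, the two overrides above might sit under a single GPU profile roughly as follows. This is only a sketch, assuming KubeSpawner's `profile_options` / `choices` structure; the display names and the `resource_allocation`, `gpu_1` and `gpu_4` keys are hypothetical placeholders, and the cpu/memory requests are left as comments because suitable values depend on each node's allocatable capacity.

```yaml
# Sketch only: one GPU profile whose hardware is picked via a profile option.
# Display names and option/choice keys are placeholders, not agreed names.
- display_name: GPU
  profile_options:
    resource_allocation:
      display_name: Resource allocation
      choices:
        gpu_1:
          display_name: 1 GPU
          default: true
          kubespawner_override:
            # cpu/memory requests and limits sized to the node would also go here
            environment:
              NVIDIA_DRIVER_CAPABILITIES: compute,utility
            node_selector:
              node.kubernetes.io/instance-type: n1-highmem-4
            extra_resource_limits:
              nvidia.com/gpu: "1"
        gpu_4:
          display_name: 4 GPUs
          kubespawner_override:
            # cpu/memory requests and limits sized to the node would also go here
            environment:
              NVIDIA_DRIVER_CAPABILITIES: compute,utility
            node_selector:
              node.kubernetes.io/instance-type: n1-highmem-16
            extra_resource_limits:
              nvidia.com/gpu: "4"
```

Image choices could be added as a second entry under `profile_options` once the community has given feedback.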

(Optional) Investigation results

No response

jmunroe commented 4 weeks ago

Thanks @consideRatio. Your proposal for giving access to appropriately sized CPUs/GPUs looks good to me. Please make the changes.

AIDEA775 commented 2 weeks ago

@consideRatio I have added the GPU options to two new image options. Did I understand your proposal correctly?

Also, the Pulling image "quay.io/pangeo/pytorch-notebook:2024.05.07" phase takes a lot of time.

fvillena commented 2 weeks ago

Can you add the new machines to the "Bring your own image" option, please?

And we don't need Pangeo, because we don't do geoscience. Can you replace those images with Jupyter-pytorch and Jupyter-tensorflow?

I think these are the correct images:

What is the difference between the Jupyter and Pangeo images? I spawned a Pangeo notebook and it looks identical. Is it the same, but with Dask added? (We'd like to start exploring Dask.)

And please make sure CUDA is working correctly.

Thanks.

fvillena commented 2 weeks ago

Also, the spawn sometimes fails with:

Spawn failed: pod plnc/jupyter-fvillena did not start in 600 seconds!
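
For context, the 600 seconds in this message is the hub's limit on how long a user pod may take to start, which a slow image pull can exceed. A minimal sketch of raising it, assuming the hub is configured through the Zero to JupyterHub chart's `singleuser.startTimeout` setting nested under a `jupyterhub:` key; the value 1200 is an arbitrary illustration, not something proposed in this thread.

```yaml
# Sketch only: give user pods more time to start before the spawn is marked failed.
# 1200 is an illustrative value; the nesting under `jupyterhub:` is an assumption.
jupyterhub:
  singleuser:
    startTimeout: 1200
```

A longer timeout only hides a slow pull; a smaller image or an image already present on the node would address the delay itself.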

yuvipanda commented 1 week ago

@AIDEA775 is there something I can do to help move this forward?