CryoInTheCloud / hub-image

Default image for NASA's cryocloud hub
BSD 3-Clause "New" or "Revised" License

Additional packages for UW NASA Hackweek 2024 #118

Open scottyhq opened 1 month ago

scottyhq commented 1 month ago

In prep for https://2024.hackweek.io using CryoCloud, we have requests for the latest releases of the following packages in the default environment. This is a tracking issue, so I might add some more. We can open separate PRs for each. cc @jomey

jomey commented 1 month ago

CC: @micah-prime To coordinate a snowexsql release before the hackweek

micah-prime commented 1 month ago

Linking this issue for getting snowexsql on conda-forge https://github.com/SnowEx/snowexsql/issues/93

scottyhq commented 1 month ago

Adding a request for https://pixi.sh/latest/

(https://github.com/ICESAT-2HackWeek/website-2024/pull/13)

meganmason commented 1 month ago

Adding a request for itables

weiji14 commented 1 month ago

@scottyhq, do you reckon it's ok to add PyTorch (CPU build) into hub-image so that the Hackweek participants can run the notebook at https://github.com/ICESAT-2HackWeek/website-2024/pull/17 without having to install it using a conda install -y pytorch code cell? I'm not sure we want 10 or 20+ people launching the GPU machine with the Pangeo pytorch-notebook image for an hour-long tutorial.
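For illustration only (not part of the tutorial notebook): a minimal sketch of a first cell that checks whether the packages a notebook needs are already in the running image, so an install cell is only needed when something is actually missing. The helper name `missing_packages` is hypothetical, and the sketch assumes `torch` is the import name provided by the `pytorch` conda package.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level module names that cannot be imported here."""
    # find_spec looks a module up without importing it, so this stays cheap
    # even for large packages. Use top-level names only (no dotted paths).
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumption: the tutorial only needs the "torch" module.
needed = ["torch"]
missing = missing_packages(needed)
if missing:
    print("Not in this image:", ", ".join(missing))
else:
    print("All tutorial packages are already installed.")
```

If the list comes back empty, the `conda install` cell can be skipped entirely.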

scottyhq commented 1 month ago

> do you reckon it's ok to add Pytorch (CPU build)

Fine with me! At some point it'll probably be necessary to clarify (maybe here https://book.cryointhecloud.com/content/hub_best_practices.html) whether the default environment is meant to be minimal (plus bring your own environment) or everything and the kitchen sink...

weiji14 commented 1 month ago

Ok, started the PR at #125 to add more packages in. Do we still need pixi given that https://github.com/ICESAT-2HackWeek/website-2024/pull/13 was closed?

tsnow03 commented 1 month ago

PyTorch is kinda huge though isn't it? We've been keeping the ML stuff, especially the more advanced libraries that fewer people use, separate to streamline the time it takes to load the image. So I'm a little more inclined to keep that in the separate image. Our main image should be things that a large swath of our users will be able to use, but kept as small as possible otherwise. Then people can build permanent kernels or, soon, be able to build their own images automatically from an environment/requirements and use those instead.
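One rough way to put a number on "huge" is to sum the on-disk size of the files a package installs. This is only a sketch: the helper name `installed_size_mb` is hypothetical, and it counts only files listed in the Python package metadata, so conda-only dependencies (libtorch, for example) would not be included. It also assumes the distribution is registered under the name `torch`.

```python
from importlib import metadata


def installed_size_mb(dist_name):
    """Approximate on-disk size of an installed distribution, in MB.

    Returns None when the distribution is not installed or lists no files.
    """
    try:
        files = metadata.files(dist_name)
    except metadata.PackageNotFoundError:
        return None
    if not files:
        return None
    total_bytes = 0
    for entry in files:
        path = entry.locate()  # absolute path of the installed file
        if path.is_file():
            total_bytes += path.stat().st_size
    return total_bytes / 1e6


size = installed_size_mb("torch")  # assumed distribution name
print("torch not installed" if size is None else f"torch: about {size:.0f} MB")
```

Running this in an environment with and without the package gives a quick lower bound on what it would add to the image.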

spestana commented 1 month ago

Adding a request for hydroeval

weiji14 commented 1 month ago

> PyTorch is kinda huge though isn't it? We've been keeping the ML stuff, especially the more advanced libraries that fewer people use, separate to streamline the time it takes to load the image. So I'm a little more inclined to keep that in the separate image. Our main image should be things that a large swath of our users will be able to use, but kept as small as possible otherwise. Then people can build permanent kernels or, soon, be able to build their own images automatically from an environment/requirements and use those instead.

The CPU build of PyTorch won't pull in any of the CUDA libraries (which use up a lot of disk space), but the packages (pytorch, libtorch, etc.) will still be 100+ MB, so I can leave them out of the default image. I can point the Hackweek participants to the Pangeo pytorch-notebook image via the 'Bring your own image' option instead (rather than using the GPU node). It will be a bit more setup at the start, but hopefully ok.
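For anyone who wants to confirm which build they ended up with after picking an image, a small sketch along these lines should work. The function name `torch_build_summary` is made up for illustration; it relies on `torch.version.cuda` being `None` when PyTorch was built without CUDA.

```python
import importlib.util


def torch_build_summary():
    """Describe the installed PyTorch build, or say that it is missing."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the check works without PyTorch

    cuda_version = torch.version.cuda  # None when built without CUDA
    if cuda_version is None:
        return f"torch {torch.__version__}: built without CUDA"
    return f"torch {torch.__version__}: built with CUDA {cuda_version}"


print(torch_build_summary())
```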