ns-rse closed this issue 1 year ago.
From @willfurnass: make sure the content aligns with the IT Services instructions on HPC Anaconda/Python, but also make sure the documentation there is correct.
Also from @willfurnass: the GPU section could perhaps be a course in its own right.
Yes, that page is one of several that mention conda. Other pages on that HPC docs site are also relevant, such as the pages on the use of JupyterHub on ShARC.
NB: you'll notice in those docs that we encourage users to load the CUDA toolkit via a modulefile rather than install it using conda, because users may quickly fill their home directories if they install CUDA with conda.
I'd like this course to be aligned with local stuff, but I'd rather it remained generic and did not reference Sheffield-specific materials directly.
Also, I'd be really keen to keep the GPU material in somehow, as I think it's relevant to lots of people doing AI. What do you think @EdwinB12?
> I'd like this course to be aligned with local stuff, but I'd rather it remained generic and did not reference Sheffield-specific materials directly.
Totally agree; any tweaks to either the HPC docs or this material to ensure greater alignment should be 'behind the scenes'. One of the strengths of this material is its OS/platform independence.
W.r.t. the GPU stuff: I wondered if it could be an optional extra, e.g. run the bulk of the workshop on day x, then have an extra hour for the CUDA-related bits on day x+1 for just those who want it.
Along the lines of what Will said, it could simply be a quick heads-up:
"By the way, Conda can also install CUDA libraries. CUDA libraries are needed to run deep learning libraries such as TensorFlow on a GPU."
People can then ask if they want more detail. The main thing for me is that after this course they can read this line in the TensorFlow installation instructions:
conda install -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0
and understand what they're actually doing. :)
@ns-rse @bobturneruk Other feedback I had written down:
- … `conda env list` as soon as we've shown how to create an environment. There is nothing more frustrating than forgetting what your environment is called; this is the conda command I use most.

This composite might be good?
graph TD;
subgraph C1["Bob's Computer"]
birdcore["
<b>'Birdcore' Environment</b> <br/>
Python 3.6 <br/>
Pandas 1.0.1 <br>
PySpark 2.4.8
"]
spaceship1["
<b>'Spaceship' Environment</b> <br/>
Python 3.10 <br/>
Pandas 1.3.5 <br/>
Matplotlib 3.5.1"]
end
subgraph C2["Fariba's Computer"]
fishstick["
<b>'Fishstick' Environment</b> <br/>
Python 2.7 <br/>
Numpy 1.14.4 <br>
Matplotlib 2.2.5
"]
spaceship2["
<b>'Spaceship' Environment</b> <br/>
Python 3.10 <br/>
Pandas 1.3.5 <br/>
Matplotlib 3.5.1"]
end
birdcore --> run_spaceship_a["Run Spaceship.py ❌"]
spaceship1 --> run_spaceship_b["Run Spaceship.py ✔️"]
fishstick --> run_spaceship_c["Run Spaceship.py ❌"]
spaceship2 --> run_spaceship_d["Run Spaceship.py ✔️"]
Also:
graph TD;
Create["Create Environment"] -->
Activate["Activate Environment"] -->
use["Share / Update / Run Code"] -->
Delete["Delete Environment"]
The first diagram is ideal!
This too. All for different purposes.
graph TD;
subgraph Default_Channel["Default Channel"]
scipy1["scipy 1.10.0 Package"]
numpy["numpy 1.23.5 Package"]
end
subgraph Conda_Forge["Conda Forge Channel"]
scipy2["scipy 8.4.3 Package"]
kaggle["kaggle 1.5.12 Package"]
end
Notes reflecting on the session that either need addressing or considering...
Renaming
Rename the course to "Conda environments for effective and reproducible research".
Room Details
Need to make the room clearer. It was hidden halfway down the Eventbrite page and, in turn, in the email from which the text was copied. A shortcoming of Eventbrite is that a location must be a valid address that Google recognises: it knew about Firth Court, but it was not possible to add the room. Thus BIG BOLD LETTERS AT THE TOP INDICATING THE ROOM would make sense.
Over-running
We didn't complete the material in the expected time.
[completed] Getting Started with Conda
Working with Environments
- … `mkdir ~/Desktop/introduction-conda-for-data-scientists/` and I think it caused some confusion that we might have installed the environment there (which wasn't the case). Something along the lines of "We'd like you to create a directory to keep all the work we are going to do in. Your environments aren't stored here, so create this directory (`~/Desktop/introduction-conda-for-data-scientists/`), although if you are comfortable doing so it can be created anywhere."
- … `base` environment very well.
- … `conda --help` and `conda <action> --help` earlier. They are a useful reference for how to use `conda`.
- … `conda search PKGNAME` aspect very well and why you might do this (i.e. to see what versions are available so you can explicitly install them).
- … `Create > Activate > Install Packages > Use > Deactivate`. Keep the initial creation of environments to specifying only Python versions, not packages.
- … "Finding Conda" section under "Where do Conda environments live?".

Using Packages and Channels
- … `conda-forge` … `openblas` … `pytorch` … and they may have newer versions or packages that aren't available on `defaults`.
- … "My package isn't available on the `defaults` channel! What should I do?" as we have already stated this is a reason for using them. Remove the reference to `pip` here as that is introduced later.
- … `tensorflow` as an example package to install from `conda-forge`, as these dependencies take a long time to resolve and install and are covered by the aims of Section 5 on GPU dependencies anyway. Suggest `polars` (a package similar to Pandas but written in Rust and faster).
- … `pytorch` and `torchvision` install. TODO: find another package, perhaps `dask` here?

Sharing Environments
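As a minimal sketch of two `environment.yml` points in this section, the shell snippet below writes a hand-made example file and then removes its machine-specific `prefix:` line so the file can be shared between machines and users. The two points are removing `prefix`, and listing `pip` as a dependency with the `pip`-installed packages nested underneath it. It does not call conda. The environment name `spaceship`, the package versions (taken from the diagram above), the prefix path and the choice of `kaggle` as the pip-installed package are all illustrative assumptions.

```shell
#!/usr/bin/env bash
# Illustrative sketch only: the name, versions, prefix path and the
# pip-installed package (kaggle) are assumptions, not course material.
set -euo pipefail

# 1. A hand-written environment.yml including a machine-specific `prefix:`
#    line, like the one `conda env export` appends to its output.
cat > environment.yml <<'EOF'
name: spaceship
channels:
  - conda-forge
dependencies:
  - python=3.10
  - pandas=1.3.5
  - matplotlib=3.5.1
  - pip
  - pip:
      - kaggle
prefix: /home/user/miniconda3/envs/spaceship
EOF

# 2. Drop the `prefix:` line so the file is transferable between machines/users.
#    grep -v plus mv is used because `sed -i` syntax differs between GNU and BSD sed.
grep -v '^prefix:' environment.yml > environment.yml.tmp
mv environment.yml.tmp environment.yml

cat environment.yml
```

With a real environment, the file would normally come from running `conda env export > environment.yml` inside the activated environment. `conda env create -f environment.yml` then recreates the environment elsewhere.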
- … "_Always_ version control your environment files!": this section should be reworded. The first paragraph can be removed and a shorter paragraph added at the end along the lines of "By version controlling your `environment.yml` file you can recreate your environment, and you do not need to version control the directory under `~/miniconda/env/` where the environment is installed."
- ~~… `environment.yml` created by hand to create a virtual environment, as it would be more useful to introduce it after we've gone through creating environments automatically. This would in turn require removing the environment after creating the file with `--from-history`. Should perhaps rename the environment.~~ On re-reading, in its current form the first examples only show the structure and do not instruct people to create the environment, so this doesn't need changing. @ns-rse (2023-03-29)
- … `pytorch` as an example of specifying channels.
- … `prefix` from the automatically generated `environment.yml` to make it transferable between machines/users.
- ~~… `--pruned` from this section; add to "Additional things that might be useful to know".~~ On reflection I think this is useful so have left it in. @ns-rse (2023-03-29)
- … `pip` as a dependency in `environment.yml` but then nest a list of `pip`-installed packages underneath.

GPU dependencies
We didn't get to cover this, so the material is untested.
Additional things that might be useful to know
- Include this section with FAQs, the alternative namespace method of installing packages (`conda-forge::polars`) and the other things mentioned above.
- Add a note on activating Conda under PowerShell, which was found to require updating the `ExecutionPolicy` as admin (see thread; the key is to do so as a user with admin rights).