RSE-Sheffield / conda-environments-for-effective-and-reproducible-research

Introduction to Conda for (Data) Scientists
https://rse.shef.ac.uk/conda-environments-for-effective-and-reproducible-research/

Reflection 2023-03-03 #25

Closed ns-rse closed 1 year ago

ns-rse commented 1 year ago

Notes reflecting on the session that either need addressing or considering...

Renaming

Rename course to Conda environments for effective and reproducible research

Room Details

Need to make the room clearer. This was hidden halfway down the Eventbrite page and, in turn, the email from which the text was copied. A shortcoming of Eventbrite is that, when adding a location, the address has to be a valid address that Google recognises. It knew about Firth Court, but it was not possible to add the room. Thus BIG BOLD LETTERS AT THE TOP INDICATING THE ROOM would make sense.

Over-running

We didn't complete the material in the expected time.

[completed] Getting Started with Conda

Working with Environments

Using Packages and Channels

Sharing Environments

GPU dependencies

Didn't get to cover this so material untested

Additional things that might be useful to know

Include this section with FAQs, alternative namespace method of installing packages (conda-forge::polars) and other things mentioned above.
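
For reference, a rough sketch of the two ways of asking for a package from a particular channel. The `run` helper and its `DRY_RUN` switch are illustrative assumptions and not part of Conda: by default they only print each command, so nothing is installed.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Channel flag: conda-forge is searched first for this one command,
# so dependencies may be taken from it as well.
run conda install -c conda-forge polars

# Namespace syntax: only polars itself is requested from conda-forge;
# dependencies are resolved from the channels already configured.
run conda install conda-forge::polars
```

Setting `DRY_RUN=0` before running the snippet executes the commands for real in whichever environment is active.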

Add a note on activating Conda under PowerShell, which was found to require updating the ExecutionPolicy (see thread; the key is to do so as a user with admin rights).

ns-rse commented 1 year ago

From @willfurnass: make sure the content aligns with the IT Services instructions on HPC Anaconda/Python, but also make sure the documentation there is correct.

Also from @willfurnass the GPU section could perhaps be its own course in and of itself.

willfurnass commented 1 year ago

Yes, that page is one of several that mention conda. We also have other pages in that HPC docs site that are relevant

bobturneruk commented 1 year ago

I'd like this course to be aligned with local stuff, but I'd rather it remained generic and did not reference Sheffield-specific materials directly.

bobturneruk commented 1 year ago

Also, I'd be really keen to keep the GPU stuff in somehow as I think it's relevant to lots of people doing AI. What do you think @EdwinB12?

willfurnass commented 1 year ago

> I'd like this course to be aligned with local stuff, but I'd rather it remained generic and did not reference Sheffield-specific materials directly.

Totally agree; any tweaks to either the HPC docs or this material to ensure greater alignment should be 'behind the scenes'. One of the strengths of this material is its OS/platform independence.

willfurnass commented 1 year ago

W.r.t. the GPU stuff: I wondered if it could be an optional extra e.g. run the bulk of the workshop on day x then have an extra hour for the CUDA-related bits on day x+1 for just those that want it.

EdwinB12 commented 1 year ago

Along the lines of what Will said, it could simply be a quick heads up:

"By the way, Conda can also install CUDA libraries. These are needed by GPU-enabled deep learning libraries such as TensorFlow."

People can then ask if they want more detail. The main thing to me is that, after this course, they can read the following line in the TensorFlow installation instructions and understand what they're actually doing. :)

conda install -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0
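
The command above could be annotated for attendees along these lines. This is only a sketch: the `run` helper and `DRY_RUN` switch are hypothetical conveniences (they print the command rather than running it by default), and the comments reflect my reading of what each part does.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# -c conda-forge      search the conda-forge channel first
# cudatoolkit=11.2.2  CUDA runtime libraries, pinned to version 11.2.2
# cudnn=8.1.0         NVIDIA cuDNN library, pinned to version 8.1.0
# Both packages go into the currently active environment. The NVIDIA
# driver itself is not installed by Conda and must already be present.
run conda install -c conda-forge cudatoolkit=11.2.2 cudnn=8.1.0
```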

EdwinB12 commented 1 year ago

@ns-rse @bobturneruk - Other feedback I had written down:

  1. An example or diagram for demonstrating motivation for conda/virtual environments. Lots of words on the first page, but maybe like this or first pic from here
  2. Introduce conda env list as soon as we've shown how to create an environment. There is nothing more frustrating than forgetting what your environment is called. This is the conda command I use most.
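
A minimal sketch of point 2, creating an environment and listing environments straight afterwards. The environment name `spaceship` is only an example, and the `run` helper with its `DRY_RUN` switch is a hypothetical wrapper that prints the commands by default instead of executing them.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create a named environment (example name and Python version).
run conda create --name spaceship python=3.10 --yes

# List every environment Conda knows about, with its location on disk.
# The currently active environment is marked with an asterisk.
run conda env list
```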

bobturneruk commented 1 year ago

This composite might be good?

graph TD;
    subgraph C1["Bob's Computer"]
        birdcore["
            <b>'Birdcore' Environment</b> <br/> 
            Python 3.6 <br/> 
            Pandas 1.0.1 <br/>
            PySpark 2.4.8
            "]

        spaceship1["
            <b>'Spaceship' Environment</b> <br/>
            Python 3.10 <br/>
            Pandas 1.3.5 <br/>
            Matplotlib 3.5.1"]        
    end

    subgraph C2["Fariba's Computer"]
        fishstick["
            <b>'Fishstick' Environment</b> <br/> 
            Python 2.7 <br/> 
            Numpy 1.14.4 <br/>
            Matplotlib 2.2.5
            "]

        spaceship2["
            <b>'Spaceship' Environment</b> <br/>
            Python 3.10 <br/>
            Pandas 1.3.5 <br/>
            Matplotlib 3.5.1"]     
    end

    birdcore --> run_spaceship_a["Run Spaceship.py ❌"]
    spaceship1 --> run_spaceship_b["Run Spaceship.py ✔️"]
    fishstick --> run_spaceship_c["Run Spaceship.py ❌"]
    spaceship2 --> run_spaceship_d["Run Spaceship.py ✔️"]

bobturneruk commented 1 year ago

Also:

graph TD;
    Create["Create Environment"] --> 
    Activate["Activate Environment"] --> 
    use["Share / Update / Run Code"] --> 
    Delete["Delete Environment"]
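
The lifecycle in this diagram might map onto commands roughly as follows. This is a sketch under assumptions: the environment name, package versions and `Spaceship.py` are borrowed from the earlier diagram, `environment.yml` is an example file name, and `run` / `DRY_RUN` are a hypothetical wrapper that prints the commands by default rather than executing them.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Create
run conda create --name spaceship python=3.10 pandas=1.3.5 matplotlib=3.5.1 --yes

# Activate (inside a non-interactive script the shell hook must be loaded
# first, for example with: eval "$(conda shell.bash hook)")
run conda activate spaceship

# Run code / share the environment definition
run python Spaceship.py
run conda env export --name spaceship --file environment.yml

# Deactivate, then delete
run conda deactivate
run conda env remove --name spaceship --yes
```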

EdwinB12 commented 1 year ago

The first diagram is ideal!

bobturneruk commented 1 year ago

This too. All for different purposes.

graph TD;
    subgraph Default_Channel["Default Channel"]
        scipy1["scipy 1.10.0 Package"]
        numpy["numpy 1.23.5 Package"]
    end
    subgraph Conda_Forge["Conda Forge Channel"]
        scipy2["scipy 8.4.3 Package"]
        kaggle["kaggle 1.5.12 Package"]
    end
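
To go with the channel diagram, a possible sketch of how attendees could check which channels are configured and which versions each one offers. The package names come from the diagram, and `run` / `DRY_RUN` are a hypothetical wrapper that prints the commands by default instead of running them.

```shell
# Hypothetical helper: prints each command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Show the channels currently configured, in priority order.
run conda config --show channels

# Search the configured channels for scipy.
run conda search scipy

# Search conda-forge only, ignoring the configured channels.
run conda search --override-channels -c conda-forge scipy
```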