carpentries-incubator / julia-data-workflow

Learn Julia workflows for data-intensive research
https://carpentries-incubator.github.io/julia-data-workflow/

Select software tools #3

Open jd-foster opened 3 years ago

jd-foster commented 3 years ago

While lower in priority than #1 or #2, we should decide how the software for the lesson will be delivered. Every complete SC or DC lesson requires the learner to set up something, whether an interface (GUI or otherwise), a browser, or a package manager, to support the lesson content. This should not be (but sometimes is) a major barrier to entry for the learner, so the advantages and drawbacks of the required software should be weighed carefully against how easy it is to get started and how well it eventually enables good practices.

Here are a few options to consider:

  1. Julia REPL via terminal: the most direct and immediate start, but it can be intimidating as "just a prompt".
  2. Jupyter notebook: harder to set up, but enables a better feedback loop for the learner during the lesson. It adds another layer of "cognitive load" for a new learner, who has to learn the Jupyter interface and Julia at the same time.
  3. VS Code with the Julia extension: again, much more to set up than 1 or 2 initially, but it may pay off in terms of automated code editing, documentation, and plot integration. It might feel more natural to those used to a browser/app style of interface.

Any I've missed?

kescobo commented 3 years ago

There's a lot of interest in Pluto.jl as an alternative to Jupyter. The "every cell must be one scope" thing is a barrier for me, but I can't tell how much it would bother learners. The interactivity of notebooks is a major plus in general, and I think it might be superior for a short workshop.

In classes, I always want students to write real code, so it's worth taking the time to get VS Code set up. I worry about locking students into the abstraction of a notebook, but there's a reason people use them a lot for teaching.

Regardless, I think we need to spend some time with the REPL at the beginning, if for no other reason than package installation and environment setup.

jd-foster commented 3 years ago

I've seen good things about Pluto. I haven't used it, but I think it may be better suited to people past the introductory stage, partly because you need to know how to use Julia to set it up. That said, this pathway seems pretty good compared to requiring a conda/Python install from scratch if you went the Jupyter route. Yet even when you know all the basic steps, you're looking at a minimum of 15-20 minutes of setup, provided everything goes well. Reducing the friction and barriers to that first "Hello World" Julia instruction is key to engaging the beginner. (Let's do something more interesting than "Hello World", though!)

I agree that downloading Julia and launching the REPL should form the beginning of the lesson episodes, with real "live coding" and getting a feel for the essential interactions with the REPL. And yes, knowing how to install packages! After a few sessions of getting comfortable, we could introduce Pluto and/or Jupyter (via IJulia.jl) for interactive tasks like plotting and visualisation. One advantage of Jupyter notebooks is that learners may already be familiar with them, though that assumption should not be made, implicitly or explicitly, in the lesson design.

jd-foster commented 3 years ago

Putting in here some good real-world experience to inform the discussion: https://discourse.julialang.org/t/issues-encountered-teaching-classes-with-julia-for-the-first-time/56553/24

Edit: adding an older but related Discourse thread: https://discourse.julialang.org/t/bof-julia-in-the-classroom-juliacon2018/13125/8

jd-foster commented 3 years ago

It's been a while because of other commitments, but I'm keen to start working on this lesson (again). Just today I came across DrWatson.jl, which describes itself as

a scientific project assistant software. It helps people manage their scientific projects (or any project for that matter).

(And also for x-ref: https://discourse.julialang.org/t/experimental-reproducibility-julia-vs-the-rest/46769/2 )

Without trying it in depth, this strikes me as an excellent basis for an episode or even the majority of the lesson if we explain the rationale for why the package constructs/guides the user workflow as it does.

I'm assuming that the package is sufficiently mature that lesson content won't go stale too quickly...

Also, the developers note that

Please note that DrWatson is not a data management system and there are important things to say about data management too in long-term reproducibility.

kescobo commented 3 years ago

I think DataDeps.jl is really the solution to the data management piece. It's not super straightforward, but I got it to work really well for a recent project. I reviewed the DrWatson JOSS paper, but haven't used the package much myself.

jd-foster commented 3 years ago

Nice. Thanks for the tip on DataDeps.jl, keeping things minimal like this is good. I'll have a look.

Going slightly off-topic: for more complex code with multiple data sources, I've been using BinaryBuilder.jl lately, and its Sources section has a nice schema for "artifacts" that tracks different operating systems within the build.