carpentries-incubator / shell-extras

Extra Unix Shell Material
http://carpentries-incubator.github.io/shell-extras/

Define the goals and target audience for this lesson #3

Open wking opened 9 years ago

wking commented 9 years ago

Before we go filling it in with new content (#1), I think we should think about who would be taking this lesson, what we want them to learn, and how we'll advertise those goals to the intended audience. Maybe this should be shell-intermediate (meaning “what to take after shell-novice”)? Or shell-multi-host for interactions between hosts (scp, ssh), and shell-posix for the finer details of the POSIX shell and other POSIX-specified features (env, alias, job-control, permissions, man)?

Personally, I prefer the shell-{area-you-want-to-learn-about} approach, so learners/instructors can easily find the material they want. And lesson listings can organize the by-area lessons into serial curricula as they see fit.

jainsley commented 9 years ago

When I first started using the shell in a bioinformatics context, the most common tasks I performed were text manipulation (grep/sed/awk) and process automation with cron (which would fit in job control). I think having some lessons geared around those topics would be useful in a lot of contexts.
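As a rough sketch of the kind of text manipulation mentioned above (the file name, columns, and sample data are invented for illustration, and `mktemp -d` is assumed to be available):

```shell
# Hypothetical sample data: sample ID, organism, read count (tab-separated).
workdir=$(mktemp -d)
printf 'S1\tE.coli\t1200\nS2\tyeast\t300\nS3\tE.coli\t800\n' > "$workdir/samples.tsv"

# grep -c: count the lines that mention E.coli.
ecoli_lines=$(grep -c 'E\.coli' "$workdir/samples.tsv")

# sed: rewrite the organism name, then show only the first line.
renamed=$(sed 's/E\.coli/Escherichia_coli/' "$workdir/samples.tsv" | head -n 1)

# awk: sum the third column for rows whose second column is E.coli.
total=$(awk -F '\t' '$2 == "E.coli" { sum += $3 } END { print sum }' "$workdir/samples.tsv")

echo "$ecoli_lines matching lines, $total reads in total"
echo "$renamed"

rm -rf "$workdir"
```

Each tool does one job here: grep selects lines, sed edits them, and awk works on columns.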

lexnederbragt commented 9 years ago

+1 for job control (including checking how jobs and a server are doing, e.g. top, free, screen(!)) and text manipulation (awk!). I'd add a bit more on automation using find + xargs and GNU parallel. Some more on shell scripting would be nice (set -e, set -u, set -o pipefail, for example) but with the warning that as soon as things start becoming more complicated, consider using a scripting language. Use of boolean operators &&, ||, maybe.
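A minimal sketch of how those scripting pieces might fit together in Bash (the temporary directory and file names are made up for the example; `-print0` and `xargs -0` are assumed to be available, as they are in the GNU and BSD tools):

```shell
#!/usr/bin/env bash
# Stop on errors (-e), on unset variables (-u), and on a failure
# anywhere in a pipeline (-o pipefail).
set -euo pipefail

workdir=$(mktemp -d)
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/notes.md"

# find + xargs: run basename once per .txt file and count the results.
# -print0 / -0 keep file names with spaces or newlines intact.
txt_count=$(find "$workdir" -name '*.txt' -print0 | xargs -0 -n 1 basename | wc -l)

# &&: the assignment runs only if the test succeeds.
[ -f "$workdir/a.txt" ] && status="found"

# ||: the right-hand side runs only if grep fails (a.txt is empty, so it does).
# Handling the failure this way also keeps set -e from aborting the script.
grep -q 'pattern' "$workdir/a.txt" || status="$status, no match"

echo "txt files: $txt_count"
echo "status: $status"

rm -rf "$workdir"
```

Once a script needs much more logic than this, the advice above to switch to a scripting language applies.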

choldgraf commented 9 years ago

This is sort of outside the scope of basic shell stuff, but I think it would be useful for many academic scientists if they got a brief primer on distributed computing platforms of some kind. I know there are a bunch out there, but they all tend to obey similar principles, no?

For example, our lab has a cluster with the Sun Grid Engine on it. However, nobody uses it because they don't know how. I think a small nudge would give people confidence to give it a shot. Just a thought!

froggleston commented 9 years ago

+1 for Lex's suggestions of screen, simple grep, awk and xargs examples. Rather than heavyweight scheduler operations, which are more in line with materials that would be produced by a local sysadmin, I too would prefer to see a bit of GNU parallel.

wking commented 9 years ago

On Tue, Mar 17, 2015 at 08:21:03AM -0700, Lex Nederbragt wrote:

top, free, screen(!))… find + xargs

To me these sound like good candidates for a novice sysadmin lesson (which I think would be very useful).

I use screen myself, but that's mostly due to muscle-memory/familiarity. We might want to consider teaching tmux instead because it's a more modern, leaner codebase and we have an existing lesson on it (swcarpentry/bc#249).

gvwilson commented 9 years ago

I think that distributed/parallel/high-performance computing should be a lesson in its own right. I also think we might finally be ready to do it.

gvwilson commented 9 years ago

A separate sys admin lesson is an interesting idea - and I think it's a really useful way to decide what's not in this one.

wking commented 9 years ago

On Tue, Mar 17, 2015 at 09:56:36AM -0700, Greg Wilson wrote:

A separate sys admin lesson is an interesting idea - and I think it's a really useful way to decide what's not in this one.

Instead of grouping lessons into 6-hour-ish chunks, I'm also ok having stand-alone lesson repositories for each particular tool. Things like tmux and SSH are getting us away from the basic, tool-agnostic principles outlined in the best-practices papers, and folks are more likely to want à la carte choices to tailor to a specific audience. I heard a little bird mention (off-list :() that there's interest in shorter lessons that could be delivered in the context of a symposium or meetup, and having little per-tool bites would fill that role nicely.

lexnederbragt commented 9 years ago

We do not (yet?) have intermediate shell lesson material. Do we need to consider having both 'shell-extras' and 'intermediate-shell', or should there just be one? And how to separate what goes where if we go for both?

froggleston commented 9 years ago

I think there would be a lot of interest in a parallelisation and HPC set of materials.

I like the idea of shell-extras as well as intermediate. If you attend a bootcamp and speed through the lessons, there would be a nice repo of extra exercises and materials for you.

gvwilson commented 9 years ago

Just one, I think.

ChristinaLK commented 9 years ago

(Sorry @gvwilson, you replied as I was typing this up!)

I would like to see shell-extras and shell-intermediate, simply because I feel like ssh, scp are "novice" level topics, whereas awk and sed are considerably more advanced. The question is where to divide. ssh and scp could also go in an hpc lesson, if appropriate. What do people think of:

Alternatively, we could divide entirely by theme: soup up your shell, sysadmin tricks, advanced scripting, parallelization etc.

I think another important thing to consider is where/when would these be taught, and to what audience. I like the idea of shell-extras being little stand-alone chunks that someone could insert into a novice swc lesson as desired.

ChristinaLK commented 9 years ago

Let it be clear that my primary interest as far as maintaining/contributing goes would be shell-extras, as defined above.

froggleston commented 9 years ago

@ChristinaLK I like the look of those sections. However, would env variables be more suited to intermediate training rather than extras? They crop up often in the other intermediate topics.

I'd be happy to contribute (and did in the old repo) to any of the shell materials :)

gdevenyi commented 9 years ago

I agree with @ChristinaLK regarding the breakdown of where shell should go at this point. There's a variety of addons that we sometimes talk about in live lessons that don't have pages yet (permissions come to mind) that would fit well with -extras.

wking commented 9 years ago

On Wed, Mar 18, 2015 at 04:54:53AM -0700, Christina Koch wrote:

ssh and scp could also go in an hpc lesson, if appropriate.

I'd rather avoid copy-pasting lesson bits between repositories, since that's hard to maintain cleanly. Can we have a ssh-novice lesson that builds on shell-novice? It seems like that material is pretty independent from the other entries you've listed for shell-extras, and could be maintained as a stand-alone lesson. I'd like to take the same approach for any other material that folks might want to mix and match. It makes more sense to me to create a lesson graph like:

  shell-novice
  |          |
ssh-novice shell-posix shell-parallel shell-utilities
                   shell-scripting

than to try and group those independent parts into a smaller number of lessons.

I think another important thing to consider is where/when would these be taught, and to what audience. I like the idea of shell-extras being little stand-alone chunks that someone could insert into a novice swc lesson as desired.

But then which chunks are prerequisites for shell-intermediate? Are you ok with folks hitting a scripting lesson without having seen job control? Environment variables? Permissions? If they're all optional for any downstream lesson, then collecting them together in one lesson repository is fine, but splitting them up as finely as possible seems the safest bet to me.

gdevenyi commented 9 years ago

I'm okay with advanced scripting being independent of job/environment/permissions

ChristinaLK commented 9 years ago

I would say that if you're the target audience for an intermediate-shell lesson, you a) already know the stuff in shell-extras or b) should have learned all the shell-extras content as a prereq (even if it's not directly needed in intermediate, per se). I am loath to split finely simply because that seems to create more administrative overhead AND require more searching for content.

wking commented 9 years ago

On Wed, Mar 18, 2015 at 07:06:25PM -0700, Christina Koch wrote:

I am loath to split finely simply because that seems to create more administrative overhead AND require more searching for content.

My gut says “if you can decompose $X into meaningful, distinct pieces, you should be putting them into separate repositories”, but I'm fine going with whatever folks find most appealing now, and then going through another round of repository splitting later on if it turns out to be warranted.

chwilk commented 9 years ago

One useful lesson/hint/suggestion I made for one of our collaborators recently was a brief explanation of Bash 'here documents'. He was able to go from a construct like:

echo "complex string with escaped variables, backslashes abounding" >> script.txt
echo "lots more echoes here with complicated variable escaping" >> script.txt
echo "even more pain if you want to use quotes inside this quoted string" >> script.txt
batch-submit script.txt
rm -f script.txt

to

batch-submit <<ENDOFSCRIPT
Actual un-escaped script commands
as much as you like
and it is readable
plus you can use quotes without escaping them
ENDOFSCRIPT
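A runnable sketch of the same idea, with `cat` standing in for the site-specific `batch-submit` command and a made-up `name` variable. It also shows one detail worth teaching: whether the delimiter is quoted decides whether variables in the body are expanded.

```shell
name="sample_01"

# Unquoted delimiter: $name is expanded, and the double quotes
# inside the body need no escaping.
expanded=$(cat <<ENDOFSCRIPT
echo "processing $name"
ENDOFSCRIPT
)

# Quoted delimiter: the body is passed through literally,
# so $name stays exactly as written.
literal=$(cat <<'ENDOFSCRIPT'
echo "processing $name"
ENDOFSCRIPT
)

echo "$expanded"
echo "$literal"
```

The quoted form is usually the safer choice when the submitted script should expand its own variables on the remote side rather than at submission time.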