Closed richyvk closed 7 years ago
I'm aware it isn't a perfect example. My idea was to introduce three things here as a taster:
1) something powerful that looks hard but will be achievable by the end of the lesson
2) something that isn't a perfect solution but does a job, which is often as useful as perfection when writing code
3) something that prompts people to think about how important it is to know your data.
Here 'INTERNATIONAL' in all caps appears almost exclusively in the journal title column, hence I'm using this as a quick hack to find all the journals whose titles contain the word 'International'. So, going back to 2), there is a better solution that is reusable for all datasets, but this ~works for this one.
All that said, I'd be delighted for you to make some changes, Richard. I think the learning outcomes above come through better in how I present it than they do in the written-down lesson!
Hi James
Good to put a face to the name last night. I agree with all this. But, yes, I might have a look at it a bit, once ResBaz and various other things are done.
I do think it would be really good if we could record some example LC workshops, like they have on the SC site - that would really help people like me see how the existing members teach - rather than just working blind through the lesson pages.
Anyway, maybe I'll get a PR happening in the not too distant future!
@richyvk Agreed! Good to chat.
Which examples on the SWC site do you mean?
Think we can close this now. @richyvk: reopen if you aren't happy!
Hi all
Finally getting round to working through the LC courses for myself. Unless I'm totally dumb there is a problem with the first example:
$ grep 2009 2014-01_JA.tsv | grep INTERNATIONAL | awk -F'\t' '{print $5}' | sort | uniq -c
We state that this finds "the number of articles published in 2009 in academic journals whose title contains the word ‘International’".
I know this is then qualified with "there are a few false positives" but I wonder if we should expand it a bit to get a better result. Maybe describing the dataset (its variable names) would help, then we could come up with an example that is free of those false positives while still demonstrating the power of the command. I'm happy to work on a replacement.
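
As a rough sketch of what such a replacement might look like: instead of grepping whole lines, the match can be restricted to individual columns with awk. Everything below runs on a tiny made-up file, not the real 2014-01_JA.tsv. The journal title being in field 5 comes from the command quoted above; the year being in field 4 is purely an assumption for this sample, so the field numbers would need checking against the real header first.

```shell
# Build a tiny, made-up sample (NOT the real 2014-01_JA.tsv).
# Hypothetical columns: id, author, article title, year, journal title.
printf 'a1\tSmith\tINTERNATIONAL trade in 2009\t2008\tJournal of History\n' > sample.tsv
printf 'a2\tJones\tLocal archives\t2009\tINTERNATIONAL REVIEW OF X\n' >> sample.tsv
printf 'a3\tLee\tMaps\t2009\tINTERNATIONAL REVIEW OF X\n' >> sample.tsv
printf 'a4\tKhan\tCoins\t2009\tJournal of History\n' >> sample.tsv

# Field numbers are assumptions; confirm them against the real file's header.
year_col=4
journal_col=5

# Whole-line matching, as in the current example: row a1 slips through
# because "2009" and "INTERNATIONAL" both occur in its article title.
naive=$(grep 2009 sample.tsv | grep INTERNATIONAL \
  | awk -F'\t' '{print $5}' | sort | uniq -c)

# Column-aware matching: the year must equal 2009 in the year field and
# INTERNATIONAL must occur in the journal title field.
result=$(awk -F'\t' -v y="$year_col" -v j="$journal_col" \
  '$y == "2009" && $j ~ /INTERNATIONAL/ {print $j}' sample.tsv | sort | uniq -c)

echo "whole-line match:"
echo "$naive"
echo "column-aware match:"
echo "$result"
```

On this sample the whole-line version also counts an article from "Journal of History", while the column-aware version only counts the two articles from the journal with INTERNATIONAL in its title. It is a little longer than the original pipeline, so whether it still works as a "taster" is a judgment call.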