Closed jennybc closed 7 years ago
This is shaping up splendidly. It's going to be extremely useful (and I can't wait to start sharing it with my students and colleagues).
Here are some comments on the current draft:
1) I'd suggest separating the "Why" and "who" and bolstering the "Why". For the "why" a painful anecdote or two might be helpful (sent wrong version of file using naming like "reallyfinalfixes04-05-07"; someone clobbered your shared Dropbox folders). I'd suggest that you consider moving the "Git has been repurposed" through the end of the "In my opinion, for new users" paragraph to this section. I also wonder if the "Many people who don't use Git" paragraph belongs earlier.
* See #4
2) Consider changing "Is this going to hurt?" from "Yes" to "Yes, at least somewhat".
* I'm sticking with the blunt "Yes" because I think the shock value of this admission is probably constructive.
3) Move discussion of issues up to intro of Github section.
* See #5
4) Adding an example of a commit, diff, and tag would be extremely helpful.
* Yes, once I consider the prose "done-ish", I'm going to work on a couple of examples / diagrams. I'm linking to inspirational diagrams where I can now, to give a sense of what I want.
5) Consider relegating Bitbucket and GitLab to a section on "more advanced topics", appendix, or paragraph in the conclusion.
* Agreed, I've moved all but the first passing mention to a section on "extras"
6) Consider adding an example where you've added PDF-formatted output along with an Rmd (but this can generate a merge conflict if someone else re-renders it and commits the changes).
* See #6
7) Mentioning the "github_document" !!! keep_md = TRUE !!! is a really good idea.
* I cover this now and have a note to show the yaml in the yet-to-be made example/diagram.
8) More discussion of the features (and serious limitations IMHO) of the RStudio IDE in terms of git. Specifically, mention the need for students and instructors to work sometimes in the shell and sometimes using the IDE. This could be exiled.
* I don't think I'm going to have room for this here. I do already point out that RStudio has major gaps and that one can use a mix of RStudio, another Git client, and the shell.
9) Move pull requests (and branching) to the advanced section.
* Agreed, moved into "more resources".
10) Mention some lighter weight github options for teaching
* Haven't explicitly done this but now have links to GitHub Education and GitHub Classroom, which would lead people to more resources, their blog, etc.
11) At a time when there's increased scrutiny on transparency and reproducibility, we are compelled to up our game (and, for instructors, to ensure that the next generation of students has the appropriate skills and tools to be able to wrangle, analyze, and communicate in groups).
* Something like this will go in the yet-to-be written conclusion.
I've also pasted in some of the background suggestions I made many moons ago in the issue you reference above:
The following thoughts and/or URLs may be useful as background for your github paper (kudos again for the fabulous http://happygitwithr.com/ book)
tools to support the teaching of statistics as a "team sport": we get laughed at if we don't sport better workflows
http://www.businessinsider.com/github-ceo-chris-wanstrath-interview-2015-10
Coursera Data Science specialization example of how github is used at scale!
aspects of rawgit.com and why it is useful
http://stats.stackexchange.com/questions/2910/how-to-efficiently-manage-a-statistical-analysis-project as part of a broader set of questions
Someone sent me: http://xkcd.com/1597/
I laughed: there's more than a grain of truth in this.
But there's a bigger issue here. A few years back I was stunned when there was a talk by one of the RStudio developers at one of the larger federal statistical agencies. None of the audience members were using a source code/version control system. I suspect that not much has changed in the interim.
When do we teach this to our students? How can we teach this to our students unless we eat the same dog food?
Thanks @nicholasjhorton, extremely helpful!
Article is looking great and I'm excited to see this kind of stuff getting out to a wider audience. Some general and specific comments I came up with as I read the draft are below.
The Initial setup section doesn't mention the need for setting the handful of critical git config options. This is always a pain point with students and seems worth mentioning.
"The project is still a regular directory on your computer, that you can locate, name, and interact with as you wish." - Maybe mention that folders can be copied, moved or even shared and still work. This regularly blows my students' minds.
"which takes a multi-file snapshot of the entire project." - I'm having trouble coming up with better phrasing but maybe make it clear that changes can be to one or more files? Multi-file reads to me as being more than one file.
Commits, diffs, and tags - some kind of concrete example for version A and B would be useful I think. Something as simple as just adding a new line of text would be sufficient. Is there some canonical text/song/etc. that people use for this?
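To make the suggestion concrete, here is a minimal sketch of the "add one line" idea, using Python's standard `difflib` to show what a diff between version A and version B looks like. The file name `poem.txt` and the text itself are invented for illustration; they are not from the draft, and this is not how Git computes diffs internally, only a comparable unified-diff view.

```python
import difflib

# Hypothetical file contents, made up purely for this illustration.
version_a = [
    "Roses are red\n",
    "Violets are blue\n",
]
# Version B is version A plus one new line of text.
version_b = version_a + ["Git tracks changes\n"]

# unified_diff yields header lines, a hunk marker, and then the content lines,
# where unchanged lines start with a space and added lines start with "+".
diff_lines = list(
    difflib.unified_diff(
        version_a,
        version_b,
        fromfile="poem.txt (version A)",
        tofile="poem.txt (version B)",
    )
)

print("".join(diff_lines), end="")
```

The single line that differs between A and B shows up prefixed with `+`, which is the same convention readers will meet in `git diff` output and on GitHub.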
"(it is not, in fact, random but is a checksum hash and is technically a SHA-1)" -> "(it is not, in fact, random but is a SHA-1 checksum hash of the commit)"
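For anyone curious why the identifier is "not random", the following is a rough sketch of how a Git object ID can be reproduced with Python's `hashlib`, assuming the SHA-1 object format. The helper name `git_object_id` is hypothetical. It hashes a small blob rather than a commit, because a blob is easy to write out by hand; a commit is hashed the same way, with the kind `commit` and a body listing the tree, parents, author, committer, and message.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID Git would assign to an object of this kind and body.

    Git hashes a header of the form "<kind> <size in bytes>" followed by a
    NUL byte, and then the raw body. (Hypothetical helper, for illustration.)
    """
    header = f"{kind} {len(body)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The same content always yields the same 40-character hexadecimal ID.
blob_id = git_object_id("blob", b"hello\n")
empty_blob_id = git_object_id("blob", b"")

print(blob_id)
print(empty_blob_id)
```

If Git is installed, the first value can be compared against `printf 'hello\n' | git hash-object --stdin` in a repository that uses the SHA-1 format.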
In my opinion git with terrible (assdfasd) type commit messages is still better than no version control.
Mention that github tends to mangle .Rmd files (emphasizes the importance of the .md intermediate)?
.md. I'm not quite sure what you mean by "mangles". I just think the .Rmd isn't the whole story, since the code is sitting there unexecuted.
Mention git tracking of binary files (maybe wrt diffs etc)?
I view git / github as being the best of the two approaches you described - the decentralized bit is like the email approach where you have the freedom to work locally however you want but the github repo introduces a central "true" version that is shared between everyone like the google doc paradigm.
Not sure where to put this but a simple admonition for users to read the error messages - git is very verbose but most of the time it has a decent suggestion of what you should try if something didn't work.
I don't do enough with them wrt teaching but I think pull requests are a killer feature for collaboration, particularly for scaling up to larger projects.
R-specific stuff seems to be well covered elsewhere; a couple of remaining topics could be easily integrated into the existing content.
Maybe some more mention of the CLI. They're going to have to see it at some point, and I've actually found that demoing things in parallel between a GUI and the CLI has been useful for students to understand what is going on with git. There are really only 7 or 8 commands / verbs they will need to know to get started.
I consider this feedback as handled (possibly by another issue) and am acting on Hadley's review now.
@hadley @nicholasjhorton @rundel
Just FYI, there is an actual document here in the README now. I still have plenty to do and am pushing on.
But if anyone (AE or prospective reviewer) is willing to take a quick pass through, I can incorporate that feedback as I go. Very high-level feedback is fine (this section can go! this makes no sense! this is most valuable part!).
There obviously isn't going to be much time for iteration here!
I still have some writing to do, but my biggest dilemma is whether to make some specific examples, diagrams, etc.
Also wondering how to handle what could be a very large number of semi-perishable links. For example, Happy Git has an unclear future, but that's where I need to point for a lot of the promised concrete details on setup and early workflows. I do own the domain but it's not like it's an official book or anything. 🤔