dsscollection / git-github-for-stats

Git and GitHub evangelism for the practicing statistician
https://peerj.com/preprints/3159/

Eyeballs welcome #1

Closed by jennybc 7 years ago

jennybc commented 7 years ago

@hadley @nicholasjhorton @rundel

Just FYI, there is an actual document here in the README now. I still have plenty to do and am pushing on.

But if anyone (AE or prospective reviewer) is willing to take a quick pass through, I can incorporate that feedback as I go. Very high-level feedback is fine (this section can go! this makes no sense! this is most valuable part!).

There obviously isn't going to be much time for iteration here!

I still have some writing to do, but my biggest dilemma is whether to make some specific examples, diagrams, etc.

Also wondering how to handle what could be a very large number of semi-perishable links. For example, Happy Git has an unclear future, but that's where I need to point for a lot of the promised concrete details on setup and early workflows. I do own the domain but it's not like it's an official book or anything. 🤔

nicholasjhorton commented 7 years ago

This is shaping up splendidly. It's going to be extremely useful (and I can't wait to start sharing it with my students and colleagues).

Here are some comments on the current draft:

1) I'd suggest separating the "Why" and "who" and bolstering the "Why". For the "why", a painful anecdote or two might be helpful (sent the wrong version of a file using naming like "reallyfinalfixes04-05-07"; someone clobbered your shared Dropbox folders). I'd suggest that you consider moving the "Git has been repurposed" through the end of the "In my opinion, for new users" paragraph to this section. I also wonder if the "Many people who don't use Git" paragraph belongs earlier.

* See #4

2) Consider changing "Is this going to hurt?" from "Yes" to "Yes, at least somewhat".

* I'm sticking with the blunt "Yes" because I think the shock value of this admission is probably constructive.

3) Move discussion of issues up to intro of Github section.

* See #5

4) Adding the example of a commit, diff, and tag would be extremely helpful.

* Yes, once I consider the prose "done-ish", I'm going to work on a couple of examples / diagrams. I'm linking to inspirational diagrams where I can now, to give a sense of what I want.

5) Consider relegating Bitbucket and GitLab to a section on "more advanced topics", an appendix, or a paragraph in the conclusion.

* Agreed, I've moved all but the first passing mention to a section on "extras"

6) Consider adding an example where you've committed PDF-formatted output alongside an Rmd (but note that this can generate a merge conflict if someone else re-renders it and commits the changes).

* See #6

7) Mentioning "github_document !!! keep_md = TRUE !!!" is a really good idea.

* I cover this now and have a note to show the yaml in the yet-to-be made example/diagram.

8) More discussion of the features (and serious limitations IMHO) of the RStudio IDE in terms of git. Specifically, mention the need for students and instructors to work sometimes in the shell and sometimes using the IDE. This could be exiled.

* I don't think I'm going to have room for this here. I do already point out that RStudio has major gaps and that one can use a mix of RStudio, another Git client, and the shell.

9) Move pull requests (and branching) to the advanced section.

* Agreed, moved into "more resources".

10) Mention some lighter-weight GitHub options for teaching.

* Haven't explicitly done this but now have links to GitHub Education and GitHub Classroom, which would lead people to more resources, their blog, etc.

11) At a time when there's increased scrutiny on transparency and reproducibility, we are compelled to up our game (and for instructors ensure that the next generation of students have the appropriate skills and tools to be able to wrangle, analyze, and communicate in groups).

* Something like this will go in the yet-to-be written conclusion.
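As a rough illustration of the kind of example item 4 asks for, here is a minimal sketch of a diff between two versions of a file, built with Python's standard `difflib` module. The file name `README.md` and the text in versions A and B are invented for the illustration, and Git's own diff output carries extra header lines that are not reproduced here.

```python
import difflib

# Hypothetical contents: version B adds one line to version A.
version_a = [
    "Git tracks changes.\n",
    "Each commit is a snapshot.\n",
]
version_b = [
    "Git tracks changes.\n",
    "Each commit is a snapshot.\n",
    "Tags name important commits.\n",
]


def make_diff(old_lines, new_lines, name="README.md"):
    """Return a unified diff between two lists of lines as a single string."""
    return "".join(
        difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile=f"a/{name}",
            tofile=f"b/{name}",
        )
    )


diff_text = make_diff(version_a, version_b)
print(diff_text)
```

In the printed diff, lines starting with a space are unchanged context and the line starting with `+` is the addition, which is the same convention Git uses when it shows what changed between two commits.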

I've also pasted in some of the background suggestions I made many moons ago in the issue you reference above:

The following thoughts and/or URLs may be useful as background for your GitHub paper (kudos again for the fabulous http://happygitwithr.com/ book)

tools to support the teaching of statistics as a "team sport": we get laughed at if we don't sport better workflows

http://www.businessinsider.com/github-ceo-chris-wanstrath-interview-2015-10

Coursera Data Science specialization example of how github is used at scale!

aspects of rawgit.com and why it is useful

http://stats.stackexchange.com/questions/2910/how-to-efficiently-manage-a-statistical-analysis-project as part of a broader set of questions

http://xkcd.com/1296/

Someone sent me: http://xkcd.com/1597/

I laughed: there's more than a grain of truth in this.

But there's a bigger issue here. A few years back I was stunned when there was a talk by one of the RStudio developers at one of the larger federal statistical agencies. None of the audience members were using a source code/version control system. I suspect that not much has changed in the interim.

When do we teach this to our students? How can we teach this to our students unless we eat the same dog food?

jennybc commented 7 years ago

Thanks @nicholasjhorton, extremely helpful!

rundel commented 7 years ago

Article is looking great and I'm excited to see this kind of stuff getting out to a wider audience. Some general and specific comments I came up with as I read the draft are below.

  1. The "Initial setup" section doesn't mention the need for setting the handful of critical git config options. This is always a pain point with students and seems worth mentioning.

    • Done; see #2
  2. "The project is still a regular directory on your computer, that you can locate, name, and interact with as you wish." Maybe mention that folders can be copied, moved or even shared and still work. This regularly blows my students' minds.

    • I added a bit to that sentence and specifically say "You don't have to handle it with special gloves!"
  3. "which takes a multi-file snapshot of the entire project." I'm having trouble coming up with better phrasing but maybe make it clear that changes can be to one or more files? "Multi-file" reads to me as being more than one file.

    • Reworded for less awkward.
  4. "Commits, diffs, and tags": some kind of concrete example for version A and B would be useful I think. Something as simple as just adding a new line of text would be sufficient. Is there some canonical text/song/etc. that people use for this?

    • As soon as the text is "done-ish", I plan to make a couple examples/diagrams and this will be part of what's shown. I've put some indicative inspiration in for the moment.
  5. Replace "(it is not, in fact, random but is a checksum hash and is technically a SHA-1)" with "(it is not, in fact, random but is a SHA-1 checksum hash of the commit)".

    • Thanks, used your wording.
  6. In my opinion git with terrible (assdfasd) type commit messages is still better than no version control.

    • Agree. But don't see the need to put an explicit statement. People don't seem to need any encouragement to write crappy commit messages. They'll figure that out on their own. 😉
  7. Mention that github tends to mangle .Rmd files (emphasizes the importance of the .md intermediate)?

    • I am certainly harping on the importance of .md. I'm not quite sure what you mean by "mangles". I just think the .Rmd isn't the whole story, since the code is sitting there unexecuted.
  8. Mention git tracking of binary files (maybe wrt diffs etc)?

    • This is discussed in "Which files to commit" and I'm working through a specific scenario with @nicholasjhorton in #6.
  9. I view git / github as being the best of the two approaches you described - the decentralized bit is like the email approach where you have the freedom to work locally however you want but the github repo introduces a central "true" version that is shared between everyone like the google doc paradigm.

    • You're right! I've reworded to make this point.
  10. Not sure where to put this but a simple admonition for users to read the error messages. Git is very verbose, but most of the time it has a decent suggestion of what you should try if something didn't work.

    • So far, I haven't found the right place for this.
  11. I don't do enough with them wrt teaching but I think pull requests are a killer feature for collaboration, particularly for scaling up to larger projects.

    • We're going to have to settle for a passing mention of pull requests in "more resources".
  12. R specific stuff seems to be well covered elsewhere, couple of remaining topics could be easily integrated into the existing content.

    • Yes, I was able to get rid of this section. 🎉
  13. Maybe some more mention of the CLI. They're going to have to see it at some point, and I've actually found that demoing things in parallel between a GUI and the CLI has been useful for students to understand what is going on with git. There are really only 7 or 8 commands / verbs they will need to know to get started.

    • Haven't been able to fit this in. But there will be many links to Happy Git, which does show a lot of command line Git in shell chunks.
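
To make item 5 concrete, here is a minimal sketch of how Git derives an object ID: it hashes a short header (object type, content length, a NUL byte) followed by the content, using SHA-1. The helper name `git_object_id` is made up for this illustration, and the sketch assumes a repository using Git's default SHA-1 object format. It hashes a file's contents (a "blob"); a commit ID is computed the same way over the text of the commit object, with `commit` in the header.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Return the SHA-1 ID Git would assign to an object of this kind and content.

    `kind` is the object type as Git names it, e.g. "blob" or "commit".
    """
    # Git prepends "<type> <size>\0" to the content before hashing.
    header = f"{kind} {len(body)}\0".encode("ascii")
    return hashlib.sha1(header + body).hexdigest()


# ID of a file whose entire content is the line "hello".
blob_id = git_object_id("blob", b"hello\n")
print(blob_id)
```

The same content always yields the same 40-character ID, and changing a single byte changes the ID, which is why the string looks random even though it is not.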
jennybc commented 7 years ago

I consider this feedback as handled (possibly by another issue) and am acting on Hadley's review now.