DS4PS / cpp-528-spr-2020

Course shell for CPP 528 Foundations of Data Science III for Spring 2020.
http://ds4ps.org/cpp-528-spr-2020/

Add working with GitHub Desktop vid in WK2 section #15

Closed cenuno closed 4 years ago

cenuno commented 4 years ago

Summary

Hour-plus conversation with students on how to set up their GitHub repos with GitHub Desktop.

lecy commented 4 years ago

I'm curious: in your own professional work, how important have you found it to use branches?

I know it is important when you have multiple team members working on the same code.

It adds both complexity and control. How do you assess the trade-offs for when it is helpful versus just more of a hassle?

cenuno commented 4 years ago

I have found them to be the best way to collaborate with others when, above all else, a documented history of what was revised is recognized as a necessity.

At the research institute, git was still seen as a nice-to-have rather than a requirement for research analysts. That meant folks were comfortable making changes directly on master at all times (mostly because it was safe to assume a project never had more than one researcher on it). There was no need to perform pull requests because I was the only one looking at my code.

But at the startup, the master branch was what customers saw at all times. That was literally our consumer-facing product, so we couldn't afford errors or typos on that branch. With a product that had daily updates, having features isolated in separate branches made it easy to see which revisions were being added and which ones were stuck in review. It was very reminiscent of a toll gate on a highway on-ramp or off-ramp, because it was necessary to document who made changes to which files, when, and why before allowing their code to be merged into master. The system was necessary, though, because so many people needed to read other people's code.

However, that environment was a startup. We all had to moonlight as software engineers to make changes to our product, which took some time to get used to, especially the fork workflow, where everyone has their own copy of the repo to guard against accidental merges into master on the company repo. That form of risk mitigation added overhead in the form of knowing where to send your work (origin/master or fs-company-repo/master).
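To make the "where do I send my work" question concrete, here is a minimal sketch of a fork setup with two remotes. It is an illustration under stated assumptions, not the startup's actual configuration: two local bare repositories stand in for the hosted company repo and a personal fork, the branch name `fix-typo` is hypothetical, and only the remote name `fs-company-repo` is taken from the comment above.

```shell
#!/bin/sh
set -eu

# Work in a throwaway directory so nothing here touches a real project.
work=$(mktemp -d)
cd "$work"

# Assumption: two bare repositories stand in for the hosted copies
# (the company repo and your personal fork of it).
git init --quiet --bare company.git
git init --quiet --bare fork.git

# Cloning the fork makes it the remote named "origin".
git clone --quiet fork.git local
cd local
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of git's default branch name
git config user.name "Example Analyst"    # placeholder identity for the sandbox
git config user.email "analyst@example.com"

# Register the company repo as a second remote.
git remote add fs-company-repo "$work/company.git"

# Seed both remotes with the same starting point.
echo "# product" > README.md
git add README.md
git commit --quiet -m "Initial commit"
git push --quiet fs-company-repo master
git push --quiet origin master

# Do the work on a branch (hypothetical name) and push it to the fork only.
git checkout --quiet -b fix-typo
echo "typo fixed" >> README.md
git commit --quiet -am "Fix typo"
git push --quiet origin fix-typo

# List where each branch lives: fix-typo appears under origin, not under the company remote.
git ls-remote --heads origin
git ls-remote --heads fs-company-repo
```

In this arrangement the company repo only changes when someone with access accepts a pull request from the fork, which is the risk mitigation described above; the cost is remembering which remote each push should target.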

This big company keeps it simple: no one can ever modify the master branch directly. All work has to be done on a separate branch and reviewed by someone else, or else the code will never be merged into master ("put in production"). What they did, however, is modify the IDE to add point-and-click buttons that make it easy to create branches and send them up to the cloud. That ease surprised me, and it is something I've come to appreciate about GitHub Desktop.

Not everyone has to know how to use the command line to make things work. At the startup, you needed to learn bash and git from the command line, otherwise your work never made it in front of consumers. At this big company, they can't afford to have specialists spend time learning the command line, so they built these UI tools to make git as easy as possible while keeping the risk as low as possible. It does seem to be a design choice, made either implicitly at the startup, where everyone has to know enough git to be dangerous, or explicitly at the big company, where you don't need to know how it works under the hood because it just works.

lecy commented 4 years ago

What I am hearing here is, when a project is simple (few collaborators, not large) they might be overkill.

But when you need them you REALLY need them. So students should at least understand the process so they have the ability to choose?

cenuno commented 4 years ago

If you're a team of 1, then I think a feature branch workflow is overkill. If something goes wrong, you can force push to master and not worry about breaking other people's code.

From my experience, anything other than a feature branch workflow with 2 or more people is hard to do without inevitably breaking something. Without feature branches, it's easy for students to fall into bad practices like being unaware of what code has been added to, revised in, or deleted from master, not committing frequently, and unintentionally overwriting their own or others' work when they push to or pull from master.

I like teaching that a branch is task specific and, because of that, has a short life span. A branch is a sandbox with a purpose: once that purpose is achieved, it goes through a pull request for review and is deleted once merged into master.
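As a rough sketch of that life cycle from the command line (GitHub Desktop exposes the same steps as buttons), the script below runs in a throwaway repository. The branch name `add-data-steps` and the file names are hypothetical, and a local `git merge` stands in for the pull request review and merge that would normally happen on GitHub.

```shell
#!/bin/sh
set -eu

# Throwaway repository so nothing here touches real work.
repo=$(mktemp -d)
cd "$repo"
git init --quiet
git symbolic-ref HEAD refs/heads/master   # use "master" regardless of git's default branch name
git config user.name "Example Student"    # placeholder identity for the sandbox
git config user.email "student@example.com"

echo "# project" > README.md
git add README.md
git commit --quiet -m "Initial commit"

# 1. Create a task-specific branch (hypothetical name).
git checkout --quiet -b add-data-steps

# 2. Do the work for that one task and commit it.
echo "Step 1: import the raw data" > data-steps.md
git add data-steps.md
git commit --quiet -m "Document data steps"

# 3. Assumption: on GitHub you would push the branch and open a pull request here.
#    A local merge stands in for the reviewed merge into master.
git checkout --quiet master
git merge --quiet --no-ff -m "Merge add-data-steps" add-data-steps

# 4. The branch has served its purpose, so delete it.
git branch --quiet -d add-data-steps

# Show what is left: master only, with the merged history.
git branch --list
git log --oneline --graph
```

The `--no-ff` flag is a choice made for this sketch: it keeps a merge commit in the history so the branch's work remains visible as a unit after the branch itself is gone.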

Of course, it's possible to use git rebase whenever something goes wrong, such as a large data file stored in a commit or a merge conflict, but these situations can be avoided altogether by ensuring new work gets stored in a feature branch and is reviewed in a pull request before it makes its way to master.

The selling point to folks new to git is that feature branches protect you from breaking any existing code on master, and they allow others to share what they have done in a formal way by using a pull request. You can "show" your boss what you've done, much in the same way you can thoroughly review what someone you're supervising has done before deciding to accept, reject, or discuss the changes to master proposed in that feature branch.