datasciencecampus / coding-in-the-open

A compendium of open-source guidance which aims to share the benefits, risks and a summarised strategy for open-source coding.
https://datasciencecampus.github.io/coding-in-the-open/
MIT License

consider 'open first...' strategy #50

Open NicciPotts opened 1 year ago

NicciPotts commented 1 year ago

Feedback from survey

This is a great piece of work although I come to a different conclusion. I’m much more in favour of open at the end rather than open from the start given the makeup of our teams (new grads, want to encourage use of GitHub) and risk appetite in the ONS. I would go for an “open at some stage, closed by exception” strategy, not an “open first…” strategy.

In my opinion, the work currently underplays the resource and time needed for code reviews throughout in the open-first model. The guidance about the things to consider in the open at the end world is exactly right. We have just released our nowcasting code publicly having developed it privately. The guidance suggests that there is lots of additional workload that would come from transitioning the codebase from private to public at a later date. I disagree – I think the coding in the open safely world means you effectively have to check every version of the history through the same process rather than just the final version? Why do we only need the process of sign off for open at the end but not throughout? Why require external review for open at the end but not when coding in the open? If not, surely your risk appetite is different. I think some repos may be able to code in the open – e.g. maybe on the capability side. But on the government project side, when we are working on real data or policy issues in development, I struggle to see how the benefits of coding in the open at the start outweigh the risks. On a side note, I don’t understand why you would ever want to transfer the history when you release the code at the end; it just increases the review burden.

I also have never seen a huge quantification of the benefits. How much in reality have people collaborated on our public repos? In what cases have they collaborated and when haven’t they – what is the prize we are going for? In some cases where people have wanted to collaborate we have added them to our private repo (e.g. Turing projects), which feels a reasonable way over the collaboration burden. Lots of the benefits in the “why develop in the open” section seem to me to be largely the same if the code is released at some stage; it doesn’t have to be released throughout. I think quality is probably higher if you are open throughout, but this comes at the resource and time cost described above. For me the best arguments for code release at some stage are transparency and public investment, but this can be achieved through releasing at the end.

Finally, I think there is another case where you keep repos private forever. Not just sensitivity but again resource and time. We have a huge number of repos that were set up for slightly random things we did for a week or two and then we moved on – they didn’t work etc. What is the point in then spending time reviewing them to become public, or indeed starting them out as public?

I think it is all about costs and benefits for different projects. Having this guidance is helpful, but I would change your default, or at least for my team I think the best default is open at some stage. Very happy to discuss. Thanks so much for all your thinking on this.

A key takeaway from this feedback is that the approach to open will differ across projects and squads - we have a name to this and can bring the G6 in for discussion?

Addresses similar feedback

There can be value in open code, however:

  1. Our code is often experimental
  2. There should be clear value to potential re-users, otherwise one is just adding to the noise
  3. Open code implies additional resources in development and in maintenance

MartinWoodONS commented 1 year ago

I think there are three points, once distilled, to act on:

  1. The many situations that favour open-at-the-end: this is guidance, not intended to be strict rules. There are sensible examples in this feedback of situations in which open-first might not be preferred, so in those situations, don't use it! We don't really want to change the default to open-at-the-end because of the fundamental problem that repos that start closed often stay closed regardless; we want to deliberately apply a soft nudge there. But there is absolutely no problem with people making their own calls. In the end, only the developers know all the specifics of their own work and projects.

  2. The question of whether continuous PR review on an open repo can replace an external review is valid. I would argue external review is a good idea for any code that's meant to be used for something, so it'd happen anyway at some point; it just wouldn't be essential for opening the code to public scrutiny. We should, however, check this: what does external review achieve that PRs don't? Both in theory (aims) and in practice (what was changed as a result on our open-at-the-end repos?). This needn't be an onerous check; I just can't remember off the top of my head what changed, so I'll be checking commit messages.

  3. The idea that having everything open will add noise. I suggest our line should be that 1) we are thinking mostly in terms of projects, not every little self-guided experiment and two-day exploration; those should be the owner's call as to whether it's something they want public, and 2) with Google being a thing, and GitHub's star system, adding lots of repos will have little effect on people's ability to find relevant bits.

I'm going to check those commits, then look at the current guidance pages and think about what alterations might be introduced to elegantly handle and make clear the above.