cms-sw / cms-bot

A few scripts to automate approval / testing process
28 stars 244 forks source link

Change default action to squash merge in cmssw repo. #2080

Closed antoniovilela closed 7 months ago

antoniovilela commented 11 months ago

As discussed in the ORP meeting (2023-SEP-19), managers and users agree to squash commits when merging to master. This issue is to follow-up on what is needed in cms-bot.

cmsbuild commented 11 months ago

A new Issue was created by @antoniovilela .

@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

makortel commented 11 months ago

I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).

antoniovilela commented 11 months ago

I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).

Let's maybe discuss first what is needed in order for us to use the "Squash and merge" option? @smuzaffar

Dr15Jones commented 11 months ago

I strongly disagree with this. I personally am very careful about creating commits with meaningful comments to help understand why code was changed. Squashing them all would make using git history far less useful.

smuzaffar commented 11 months ago

There are three ways one can merge the PR

  1. Merge pull request: This is default for all our repositories. It keeps the histroy of the commits and on top it adds a merge commit. This merge commit is used by bot to find out which pull requests has been merged on top of last build. This informtion is then used for generating the IB pages. This merge commit also very help full for tagging the IBs. e.g. one can find the last merge commit before 11h00 to tag the 11h IB. Without this merge commit there could be chance that we tag something incomplete e.g. lets say a PR has two commits one at 10h59 and other at 11h01 and we have merge it at 11h05 (just before starting the IB) then bot will build the IB using 10h59 commit.
  2. Squash and merge: This does not create any merge commit and is squashes all the commits on to one re-write the commit message. Many developers does not like this as it loses the commit history.
  3. Rebase and merge: This also does not create any merge commit. It keeps the commit history of PR , so if a PR has N commits then all of those will be rebased on top of the base branch. Bebasing also means change in commit sha which some developers might not like if they want to keep on working on the local branch.

I prefer to keep using the option 1. Option 2 loses all the commit history and option 3 (though keeps the commits) but it can build inconsistant IBs.

makortel commented 11 months ago

I think we'd need to understand better the motivation(s) behind the proposal. Maybe there could be a different path forward to improve things?

antoniovilela commented 11 months ago

Do I understand correctly that if one does a Squash and merge the PR will not show up in the IB page. Is there a way around this? Since it is one commit you do not get the problem of incomplete PRs. In many (most?) cases the commit history is not very meaningful and what matters is the PR in its entirety it would seem? Could we simply have the option of the Squash and merge, without unintended consequences in the IB monitoring?

smuzaffar commented 11 months ago

@antoniovilela, I agree with @makortel. Lets first understand the motivation(s) behind the proposal

smuzaffar commented 11 months ago

also note that with Squash and merge, we not only lose the history of commit but also the lose the history who to blame for the code. I did a give test PR where multiple developers provided changes but the blame only shows the user who actually open the PR. Also the actual committer loses the credit of the commit too.

makortel commented 11 months ago

Here are some further downsides on "Squash and merge" model

Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).

makortel commented 11 months ago

assign core, orp

Let's include @cms-sw/all-l2 for more feedback.

cmsbuild commented 11 months ago

New categories assigned: core,orp

@Dr15Jones,@makortel,@rappoccio,@antoniovilela,@smuzaffar,@sextonkennedy you have been requested to review this Pull request/Issue and eventually sign? Thanks

tvami commented 11 months ago

include all-l2 for more feedback

I think the ability to do squashing w/o having to ask the developers is a welcomed addition. My main argument for asking for squashing from the developers was that when one looks at the diff between releases on github (e.g. https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_28...CMSSW_10_6_30) that shows the commit differences, and not the PR differences. So having one commit per PR will make that much more cleaner.

Of course well constructed commits are good for this too, but many many times we have "fix", "fix typo", "code checks", "answer to reviewers", "code checks again" as commit msgs, that's when the squashing from the reviewers/orp would be nice IMO.

fwyzard commented 11 months ago

Also the actual committer loses the credit of the commit too.

Doesn't the squashed commit list all the Co-authored-by people ?

fwyzard commented 11 months ago

Having worked with non-CMS repositories that enforce a linear history (like llvm) I noticed that they are a lot easier to browse and follow.

Compare

cmssw$ git log --oneline --graph --no-decorate -n20
*   56711f4be87 Merge pull request #42838 from guitargeek/buildfile_duplicate_entries_1
|\  
| * 7176cc0fce3 Remove duplicate entries from `CondTools/RunInfo/plugins/BuildFile.xml`
* |   213663839a3 Merge pull request #42872 from CMSTrackerDPG/FixFEDErrorGPUvsCPUImbalance
|\ \  
| * | 708c739cffc store errors only for valid DetIds
| * | 951cf6cc81c skip the 2nd error word for the timeout error as in the CPU code
* | |   20c8c100813 Merge pull request #42848 from alejands/noisyGpuValidation_133X
|\ \ \  
| * | | 6ec2b12513f Prevent using getByToken on EcalRawData when skipped and not consumed
| * | | 4d18fd5efba Introduce ability to choose to skip running over collections
| * | | e1408a755c3 Toggle off running over collections missing in EcalOnly configs
| * | | d8d8eb0934f Correct GpuTask flag logic to avoid running over unused collections
* | | |   f3dfef0254f Merge pull request #42728 from nurfikri89/from133X_202309042300_nanoDev_jmeNanoV11
|\ \ \ \  
| * | | | f3b31fa1ee5 Remove unused HF energy fractions from modifiers
| * | | | 3615d26bcf3 Use int16 data type for multiplicity variables
| * | | | 325107b52ed Add energy fractions for AK8 jets. Change data type and dummy value for multiplicity variables.
| * | | | 9f03ad5fd17 Check isPFJet() for multiplicity variables because in MiniAOD, AK8 jets below some pt thresholds have reduced information stored (e.g multiplicities)
| * | | | 9fce07625f5 Make it optional to use other PuppiProducer label used for puppi jets clustering. 'puppi' is set as the default value.
| * | | | 15db740e555 Fixes to variable names and PuppiProducer label
| * | | | f2cbbbe9a0c Fix typo
| * | | | 772b5375506 Rename multiplicity variables in nanojmeDQM_cff.py
| * | | | da915559f8d Make it optional to use a different PuppiProducer label rather than a hard-coded label. 'puppiNoLep' is set as the default value.

with

llvm-project$ git log --oneline --graph --no-decorate -n20
* 97187e127881 [AArch64] update "rm" inline asm test (#67472)
* 4bdec5830bc3 [flang][runtime] Enable more code for offload device builds. (#67489)
* 21c2ba4bdb82 [GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal
* 9eeb0293e27c [SLP]Cleanup MultiNodeScalars when tree deleted.
* cd7f1714dea1 [BOLT][RISCV] Implement R_RISCV_64 (#67558)
* 2e9a04b985e8 [X86]Add NO_REVERSE attribute to X86 RMW instrs in memfold table (#67288)
* 76a5602fd027 [scudo] Always express sizes in terms of element count in BufferPool (#66896)
* 6e46545b9814 Fix warning on align directives with non-zero fill value (#67237)
* eff4ef25b3dc Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
* ccd5b8db48a4 [X86] Change target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2 (#67410)
* 07d08f4eb44e [libc++] Don't add reference to system_category when exceptions disabled (#67504)
* 5109cb28fda8 [mlir][bufferization] Make buffer deallocation pipeline op type independent (#67546)
* 6e1dcc933511 [libc++] Refactor string unit tests to ease addition of new allocators
* 47b7f33b13b6 [IR] Allow llvm.ptrmask of vectors (#67434)
* cf7eac9650f3 [ObjectSizeOffsetVisitor] Bail after visiting 100 instructions (#67479)
* f3c417f34113 [Passes] Add option for LoopVersioningLICM pass. (#67107)
* 918863deb0af [clang][Diagnostics] Make 'note' color CYAN (#66997)
* 30fe8762446c [mlir][cfg-to-scf] Fix invalid transformation when value is used in a subregion (#67544)
* d338acf36edd [Flang][OpenMP] NFC: Version of a few tests with HLFIR
* 7c128f6d0e35 CostModel/RISCV: tweak cost of vector ctpop under ZVBB (#67020)
fwyzard commented 11 months ago

Answering myself: yes, the commit message for a squashed PR shows the additional authors as Co-authored-by:

image

antoniovilela commented 11 months ago

assign core, orp

Let's include @cms-sw/all-l2 for more feedback.

Many thanks, it is very good to hear from @cms-sw/all-l2

smuzaffar commented 11 months ago

Also the actual committer loses the credit of the commit too.

Doesn't the squashed commit list all the Co-authored-by people ?

yes the squashed commit message contains the co-author but does not contain the information about who change which part of the code e.g. https://github.com/cms-sw/cms-bot/commit/7819e3c430e2e3ff07478b374d01f5974d0d043b is a squashed commit and the actual changed code does tell who change what. So you can not properly blame the right author :-)

antoniovilela commented 11 months ago

Here are some further downsides on "Squash and merge" model

  • it doesn't allow sharing a branch for PRs in different release cycles (of course the branch must be based on a commit/tag that is common between the release cycles)
  • it doesn't allow (out of the box) basing a "further developments branch" on top of a branch used for a PR
  • for the PRs that have a meaningful commit history, it would make the code archaeology more difficult

Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).

I still feel that in most cases, squashing into a single commit would be the better option.

Could we add a "squash flag", different L2s could add it to flag the PR to be squashed, without needing to ask authors and delaying the approval, and release managers would then use Squash and merge for these PRs.

On a different note, do we already have regular training sessions on project collaboration with Git? This might be a more important outcome of this discussion.

perrotta commented 11 months ago

As past release manager, I would have really liked the possibility to have a "squash" button, that just squashes the commits, without merging them. I would have used it in those (few) cases where the authors ended up with tens of useless commits, and were not able, or did not want, to squash them themselves. I would say that the release manager should use it only in particular cases, and with the agreement of the authors and the reviewers of the PR. Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.

fwyzard commented 11 months ago

Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.

This is not a workflow that GitHub supports (squash, then merge).

Also, what is the advantage of keeping the merge commits ? One of the advantages of always squashing instead of merging is to get rid of the merge commits altogether...

jhakala commented 11 months ago

Hi all,

Just to give a small anecdote and a mixed opinion on this proposal. I recently implemented a runTheMatrix workflow which got squashed into someone else’s commit. That other person did the main development but needed help on runTheMatrix because they didn’t have experience with the runTheMatrix code. Now, this other person is the one who receives messages on how to use this workflow (which incidentally has a fairly large “consumer base”) and needs to forward questions to me about using it. It may help to avoid such confusions if the blame accurately reflects who introduced which feature.

Would it be possible to squash commits on the basis of committer, so as to keep the blame consistent, while reducing the number of individual commits on a single-user basis?

However, I do feel that it is important to be able to squash things because having to look at 10 different commits to track down what happened in particularly troublesome PRs can be really time-consuming and annoying.

perrotta commented 11 months ago

Also, what is the advantage of keeping the merge commits ?

To have them listed in the IBs, as well as in the release notes

fwyzard commented 11 months ago

If all PRs were squashed instead of merged, the IBs and release notes could simply list all commits.

And we could make the bot use a standardised format for the squashed commit message, like

Update GPU workflows (#42713)

A longer description of the commit would go here - maybe the text of the PR description when it was merged.
Martin-Grunewald commented 11 months ago

Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...

tvami commented 11 months ago

Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...

This exists, people just usually dont do it (and they have to be pushed which then makes the review process longer)

Martin-Grunewald commented 11 months ago

Well yes I can google it and lots of different recipes come up, interactive work needed, editing a file, blah blah.... lots of work.

We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).

fwyzard commented 11 months ago

We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).

Yes, that's a reasonable request.

I think the most complicated part in writing such tool is how to figure out what is the "starting point" to use for the squash.

A possibility could be to use the commit corresponding to the release itself, i.e. if one is working in CMSSW_13_3_0_pre3 or CMSSW_13_3_X_2023-09-27-2300, the script could produce a single commit including all the changes on top of those tags.

antoniovilela commented 11 months ago

@makortel Matti, would you like to summarize the proposals? Many thanks.

makortel commented 11 months ago

Matti, would you like to summarize the proposals? Many thanks.

I'd suggest we discuss this topic in the core software meeting next week (as a first topic, so should have minimal impact to ORP attendees that want to participate but not follow full core software meeting).

makortel commented 11 months ago

Outcome of the discussion in the core software meeting

kpedro88 commented 11 months ago

see https://github.com/cms-sw/cms-git-tools/pull/125 for cms-squash-topic

jhakala commented 11 months ago

Hi all,

Sorry to be a wet blanket on this very important development (and I totally support making improvements to the current situation) but---

I am still worried about the fact that contributors to CMSSW will get their contributions “erased” from history due to indiscriminate squashes.

There are many contributors to CMSSW already whose contributions are lost to the winds of time due to their contributions being incorporated “silently”. And these folks who contributed are really hard to dig out and to send emails to even though their input might be forthcoming but nobody can find them because their contributions are erased from history.

I really think it is important not to “squash” away contributors to CMSSW. But that is what you all have seemed to have agreed on. Please look at my earlier post here to reconsider this decision.

Best, John


From: Matti Kortelainen @.> Sent: Tuesday, October 10, 2023 5:59:23 PM To: cms-sw/cms-bot @.> Cc: Hakala, John Charles (jh7uq) @.>; Team mention @.> Subject: Re: [cms-sw/cms-bot] Change default action to squash merge in cmssw repo. (Issue #2080)

Outcome of the discussion in the core software meeting

— Reply to this email directly, view it on GitHubhttps://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6MZFSR3TAH6Y4NWXD2OHLX6VWFXAVCNFSM6AAAAAA5H345E6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJVG42DQMBTGQ. You are receiving this because you are on a team that was mentioned.Message ID: @.***>

tvami commented 11 months ago

But @jhakala if the user does the above introduced cms-squash-topic, it will be their contribution all along. At least that's my understanding

makortel commented 11 months ago

@kpedro88 What does the cms-squash-topic do if the branch contains commits from different developers? Does it include the others-than-squasher with Co-authored-by:?

makortel commented 11 months ago

As a side comment I'd find the model of directly approaching only the committer of a particular change and not including the assigned maintainers (that would be PdmV in case of runTheMatrix) in the communication suboptimal in general (there are certainly exceptions).

kpedro88 commented 11 months ago

@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.

  1. The original branch (mirrored to my fork): https://github.com/kpedro88/cmssw/commits/Run3-sim149Y
  2. The branch with my extra commit: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-plus
  3. After running cms-squash-topic: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash
  4. Instead using git rebase -i to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squash

In 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).

GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.

With a modified version of cms-squash-topic, additional authors can be included:

  1. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2

However, if cms-squash-topic is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost:

  1. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again

This can be addressed with a further modification (at the cost of a bit of redundancy):

  1. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again2

https://github.com/cms-sw/cms-git-tools/pull/125 is updated with the changes.

kpedro88 commented 11 months ago

@jhakala I hope https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761329070 addresses your concern more directly.

The issues you raised were actually the subject of a long discussion in the core software meeting on Tuesday, and generally there was agreement that we want to attribute changes properly. That led to the plan in https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034 to provide a tool whose use can be requested at the discretion of the release manager (or for other purposes for which it might be convenient).

fwyzard commented 11 months ago

Also, it would be really helpful if people opened pull requests with a useful commit history and meaningful messages.

.A

jhakala commented 11 months ago

Thanks Kevin,

This does address all my concerns and I appreciate your work on them.

Best, John


From: Kevin Pedro @.> Sent: Friday, October 13, 2023 1:06:41 PM To: cms-sw/cms-bot @.> Cc: Hakala, John Charles (jh7uq) @.>; Mention @.> Subject: Re: [cms-sw/cms-bot] Change default action to squash merge in cmssw repo. (Issue #2080)

@jhakalahttps://github.com/jhakala I hope #2080 (comment)https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761329070 addresses your concern more directly.

The issues you raised were actually the subject of a long discussion in the core software meeting on Tuesday, and generally there was agreement that we want to attribute changes properly. That led to the plan in #2080 (comment)https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034 to provide a tool whose use can be requested at the discretion of the release manager (or for other purposes for which it might be convenient).

— Reply to this email directly, view it on GitHubhttps://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761331999, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6MZFWE3TM4CKFXRRGFKQDX7EOEDAVCNFSM6AAAAAA5H345E6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRRGMZTCOJZHE. You are receiving this because you were mentioned.Message ID: @.***>

antoniovilela commented 11 months ago

@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.

  1. The original branch (mirrored to my fork): https://github.com/kpedro88/cmssw/commits/Run3-sim149Y
  2. The branch with my extra commit: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-plus
  3. After running cms-squash-topic: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash
  4. Instead using git rebase -i to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squash

In 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).

GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.

With a modified version of cms-squash-topic, additional authors can be included: 5. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2

However, if cms-squash-topic is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost: 6. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again

This can be addressed with a further modification (at the cost of a bit of redundancy): 7. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again2

cms-sw/cms-git-tools#125 is updated with the changes.

Hi Kevin, all, belated thanks!

Let's continue to follow up on the list in https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034, namely pointing to the squash command in cms-bot messages; check if there is no diff in the PR and if so not reset the tests & signatures; more generally, not reset signatures related to a given package that was not updated in a new commit in the PR.

Thanks again.

smuzaffar commented 7 months ago

FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.

antoniovilela commented 7 months ago

FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.

Fantastic!

smuzaffar commented 7 months ago

@antoniovilela , can we close this issue now?

antoniovilela commented 7 months ago

@antoniovilela , can we close this issue now?

Yes, I think so.