Closed antoniovilela closed 7 months ago
A new Issue was created by @antoniovilela .
@Dr15Jones, @rappoccio, @smuzaffar, @makortel, @sextonkennedy, @antoniovilela can you please review it and eventually sign/assign? Thanks.
cms-bot commands are listed here
I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).
I don't think an agreement was reached (the topic was merely touched last week), and personally in many cases I want to preserve the commit history. (although I agree in many cases the commits can be squashed).
Let's maybe discuss first what is needed in order for us to use the "Squash and merge" option? @smuzaffar
I strongly disagree with this. I personally am very careful about creating commits with meaningful comments to help understand why code was changed. Squashing them all would make using git history far less useful.
There are three ways one can merge the PR
I prefer to keep using the option 1. Option 2 loses all the commit history and option 3 (though keeps the commits) but it can build inconsistant IBs.
I think we'd need to understand better the motivation(s) behind the proposal. Maybe there could be a different path forward to improve things?
Do I understand correctly that if one does a Squash and merge the PR will not show up in the IB page. Is there a way around this? Since it is one commit you do not get the problem of incomplete PRs. In many (most?) cases the commit history is not very meaningful and what matters is the PR in its entirety it would seem? Could we simply have the option of the Squash and merge, without unintended consequences in the IB monitoring?
@antoniovilela, I agree with @makortel. Lets first understand the motivation(s) behind the proposal
also note that with Squash and merge, we not only lose the history of commit but also the lose the history who to blame for the code. I did a give test PR where multiple developers provided changes but the blame only shows the user who actually open the PR. Also the actual committer loses the credit of the commit too.
Here are some further downsides on "Squash and merge" model
Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).
assign core, orp
Let's include @cms-sw/all-l2 for more feedback.
New categories assigned: core,orp
@Dr15Jones,@makortel,@rappoccio,@antoniovilela,@smuzaffar,@sextonkennedy you have been requested to review this Pull request/Issue and eventually sign? Thanks
include
all-l2
for more feedback
I think the ability to do squashing w/o having to ask the developers is a welcomed addition. My main argument for asking for squashing from the developers was that when one looks at the diff between releases on github (e.g. https://github.com/cms-sw/cmssw/compare/CMSSW_10_6_28...CMSSW_10_6_30) that shows the commit differences, and not the PR differences. So having one commit per PR will make that much more cleaner.
Of course well constructed commits are good for this too, but many many times we have "fix", "fix typo", "code checks", "answer to reviewers", "code checks again" as commit msgs, that's when the squashing from the reviewers/orp would be nice IMO.
Also the actual committer loses the credit of the commit too.
Doesn't the squashed commit list all the Co-authored-by
people ?
Having worked with non-CMS repositories that enforce a linear history (like llvm) I noticed that they are a lot easier to browse and follow.
Compare
cmssw$ git log --oneline --graph --no-decorate -n20
* 56711f4be87 Merge pull request #42838 from guitargeek/buildfile_duplicate_entries_1
|\
| * 7176cc0fce3 Remove duplicate entries from `CondTools/RunInfo/plugins/BuildFile.xml`
* | 213663839a3 Merge pull request #42872 from CMSTrackerDPG/FixFEDErrorGPUvsCPUImbalance
|\ \
| * | 708c739cffc store errors only for valid DetIds
| * | 951cf6cc81c skip the 2nd error word for the timeout error as in the CPU code
* | | 20c8c100813 Merge pull request #42848 from alejands/noisyGpuValidation_133X
|\ \ \
| * | | 6ec2b12513f Prevent using getByToken on EcalRawData when skipped and not consumed
| * | | 4d18fd5efba Introduce ability to choose to skip running over collections
| * | | e1408a755c3 Toggle off running over collections missing in EcalOnly configs
| * | | d8d8eb0934f Correct GpuTask flag logic to avoid running over unused collections
* | | | f3dfef0254f Merge pull request #42728 from nurfikri89/from133X_202309042300_nanoDev_jmeNanoV11
|\ \ \ \
| * | | | f3b31fa1ee5 Remove unused HF energy fractions from modifiers
| * | | | 3615d26bcf3 Use int16 data type for multiplicity variables
| * | | | 325107b52ed Add energy fractions for AK8 jets. Change data type and dummy value for multiplicity variables.
| * | | | 9f03ad5fd17 Check isPFJet() for multiplicity variables because in MiniAOD, AK8 jets below some pt thresholds have reduced information stored (e.g multiplicities)
| * | | | 9fce07625f5 Make it optional to use other PuppiProducer label used for puppi jets clustering. 'puppi' is set as the default value.
| * | | | 15db740e555 Fixes to variable names and PuppiProducer label
| * | | | f2cbbbe9a0c Fix typo
| * | | | 772b5375506 Rename multiplicity variables in nanojmeDQM_cff.py
| * | | | da915559f8d Make it optional to use a different PuppiProducer label rather than a hard-coded label. 'puppiNoLep' is set as the default value.
with
llvm-project$ git log --oneline --graph --no-decorate -n20
* 97187e127881 [AArch64] update "rm" inline asm test (#67472)
* 4bdec5830bc3 [flang][runtime] Enable more code for offload device builds. (#67489)
* 21c2ba4bdb82 [GlobalISel] Remove TargetLowering::isConstantUnsignedBitfieldExtractLegal
* 9eeb0293e27c [SLP]Cleanup MultiNodeScalars when tree deleted.
* cd7f1714dea1 [BOLT][RISCV] Implement R_RISCV_64 (#67558)
* 2e9a04b985e8 [X86]Add NO_REVERSE attribute to X86 RMW instrs in memfold table (#67288)
* 76a5602fd027 [scudo] Always express sizes in terms of element count in BufferPool (#66896)
* 6e46545b9814 Fix warning on align directives with non-zero fill value (#67237)
* eff4ef25b3dc Revert "[AArch64] Enable "sink-and-fold" in MachineSink by default (#67432)"
* ccd5b8db48a4 [X86] Change target of __builtin_ia32_cmp[p|s][s|d] from avx into sse/sse2 (#67410)
* 07d08f4eb44e [libc++] Don't add reference to system_category when exceptions disabled (#67504)
* 5109cb28fda8 [mlir][bufferization] Make buffer deallocation pipeline op type independent (#67546)
* 6e1dcc933511 [libc++] Refactor string unit tests to ease addition of new allocators
* 47b7f33b13b6 [IR] Allow llvm.ptrmask of vectors (#67434)
* cf7eac9650f3 [ObjectSizeOffsetVisitor] Bail after visiting 100 instructions (#67479)
* f3c417f34113 [Passes] Add option for LoopVersioningLICM pass. (#67107)
* 918863deb0af [clang][Diagnostics] Make 'note' color CYAN (#66997)
* 30fe8762446c [mlir][cfg-to-scf] Fix invalid transformation when value is used in a subregion (#67544)
* d338acf36edd [Flang][OpenMP] NFC: Version of a few tests with HLFIR
* 7c128f6d0e35 CostModel/RISCV: tweak cost of vector ctpop under ZVBB (#67020)
Answering myself: yes, the commit message for a squashed PR shows the additional authors as Co-authored-by
:
assign core, orp
Let's include @cms-sw/all-l2 for more feedback.
Many thanks, it is very good to hear from @cms-sw/all-l2
Also the actual committer loses the credit of the commit too.
Doesn't the squashed commit list all the
Co-authored-by
people ?
yes the squashed commit message contains the co-author but does not contain the information about who change which part of the code e.g. https://github.com/cms-sw/cms-bot/commit/7819e3c430e2e3ff07478b374d01f5974d0d043b is a squashed commit and the actual changed code does tell who change what. So you can not properly blame the right author :-)
Here are some further downsides on "Squash and merge" model
- it doesn't allow sharing a branch for PRs in different release cycles (of course the branch must be based on a commit/tag that is common between the release cycles)
- it doesn't allow (out of the box) basing a "further developments branch" on top of a branch used for a PR
- for the PRs that have a meaningful commit history, it would make the code archaeology more difficult
Written that, I share the feeling that many PRs could probably be squashed into one commit without any of the downsides mentioned so far. I believe it would be possible to find a solution that balances the different aspects (and for that we need better understanding first).
I still feel that in most cases, squashing into a single commit would be the better option.
Could we add a "squash flag", different L2s could add it to flag the PR to be squashed, without needing to ask authors and delaying the approval, and release managers would then use Squash and merge for these PRs.
On a different note, do we already have regular training sessions on project collaboration with Git? This might be a more important outcome of this discussion.
As past release manager, I would have really liked the possibility to have a "squash" button, that just squashes the commits, without merging them. I would have used it in those (few) cases where the authors ended up with tens of useless commits, and were not able, or did not want, to squash them themselves. I would say that the release manager should use it only in particular cases, and with the agreement of the authors and the reviewers of the PR. Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.
Then the squashed commit(s) could be merged as usual in the next step, without renouncing to the merge commit.
This is not a workflow that GitHub supports (squash, then merge).
Also, what is the advantage of keeping the merge commits ? One of the advantages of always squashing instead of merging is to get rid of the merge commits altogether...
Hi all,
Just to give a small anecdote and a mixed opinion on this proposal. I recently implemented a runTheMatrix workflow which got squashed into someone else’s commit. That other person did the main development but needed help on runTheMatrix because they didn’t have experience with the runTheMatrix code. Now, this other person is the one who receives messages on how to use this workflow (which incidentally has a fairly large “consumer base”) and needs to forward questions to me about using it. It may help to avoid such confusions if the blame accurately reflects who introduced which feature.
Would it be possible to squash commits on the basis of committer, so as to keep the blame consistent, while reducing the number of individual commits on a single-user basis?
However, I do feel that it is important to be able to squash things because having to look at 10 different commits to track down what happened in particularly troublesome PRs can be really time-consuming and annoying.
Also, what is the advantage of keeping the merge commits ?
To have them listed in the IBs, as well as in the release notes
If all PRs were squashed instead of merged, the IBs and release notes could simply list all commits.
And we could make the bot use a standardised format for the squashed commit message, like
Update GPU workflows (#42713)
A longer description of the commit would go here - maybe the text of the PR description when it was merged.
Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...
Maybe post a recipe somewhere how to squash with leaving just a (new) single commit message...
This exists, people just usually dont do it (and they have to be pushed which then makes the review process longer)
Well yes I can google it and lots of different recipes come up, interactive work needed, editing a file, blah blah.... lots of work.
We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).
We have cms specific git commands like git cms-addpkg, so I propose one makes a cms-squash command which assumes all commites in a repo (from which the PR is generated) to be quashed. This is listed together with the bot commands when a PR is made, so that proponents of PRs can choose to apply this easy recipe (similar to code formatting) and know what to do and do not need to search for it. (IOW, make it easy for non git experts).
Yes, that's a reasonable request.
I think the most complicated part in writing such tool is how to figure out what is the "starting point" to use for the squash.
A possibility could be to use the commit corresponding to the release itself, i.e. if one is working in CMSSW_13_3_0_pre3
or CMSSW_13_3_X_2023-09-27-2300
, the script could produce a single commit including all the changes on top of those tags.
@makortel Matti, would you like to summarize the proposals? Many thanks.
Matti, would you like to summarize the proposals? Many thanks.
I'd suggest we discuss this topic in the core software meeting next week (as a first topic, so should have minimal impact to ORP attendees that want to participate but not follow full core software meeting).
Outcome of the discussion in the core software meeting
see https://github.com/cms-sw/cms-git-tools/pull/125 for cms-squash-topic
Hi all,
Sorry to be a wet blanket on this very important development (and I totally support making improvements to the current situation) but---
I am still worried about the fact that contributors to CMSSW will get their contributions “erased” from history due to indiscriminate squashes.
There are many contributors to CMSSW already whose contributions are lost to the winds of time due to their contributions being incorporated “silently”. And these folks who contributed are really hard to dig out and to send emails to even though their input might be forthcoming but nobody can find them because their contributions are erased from history.
I really think it is important not to “squash” away contributors to CMSSW. But that is what you all have seemed to have agreed on. Please look at my earlier post here to reconsider this decision.
Best, John
From: Matti Kortelainen @.> Sent: Tuesday, October 10, 2023 5:59:23 PM To: cms-sw/cms-bot @.> Cc: Hakala, John Charles (jh7uq) @.>; Team mention @.> Subject: Re: [cms-sw/cms-bot] Change default action to squash merge in cmssw repo. (Issue #2080)
Outcome of the discussion in the core software meeting
— Reply to this email directly, view it on GitHubhttps://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6MZFSR3TAH6Y4NWXD2OHLX6VWFXAVCNFSM6AAAAAA5H345E6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONJVG42DQMBTGQ. You are receiving this because you are on a team that was mentioned.Message ID: @.***>
But @jhakala if the user does the above introduced cms-squash-topic
, it will be their contribution all along. At least that's my understanding
@kpedro88 What does the cms-squash-topic
do if the branch contains commits from different developers? Does it include the others-than-squasher with Co-authored-by:
?
As a side comment I'd find the model of directly approaching only the committer of a particular change and not including the assigned maintainers (that would be PdmV in case of runTheMatrix
) in the communication suboptimal in general (there are certainly exceptions).
@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.
cms-squash-topic
: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squashgit rebase -i
to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squashIn 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).
GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.
With a modified version of cms-squash-topic
, additional authors can be included:
However, if cms-squash-topic
is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost:
This can be addressed with a further modification (at the cost of a bit of redundancy):
https://github.com/cms-sw/cms-git-tools/pull/125 is updated with the changes.
@jhakala I hope https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761329070 addresses your concern more directly.
The issues you raised were actually the subject of a long discussion in the core software meeting on Tuesday, and generally there was agreement that we want to attribute changes properly. That led to the plan in https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034 to provide a tool whose use can be requested at the discretion of the release manager (or for other purposes for which it might be convenient).
Also, it would be really helpful if people opened pull requests with a useful commit history and meaningful messages.
.A
Thanks Kevin,
This does address all my concerns and I appreciate your work on them.
Best, John
From: Kevin Pedro @.> Sent: Friday, October 13, 2023 1:06:41 PM To: cms-sw/cms-bot @.> Cc: Hakala, John Charles (jh7uq) @.>; Mention @.> Subject: Re: [cms-sw/cms-bot] Change default action to squash merge in cmssw repo. (Issue #2080)
@jhakalahttps://github.com/jhakala I hope #2080 (comment)https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761329070 addresses your concern more directly.
The issues you raised were actually the subject of a long discussion in the core software meeting on Tuesday, and generally there was agreement that we want to attribute changes properly. That led to the plan in #2080 (comment)https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034 to provide a tool whose use can be requested at the discretion of the release manager (or for other purposes for which it might be convenient).
— Reply to this email directly, view it on GitHubhttps://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1761331999, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB6MZFWE3TM4CKFXRRGFKQDX7EOEDAVCNFSM6AAAAAA5H345E6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTONRRGMZTCOJZHE. You are receiving this because you were mentioned.Message ID: @.***>
@makortel to test this, I picked a random recent PR branch and added my own trivial commit to it.
- The original branch (mirrored to my fork): https://github.com/kpedro88/cmssw/commits/Run3-sim149Y
- The branch with my extra commit: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-plus
- After running
cms-squash-topic
: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash- Instead using
git rebase -i
to squash: https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-rebase-squashIn 3. the commit author is me, with the original commit authors listed in the combined commit message. In 4. the author of the first commit is preserved, with me as the committer. This is slightly better, but doesn't scale beyond 2 people (author and committer fields can be different, but each one can only have one value).
GitHub and GitLab support a co-author standard described here: https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors.
With a modified version of
cms-squash-topic
, additional authors can be included: 5. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2However, if
cms-squash-topic
is called a second time (or more generally, called on a branch that includes commits that already have coauthors), the coauthor info is lost: 6. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2againThis can be addressed with a further modification (at the cost of a bit of redundancy): 7. https://github.com/kpedro88/cmssw/commits/Run3-sim149Y-squash2again2
cms-sw/cms-git-tools#125 is updated with the changes.
Hi Kevin, all, belated thanks!
Let's continue to follow up on the list in https://github.com/cms-sw/cms-bot/issues/2080#issuecomment-1755748034, namely pointing to the squash command in cms-bot messages; check if there is no diff in the PR and if so not reset the tests & signatures; more generally, not reset signatures related to a given package that was not updated in a new commit in the PR.
Thanks again.
FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.
FYI, bot PR #2144 should allow to squash a cmssw PR without resetting the L2's signatures. As squashed creates a new commit so the PR checks will be reset but bot will re-run the code-checks and PR tests automatically.
Fantastic!
@antoniovilela , can we close this issue now?
@antoniovilela , can we close this issue now?
Yes, I think so.
As discussed in the ORP meeting (2023-SEP-19), managers and users agree to squash commits when merging to master. This issue is to follow-up on what is needed in cms-bot.