README.md: explain motivation behind dasharo-pq

SergiiDmytruk commented 6 months ago

Thanks, but it seems to me that perceived benefits of patches aren't unique to them, you can do the same with git rebase and git format-patch. I know I've said "patches" in https://github.com/Dasharo/dasharo-pq/pull/1#issuecomment-2119237814, but only now do I get that this moves from a single branch (dasharo) back to a bunch of them (patch series == branch), meaning that all negative aspects come back as well: applying the same fix in several branches, rebasing several branches, conflicts of different branches. Having changes as files instead of part of .git doesn't address any of this, because it's just a different way of storing branches.

The only potential benefit I see so far is that while you can't nest git rebase because git rebase --edit works only in forward direction, you can go back and forth in patch series with its push/pop operations. However, my workaround for that when dealing with hundreds of commits is to create a temporary branch at something like HEAD~150, rebase its top 10-20 commits and later rebase original branch onto the temporary branch. This reduces number of conflicts and allows to make changes faster, but I rarely had to do this.

pietrushnic commented 6 months ago

but only now do I get that this moves from a single branch (dasharo) back to a bunch of them (patch series == branch)

I'm not sure why you get to that conclusion, it really depends on the organization of work. Right now, the whole effort is at the PoC stage, so there are no set-in-stone rules about the way we would manage the patch series or how that relates to branches. It would be best to discuss each scenario or get back to the initial problems from the issue https://github.com/Dasharo/dasharo-issues/issues/310

Of course, if you say everything is ok with the current method of Dasharo patch/commit management described in X (link needed) and this PoC is not needed I'm ok to get that critique.

I make this effort as part of the Dasharo (coreboot+SeaBIOS) release for PC Engines, and the concept I pursue here was initially endorsed by @krystian-hebel and @andyhhp, who maintain Xen using a similar process.

@krystian-hebel @andyhhp Maybe I missed some important justification and benefits of using the methodology, which is here: https://github.com/xenserver/xen.pg. I guess TrenchBoot also manages patches, and Qubes OS probably does, too, so what is the correct explanation?

SergiiDmytruk commented 6 months ago

I'm not sure why you get to that conclusion, it really depends on the organization of work.

If one can split changes into independent sets which avoid conflicts and quilt allows to pick those sets, then it could be different than branches, but I don't know if quilt provides that functionality.

Of course, if you say everything is ok with the current method of Dasharo patch/commit management described in X (link needed) and this PoC is not needed I'm ok to get that critique.

Per-platform branches didn't work well and a single dasharo branch seems to be a better way of going about it. I just don't see how storing patches in files instead of .git is going to make a big difference (well, maybe there will be fewer commits that fix previous commits).

andyhhp commented 6 months ago

It is a matter of preference. There are pros and cons to each method, but you've already found some of the cons and are looking for an alternative.

With a mono-branch, you've got accumulated bugfixes, features, and $N platforms worth of configuration.

You could periodically merge this branch with upstream, but you immediately lose the separation between what's upstream and what's local.

You could periodically rebase the branch, but the chance of making an error is exponential with the length of the mono-branch, and when you force push, you break everyone else working on the branch.

With a patchqueue using quilt/guilt/stgit(in patch mode), the canonical representation of the mono-branch is in patch form, itself in version control. This means you're not rewriting history on your colleagues, and a large "rebase" (which is just a plain commit or several in the pq repo) can be done in steps where each one can be checked to be correct.

It is a matter of preference, and patchqueues do feel weird to start with, but they are a good solution to work around the kinds of problems you've found. Like all code, patchqueues need regular maintenance to keep them nice to use, but the overall effort required is far less IMO.

krystian-hebel commented 6 months ago

you can do the same with git rebase and git format-patch

Exactly, you can. By doing the patches you have to. This is more about forcing the developers to do the Right Thing™ from the beginning rather than anything else.

For a long time we had a strong anti-force-push policy that lead to much headache down the line. While it is possible to keep the history clean and simple with properly done rebase, we can't rely on everyone knowing git well enough to not create a mess that will be difficult to get out of. Patches repository may be a good middle ground until we can reeducate ourselves to not treat forced pushes as something bad.

I understand the reluctance when it comes to some of the points like synchronization issues or single branch problem. I think these can be somewhat alleviated by grouping patches in groups, similar to how Qubes does it. We could come up with numbering scheme that tries to make managing multiple branches as simple as possible, e.g. patches starting with 0000-M are expected to be upstreamed now, M-N is common Dasharo code that isn't planned to be upstreamed in the foreseeable future, N-O is specific to one vendor, O-P the other vendor etc. I'm also not sure if this is supported by quilt.

I agree that keeping code in form of patches is harder to maintain than keeping it on branch, however I think of it as an advantage. This will hopefully steer us towards upstreaming more of our code early, just to make someone else having to deal with maintaining it. This may sound bad and someone may see me as lazy (well, I am) for saying so, but we had a few cases where after code was merged upstream, both we and upstream fixed it in parallel, which doubled the work, or even tripled it when counting time required for conflict resolution.

This is something that we wanted to try for some time. It may be better, it may be worse than what we currently have. As I understand it, no decision was made yet, but I personally think that we should force every developer to at least try it (maybe in form of hackaton, where everyone creates a pseudo-release with "useless" changes? I haven't tried building coreDOOM yet :slightly_smiling_face:). As with all such changes, it would be best if for some time both approaches were done simultaneously, ideally automated, but on the other hand synchronizing between them may be a waste of time if in the end only one approach will remain.

@pietrushnic I only glanced over the README, it obviously must include what was discussed here in some form, if developers that recently did big rebase actions have difficulties understanding it now, it will be even worse for less experienced ones. I also agree with what Sergii said, most of what you wrote as benefit of patch can also be done on branch in one form or another. It would be nice to have also some cons, or pros of branches if you want to stick to positive language.

There is one significant advantage that in my opinion deserves a mention: testing if a set of patches builds with current coreboot seems to be simpler and much less scary than rebasing a branch, and is easier to automate. This may immediately show that there were changes on upstream that either fix the same issue we're fixing or have potential to break our work. In fact, we could make CI build with patches applied both on top of specified commit (e.g. taken from git format-patch --base=...) and main, this would be something like stable and nightly.

pietrushnic commented 6 months ago

Just for clarity, we are talking about guilt, not quilt, although there should be feature parity; the difference is that first has support for git and was created to work well with git, and second treats patches as plain text.

I understand the reluctance when it comes to some of the points like synchronization issues or single branch problem. I think these can be somewhat alleviated by grouping patches in groups, similar to how Qubes does it. We could come up with numbering scheme that tries to make managing multiple branches as simple as possible, e.g. patches starting with 0000-M are expected to be upstreamed now, M-N is common Dasharo code that isn't planned to be upstreamed in the foreseeable future, N-O is specific to one vendor, O-P the other vendor etc. I'm also not sure if this is supported by quilt.

This can be managed by a series metadata file. IIUC.

@pietrushnic I only glanced over the README, it obviously must include what was discussed here in some form, if developers that recently did big rebase actions have difficulties understanding it now, it will be even worse for less experienced ones. I also agree with what Sergii said, most of what you wrote as benefit of patch can also be done on branch in one form or another. It would be nice to have also some cons, or pros of branches if you want to stick to positive language.

Sure, I will try to fix the documentation according to this recommendation based on your comments.

pietrushnic commented 6 months ago

@andyhhp are you aware of any discussion on mailing lists regarding guilt/quilt? Why does Xen use guilt? I guess key developers, at some point, decided to use guilt for a reason. I would like to give a comprehensive background for those who would like to better understand the motivation behind this experiment.

andyhhp commented 6 months ago

@pietrushnic Xen doesn't use guilt; it's XenServer which does. (You're looking at XenServer's patchqueue on top of upstream Xen)

In XenServer, we've always had a strong split between the "pristine upstream" base repo, and our local deviations on top.

We used to use mercurial, and mercurial had excellent support for patchqueues. It was a proper integration of the quilt-like workflow into hg and it was nice to use. (So much so that my .gitconfig aliases guilt commands as their hg-pq abbreviations.)

Then the world switched to using git, and we wanted to continue to work in the same way. guilt was the only real option going (IIRC at the time, even stgit couldn't avoid mutating history but has since gained this feature), but as a quilt clone in git, it was basically a drop-in replacement for what we already had.

There are developers who use the XenServer patchqueue repos with stgit instead of guilt. Again, it's a matter of choice, and works because a quilt-like patchqueue's canonical representation is just a series of patches.

pietrushnic commented 6 months ago

@SergiiDmytruk @krystian-hebel @andyhhp I replaced the previous argumentation with a shorter summary of the argument I could identify in the discussion so far. Please let me know if it is helpful and deserves your approval. If not, I would appreciate further discussion, especially suggestions on improving argumentation to provide correct background, motivation, and explanation of assumed benefits.

Dasharo / dasharo-pq

README.md: explain motivation behind dasharo-pq #5