Closed tajmone closed 4 years ago
@thoni56, maybe the simpler approach is to first review and accept the PRs on beta7-*
dev branches, since they are in synch with the beta7-prep
branch, and then fix conflicts on beta7-prep
by rebasing it on master
branch.
This way we'll have to focus on a single branch, instead of 4.
I used https://draftable.com to easily diff two pdfs. It doesn't detect whitespace changes, so it misses this exact PR, but it is easy to scroll two PDFs side by side giving you this view:
@thoni56, as mentioned in #72, I've started to enforce consistent code styles via EditorConfig in dev-editorconfig
(branched from master
).
What I would like try, locally first, is:
master
and all the beta7-*
dev branches.dev-editorconfig
into master
.beta7-*
PR:
master
branch with enforced code styles consistency.validation.sh
checks.beta7-prep
all the pending PRs after each merge.This means that we should wait before merging any PR into beta7-prep
— otherwise we'll have to fix the code styles and amend every commit of beta7-prep
, which is harder than doing it on each PR before it's merged.
Since the dev-editorconfig
branch also fixes code styles in many sources (some of which are involved in the PRs' changes), this seems the only feasible approach without breaking the PR branches.
Also, this should not only help to solve all current conflicts, but also make it easier to keep rebasing beta7-prep
on master
in the future — after all, one of the purposes of EditorConfig settings is to eliminate much of the noise that usually breaks history linearity due to meaningless content changes introduced by editors differences.
What do you think? Is this the right way to go? Am I mistaken, or missing out something?
P.S. — The repository already contained an
.editorconfig
settings file, but it only covered Alan sources and the.a3sol
/.a3log
solutions/transcripts files, to enforce Latin1 encoding, trimming trailing spaces and enforcing an empty line at the file end (to prevent, respectively, ALAN compilation errors, erratic spaces in transcripts, and incomplete transcript sessions). So it was more of a quick fix than a well-thought enforcement of consistent code styles.The current update will be covering all the file types used in the project, enforcing good styles conventions on the various editors/IDEs, and — last but not least — validating all commits and PRs via Travis CI, to prevent noise in contributions.
I used https://draftable.com to easily diff two pdfs. It doesn't detect whitespace changes, so it misses this exact PR, but it is easy to scroll two PDFs side by side giving you this view:
The PDF diffs of this tool look quite neat.
I use BeyondCompare 4 for diffing, which does offer specialized viewers for diffing various formats (Hex, images, CSV files, MP3, and others), including PDF, but the PDF diff previewer doesn't reproduce the actual styles, it only shows the text contents representation, without whitespace differences, but more like if it was a plain text file (therefore, no images either). I haven't used it much lately, so I don't remember the details, but surely it's not as neat as the dedicated tool you pointed out.
Both BeyondCompare an Draftable are commercial tools, so having already invested money on the former I'm reluctant to spend further money on another tool — having to renew license when there's a major update, as well as having to learn a new tool and keeping it updated on the machine — but, surely, Draftable is quite tempting for someone who has to diff PDFs frequently.
BeyondCompare 4 is a quite a powerful tool, with many features and customization options, and in over three years after its purchase I can't really say that I've managed to learn how to make the most out of it — can't find the time to dig into the docs and play around with it. I mostly use it at a basic level, although I did attempt to create an ALAN syntax highlighter for it (poor results, the syntax format is not very nice, seems buggy and has an horrible integration interface).
In the future I should really invest more energy on BC4, especially on customizing automated diffing reports on a per-project basis, which is one of the cool features that attracted me to BC4 (beside the possibility to integrate with Git and many Git GUIs). But then, I loath having to spend more time learning the tools of the trade instead of focusing on the trade itself (i.e. programming instead of tooling).
Since the dev-editorconfig branch also fixes code styles in many sources (some of which are involved in the PRs' changes), this seems the only feasible approach without breaking the PR branches.
Because of this I think you are right. Although I haven’t consideted the merge of #40.
@thoni56, the good news is that I've finished enforcing code styles consistency across the repository (still in dev-editorconfig
only) and none of the Manual AsciiDoc sources required any fixes!
So, even after merging into master
the dev-editorconfig
branch, these changes shouldn't affect the PRs rebasing operations (at least, not the AsciiDoc sources, which would be the trickier to handle).
Right now, in dev-editorconfig
I've kept the sources fixes into individual commits per file type involved (i.e. all markdown fixes in one commit, and so on with shell scripts, Alan sources, AsciiDoc, etc.).
I was wondering if we should squash them into a single commit or just keep them separate. On the one hand, having split commits per file type might be more manageable in terms of revisiting the repo history (blaming, etc.); but on the other hand it makes a longer commit history (where we could squash 4 commits into a single one).
What do you suggest, based on your experience with long-running Git repos? Separate commits per file-type fixes, or all fixes squashed into a single commit?
@tajmone Nice! And I like separate commits for the formatting fixes.
In my experience, there is seldom a problem with many commits, This is a problem that shows up when the development organization grows. Up to a couple of teams (~10) I recommend single-branch development and committing directly to master (or your main branch) because of the immediate feedback you get (and it forces you to run tests and what not, locally).
Again, in my experience, the value of a main branch with squashed commit so that you only get a single commit per "feature" only shows itself if you have a QA-department and process that inspects, tests and documents the new features in the release. And that's, luckily enough, no us ;-)
So the only reason to use branching in the alan-docs
project is to keep a version synced with the releases of the SDK. OTOH we could do that by creating the HTML and PDFs from a tag which represents that sync point.
Of course, we could potentially find problems that we want to fix in the "released" documentation, but that is not a big deal. "bugs" in documentation can always wait for the next release. At least if you ask me.
(In some lean/agile philosophical perspective, branching pushes feedback, and work, to a later time, which is always something to watch out for. And avoid, or minimize, if you can.)
This became kind of a rant about branching, sorry for that, but I guess it's a background for my feeling that it doesn't matter much if we squash or not. We don't have the same "requirements" as large development organisations.
@thoni56, thanks for sharing your thoughts on branches strategy (I'll comment on them later, since they are very helpful).
Right now, I'm really struggling hard to reabse the beta7-prep
on master
— it all goes back to that "elisions with apostrophe" commit that was based on an old version of master
, containing old XRefs. That, and the subtle changes in the various README files (sections being moved around, the autogenerated TOCs out of sync, etc.).
The problem is that there are too many commits in the divergent history, and it's hard to track what you're trying to fix at each rebase stage (with the hash references being hard to memorize, and the awkward use by Git of the terms "ours" and "theirs"not helping either).
Probably the best approach would be to rebase on specific commits, one at the time, all the way up to the current HEAD
of master
, instead of rebasing on the whole change-set at once. It's going to take longer (and involving many temporary branches too), but it should help to find the knots which are at the core of the conflicts, and solve them at once — once that's out of the way, the rest of the rebase should be a smooth operation.
I think that the main problem here is due to a couple of AsciiDoc sources of the manual, the rest of the files needing only to be rebased on "ours" (which, apparently, according to Git it means the other branch, i.e. the rebase from, in the rebase context, where "ours" and "theirs" are swapped compared to merge operations — such a human friendly approach! linguistics are definitely not the strength of Git, which is permeated by dubious semantic choices all over its interface and documentation; sorry for the rant).
What worries me most are the HTML and PDF builds of the various commits — if we amend the ADoc sources of the Manual (i.e. integrating changes from both "their" and "ours") they won't be mirroring the actual document status any longer. But rebuilding them on a per-commit-basis is going to require much fiddling in detached head commits (not to mention potential issues with updated dependencies in the toolchain).
I'll do my best, but it's hard to be 100% sure that no edits are lost in the process — especially since the HTML previews can no longer be relied on as a way to compare results.
Yes, I have seen that too. I'll have a look at the changes again.
Perhaps one apporach would be to just look for changes in the PDF:s and the hand-edit the changes into master.
(I'm starting to feel that since there is a lot of (valuable) cruft around the "text", having documentation in text/mergeable form is only half-valuable... One would wish for a separation of "text" and "administrative data", like indexing and stuff. But perhaps this is mainly because or documentation is in this infant state, when, through your tremendous work, that "administrative data" is changing a lot.)
A PDF diff between master
and handle-ellisions
can be found here, (The draftable on-line comparison is free...)
And there are actually very few essential changes in the text. Most are, as alluded to above, index, crossreferences, page headings, ... a lot of things that are "generated" or "administrative". I think it would be easier to just re-do the "real" changes by hand.
But having said that I'm not sure in which order we should do them. Probably it would be better (=easier) to merge the beta7-prep-* into beta-7-prep and then merge that. And finally re-do the ellision edits.
@thoni56, I think I've unraveled all the knots. I took a better approach:
_new_beta7-prep
branch from master
(now updated with the new EditorConfig settings and polished sources)._new_beta7-prep
I cherry picked the various commits from beta7-prep
, solving conflicts one-commit-at-the-time, and then amending any files that didn't pass the EditorConfig validation.As it turned out, many commits would turn out as empty after solving conflicts, because they were already handled in master
somehow, so the number of actual commits dropped down drastically.
This approach was much cleaner, and it gave me the chance to focus on every commit's significant changes, leaving out noise from the conflicts. I'm confident that nothing went lost.
So, now we basically have a clean version of beta7-prep
, which I'll only need to rename from _new_beta7-prep
to beta7-prep
and force push it to origin to overtake the current conflict one (I've also kept a local backup of the current beta7-prep
branch, in case we need to roll back).
Now I only to rebase the branches of the pending PRs on the new beta7-prep
, and maybe reduce their commits numbers by squashing them, and amend source files according to the new code styles conventions. But this part should be easier.
Therefore ... I'll be soon force-pushing the dev branches and replacing them with their new versions, and push also their old backups, so we can keep them until everything is merged and fine.
I now realize that my mistake was that I kept thinking that we'd be soon squashing beta7-prep
into master
and didn't think instead that we should have been able to rebase it on master
from time to time.
On the contrary, I often added global features in beta7-prep
, and then checked them out in master
to share them with the main repo; and also merged master
into beta7-prep
from time to time, in order to access some updated feature from there.
From now on, I'll make sure that the base dev branch will always be rebasable on master
, and only commit documentation contents changes on the dev branch, and repository configuration/feature changes always on master
, and just rebase the dev branch as needed.
That sounds really good!
I'm not sure I'll ever learn to handle merges, rebases and their conflicts in a fluent manner ;-)
Your lessons learned ring true with the feeling I had, but you put it nicely with "documentation contents changes". When I did the PDF diff I was thinking that it might be handy to have a tool that extract only the text from the Asciidoc, or PDF for that matter, and we can diff that. The will show the "documentation contents changes" that we want to focus on when looking at dev branches.
Looking forward to some light in this tunnel! Good work!
it might be handy to have a tool that extract only the text from the Asciidoc, or PDF for that matter, and we can diff that.
I've found a similar tool for pandoc markdown (written in Haskell) which parses both docs, builds an AST and then diffs the nodes; if I remember correctly it also allows some options on what to show and hide (e.g. styles, heading levels, etc.). Unfortunately I haven't found a similar tools for AsciiDoc, and I believe that it would require an AST parser to do that.
Clean up the dev branches for the upcoming Beta7 release of the ALAN Manual (the first official release):
beta7-prep
onmaster
, solving conflicts as required.beta7-*
dev branches on the fixedbeta7-prep
branch:AppG-i18n-squash
beta7-prep-PDF-gaps
(was already merged)handle_elisions
beta7-*
dev branches — rebase and fix them as needed and then merge them, or delete them if rejected:After all PRs are merged, we should consider deleting left over branches:
AppG-i18n
(the unsquashed version ofAppG-i18n-squash
)AppG-i18n-squash
beta7-prep-PDF-gaps
handle_elisions
@thoni56, we need to unravel the conflicts with the dev branches and
master
, so we'll be able to merge/squash without problems when Manual Beta7 is ready for release.I'll try to reabse
beta7-prep
onmaster
, and then all otherbeta7-*
branches onbeta7-prep
. Then we really need to merge thebeta7-*
branches back intobeta7-prep
, so if you could start reviewing them we'll be able to save some time (not sure why those PRs have been hanging ther for so long, but after such a long time is going to be harder to review them and catch up).As I start to get a clearer picture of the required steps, I'll be updating this entry post and the task list above.