Handling Changes, Conflicts, and Merges

Phillipus commented 3 years ago

Thinking about this more, I'm not even sure how we can manage changes, conflicts, and merges if we are storing one *.archimate file in the git repository.

There could be a textual change or conflict but not a logical model change or conflict (hence the reason for using grafico XML files). So we can't rely on git's usual mechanisms for detecting changes and conflicts.

But I'm not sure I'm thinking about this clearly...maybe this won't be a problem. After all, a grafico View XML file is also a large file...

jbsarrodie commented 3 years ago

:-)

Simple answer: we can use git to detect that a model file has been changed, but nothing more. We have to take over, implement our own diff/merge tool, and then force git to commit the result. That's doable with command line git (you can configure it to load external diff/merge tools for pre-defined types of files), and I think we can do it too (one way or another) with JGit.

So we have to code logic ourselves that is similar to what git does:

1. If the previous commit was the very last one and there is no other commit on the remote, simply commit the new file (no need for a merge). Else...
2. Do a diff between the common ancestor and ours. Get a list of changes A
3. Do a diff between the common ancestor and theirs. Get a list of changes B
4. Compare both lists of changes and extract conflicts from these lists
5. Ask the user what to do for each conflict. This results in a new list of changes C
6. Apply changes from A, B and C. Changes have been merged
7. Create the merge commit with the resulting model

jbsarrodie commented 3 years ago

So we have to code ourselves a logic similar to what git does:

1. If the previous commit was the very last one and there is no other commit on the remote, simply commit the new file (no need for a merge). Else...
2. Do a diff between the common ancestor and ours. Get a list of changes A
3. Do a diff between the common ancestor and theirs. Get a list of changes B
4. Compare both lists of changes and extract conflicts from these lists
5. Ask the user what to do for each conflict. This results in a new list of changes C
6. Apply changes from A, B and C. Changes have been merged
7. Create the merge commit with the resulting model

Additional inputs and comments for this...

Good readings on this topic:

Step 2&3. Do a diff between common ancestor and ours/their

Finding the common ancestor can be done using git merge-base which has an equivalent in JGit.

Step 2&3. Get list of changes

This means we'll have to create some classes for that:

ChangeCollection: contains a list of changes between two models. Contains references to both models.
Change: represents a change on something between two models. Could be related to almost any class used in a model (not only concepts, folders and views, because a change can target more atomic things like documentation or properties) and can be of several types:
- Added
- Deleted
- Modified
- Moved

Of course, this will involve comparing all objects between two versions of the same model, and I'm sure we can find some tricks to help with that. I don't know if they already take all cases into account, but .equals() methods could be used to check if model objects (concepts, views and folders) are the same. An MVP could first treat these objects as atomic, but a future version should then dig into each object to find out exactly what changed.
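As a rough illustration of the classes and the "atomic" MVP comparison described above, here is a minimal Python sketch. It is not coArchi code: the model is assumed to be a plain dict mapping object ids to attribute dicts, the "parent" key (the containing folder id) is a hypothetical convention used only to tell Moved from Modified, and the names ChangeKind and diff_models are made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List


class ChangeKind(Enum):
    ADDED = "added"
    DELETED = "deleted"
    MODIFIED = "modified"
    MOVED = "moved"


@dataclass(frozen=True)
class Change:
    """One change to one model object between two versions."""
    object_id: str
    kind: ChangeKind
    old: Any = None
    new: Any = None


@dataclass
class ChangeCollection:
    """All changes between two models, with references to both models."""
    base: Dict[str, dict]
    other: Dict[str, dict]
    changes: List[Change] = field(default_factory=list)


def _without_parent(obj: dict) -> dict:
    # Assumption: "parent" holds the id of the containing folder.
    return {k: v for k, v in obj.items() if k != "parent"}


def diff_models(base: Dict[str, dict], other: Dict[str, dict]) -> ChangeCollection:
    """Compare two models, treating every object as atomic."""
    result = ChangeCollection(base, other)
    for oid in sorted(base.keys() | other.keys()):
        old, new = base.get(oid), other.get(oid)
        if old is None:
            result.changes.append(Change(oid, ChangeKind.ADDED, None, new))
        elif new is None:
            result.changes.append(Change(oid, ChangeKind.DELETED, old, None))
        elif old != new:
            # Only the parent differs: the object was moved, not edited.
            moved = _without_parent(old) == _without_parent(new)
            kind = ChangeKind.MOVED if moved else ChangeKind.MODIFIED
            result.changes.append(Change(oid, kind, old, new))
    return result
```

Calling diff_models(ancestor, ours) and diff_models(ancestor, theirs) would give the lists A and B from steps 2 and 3. A later version could replace the whole-object comparison with a per-attribute one.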

Step 4. Compare both lists of changes and extract conflicts from these lists

A conflict is found when two Changes related to the same object (in a broad sense) are present in A (ChangeCollection between common ancestor and ours) and B (ChangeCollection between common ancestor and theirs) and are different (if both changes lead to the same "value", then there is no conflict).
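The rule above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: each change list is a dict keyed by object id (a narrower notion of "same object" than the broad one mentioned above), each change is a (kind, value) tuple, and split_conflicts is a hypothetical name.

```python
from typing import Any, Dict, Tuple

# Assumed shape: object id -> (kind, resulting value)
ChangeMap = Dict[str, Tuple[str, Any]]


def split_conflicts(ours: ChangeMap, theirs: ChangeMap):
    """Return (A without conflicts, B without conflicts, conflicts)."""
    clean_ours = dict(ours)
    clean_theirs = dict(theirs)
    conflicts = {}
    for oid in ours.keys() & theirs.keys():
        if ours[oid] == theirs[oid]:
            # Both sides lead to the same value: no conflict,
            # and the change only needs to be applied once.
            del clean_theirs[oid]
        else:
            # Same object changed differently on each side.
            conflicts[oid] = (ours[oid], theirs[oid])
            del clean_ours[oid]
            del clean_theirs[oid]
    return clean_ours, clean_theirs, conflicts
```

The conflicts dict is what would be shown to the user in step 5; their answers become the list C.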

Step 6. Apply changes from A, B and C. Changes have been merged

Several options here:

A, B and C Changes can be applied to common ancestor, thus "redoing" everything
B and C Changes can be applied on the current version of the model (i.e. ours)

Note: at this step, A and B no longer contain conflicts, as they have been removed at step 4 to create C.
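To make the two options concrete, here is a small Python sketch. It assumes a model is a dict of object id to value and a change list is a dict of object id to (kind, value); apply_changes is a hypothetical helper, not an existing API. Under these assumptions both options should produce the same merged model, as long as A and B are conflict-free.

```python
from typing import Any, Dict, Tuple

Model = Dict[str, Any]
ChangeMap = Dict[str, Tuple[str, Any]]


def apply_changes(model: Model, *change_sets: ChangeMap) -> Model:
    """Return a copy of model with each change set applied in order."""
    merged = dict(model)
    for changes in change_sets:
        for oid, (kind, value) in changes.items():
            if kind == "deleted":
                merged.pop(oid, None)
            else:
                # "added" and "modified" both set the resulting value.
                merged[oid] = value
    return merged


ancestor = {"e1": "A", "e2": "B"}
ours = {"e1": "A-ours", "e2": "B", "e3": "C"}

changes_a = {"e3": ("added", "C")}           # ours, conflicts removed
changes_b = {"e2": ("deleted", None)}        # theirs, conflicts removed
changes_c = {"e1": ("modified", "A-final")}  # user's resolution for e1

# Option 1: redo everything on top of the common ancestor.
option_1 = apply_changes(ancestor, changes_a, changes_b, changes_c)
# Option 2: apply only theirs and the resolutions on top of ours.
option_2 = apply_changes(ours, changes_b, changes_c)
```

Option 2 touches fewer objects, which may matter for large models, but it relies on ours already containing everything in A.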

Important edit: There is still a need for additional "intelligence" to solve some edge cases like:

an element has been removed in ours and added to a view in theirs: these are different changes, but if we apply both of them the model becomes invalid
a relationship has been edited and some of its ends have been changed, but a new view uses the "old" definition

All of these cases will need some specific code that I usually call an "Anticorruption Layer". Most of these cases are already handled by coArchi 1 or modelimporter, so we should be able to manage them in coArchi 2.

Phillipus commented 2 years ago

For reference:

Even though git might successfully merge a branch or pull from remote, the resulting model might no longer be valid. So we have to check model integrity in all cases.

Here are some examples (from @jbsarrodie):

Start with a model containing an element E

User 1 deletes E
User 2 creates a view V containing E
Result is a model with V containing E while E no longer exists

Start with a model containing two elements A and B, and a relationship R from A to B

User 1 changes R so that it is still the same object but now goes from B to A
User 2 creates a view V containing A, B and R
Result is a model with V containing a visual connection from A to B while the related relationship goes the other way around

And one from me:

Start with a model containing a Note that has a custom image

User 1 adds another Note referencing that same image
User 2 deletes the Note or the image in the Note
Result is a model with a missing image
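The first two examples can be detected with a simple post-merge check. The Python sketch below is only an illustration under assumed data shapes (a set of element ids, a dict of relationship id to (source, target), and views holding element ids and (relationship id, source, target) connections); check_integrity is a hypothetical name, and the missing image case is not covered.

```python
from typing import Dict, List, Set, Tuple


def check_integrity(
    elements: Set[str],
    relationships: Dict[str, Tuple[str, str]],
    views: Dict[str, dict],
) -> List[str]:
    """Return a description of every dangling or inconsistent reference."""
    problems = []
    for view_id, view in views.items():
        for element_id in view.get("elements", []):
            if element_id not in elements:
                problems.append(f"{view_id}: missing element {element_id}")
        for rel_id, source, target in view.get("connections", []):
            if rel_id not in relationships:
                problems.append(f"{view_id}: missing relationship {rel_id}")
            elif relationships[rel_id] != (source, target):
                problems.append(
                    f"{view_id}: connection {rel_id} does not match relationship ends"
                )
    return problems


# Example 1: E was deleted by user 1, but user 2's view V still shows it.
merged_1 = check_integrity(
    elements={"A", "B"},
    relationships={},
    views={"V": {"elements": ["E"]}},
)

# Example 2: R now goes from B to A, but V draws it from A to B.
merged_2 = check_integrity(
    elements={"A", "B"},
    relationships={"R": ("B", "A")},
    views={"V": {"elements": ["A", "B"], "connections": [("R", "A", "B")]}},
)
```

An empty result would mean that no problem of these two kinds was found. It does not prove the model is valid in general.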

Phillipus commented 2 years ago

Taking the first example from the previous comment.

E exists but is unused
I use E in a View
Meanwhile they delete E and Push
I Pull

The pull is successful and a new merge commit is created from merging theirs and mine, called Merge branch "the-branch" of "https://www.onlinerepo.com/test.git" into the-branch

However, the model is broken so we need to fix and resolve it. As in coArchi 1, this will result in another commit with the restored objects.

Perhaps another way to do this is to do a kind of dry run:

Fetch, don't merge
Diff and resolve
Merge commit

See https://stackoverflow.com/questions/17222440/is-there-a-git-pull-dry-run-option-in-git

jbsarrodie commented 2 years ago

Perhaps another way to do this is to do a kind of dry run:

Yes. In fact, the real target for me is to no longer rely on git for diff and merge, so that we know exactly what changed (even for a simple commit) and are sure that everything is valid (in case of a merge).

Phillipus commented 2 years ago

EMF Compare works well for doing two and three way model comparisons.

[Screenshots: compare1, compare2]

RaimondB commented 7 months ago

Is it possible to support a PR workflow? That would really fit for us so that we can also have branch protection rules.

jbsarrodie commented 7 months ago

Is it possible to support a PR workflow?

No, the PR workflow is managed at the Git server level (such as GitHub or GitLab) and can't work with Archi, as Archi absolutely has to manage all git operations or else the model will end up corrupted very quickly (coArchi includes an anti-corruption layer which fixes issues created by merge operations).

That would really fit for us so that we can also have branch protection rules.

If you use branches in Archi, you can already set up a very similar workflow: people with no rights on main or master work on their own branch, which is published on the server. When their work is completed, a "lead architect" can easily look at these branches and, if good, merge them into main or master.

RaimondB commented 7 months ago

If you use branches in Archi, you can already set up a very similar workflow: people with no rights on main or master work on their own branch, which is published on the server. When their work is completed, a "lead architect" can easily look at these branches and, if good, merge them into main or master.

Are there some good practices on how to set up branch protection rules and/or assign ownership for parts of the repository? I am looking for some mechanism where we can manage the reviews as a task (hence the PR). And I was hoping to do something with a CodeOwners file to assign review responsibility to different people based on the elements / diagrams that are impacted by a branch.

Phillipus commented 7 months ago

@RaimondB I'd like to keep this issue focussed on the main topic. I've just enabled Discussions, so you could start a discussion there, if you like.

RaimondB commented 7 months ago

Is it possible to support a PR workflow?

No, the PR workflow is managed at the Git server level (such as GitHub or GitLab) and can't work with Archi, as Archi absolutely has to manage all git operations or else the model will end up corrupted very quickly (coArchi includes an anti-corruption layer which fixes issues created by merge operations).

What about having a container-based tool that can fix up the model as a GitHub Action, and validate model integrity as a test before allowing the merge? I have not yet looked into all of the Archi CLI options, but it feels like it could work as long as we have a good pipeline set up to integrate the models and validate the result.

archimatetool / archi-modelrepository-plugin2

Handling Changes, Conflicts, and Merges #4