ZeligsoftDev / CX4CBDDS

CX4CBDDS component modeling and generation tool
Apache License 2.0
8 stars 5 forks source link

Internal error merging refs/heads/<branch> during model merging in Papyrus. #444

Closed emammoser closed 1 year ago

emammoser commented 1 year ago

Issue and tracking information

Developer's time Estimated effort to fix (hours):

Developer's Actual time spent on fix (hours)

Issue reporter to provide a detailed description of the issue in the space below

While verifying issue 415, we encountered a bug that occurs in both the performance fix and without the performance fix.

When attempting to perform a model merge from one branch onto another, sometimes this error message will appear:  image

Once this message appears, it is impossible to continue with the model merge. 

This issue is reproducible in the modelmergerproducer repo. First, pull that repo in a Papyrus workspace, switch to the "feature_branch" branch and attempt to merge the "conflicting_branch" branch onto "feature_branch" branch. This will always result in the above error being thrown. However, interestingly enough, if you perform the opposite and switch to "conflicting_branch" and attempt to merge the "feature_branch" branch onto the "conflicting_branch" branch, then the model merge continues as normal with no issue.

Here is the event details for the above error: image

eposse commented 1 year ago

Interesting. I must have tried only the direction that worked. I'll take a look.

eposse commented 1 year ago

I can reproduce it as well.

Looks like this error occurred even before merging the PR for Issue #415, so at least we know that that PR didn't introduce this error.

Looking at the exception trace, it seems like it is something in EMF Compare. I'll keep investigating.

eposse commented 1 year ago

And it looks like it may have something to do with elements from an template module instantiation, in particular with some element(s) in the instantiation of the Typed package/module from the DDS_DCSP connector library. I don't know if it's a problem specific to that library or in general with template instantiations.

Also, the exception occurs during merging doing some internal check on a reference change which involves one of those elements instantiated, and which happens to have a pathmap reference. I don't know yet if that might be the reason either but it may be relevant.

Nevertheless, it seems like EMF Compare shouldn't be trying to merge the library elements. But in this case it's attempting to merge the instantiated elements from the library. If these elements are never changed by the user, then maybe we should prevent EMF Compare from merging them. So the question is: do users change the elements of these template module instantiations such as DDS_DCPS::CCM_DDS::Typed (TestData_conn in the BasicPubSub model)?

emammoser commented 1 year ago

No, users almost never (that I can think of) modify instantiated templated modules after they have been instantiated. We will run the "reimplement" refactor feature from time to time, however. But I don't think we want to disable the merge feature for it in-case it ever becomes a use-case in the future.

eposse commented 1 year ago

Ok, I'll proceed with the assumption that we may want to merge template instantiations as well.

I found something that seems related.

When I checkout the master branch, which is the parent of the conflicting_branch but not the parent of the feature_branch, I get an exception shown below. Furthermore, as the trace shows, the problem seems to be when loading the DDS_Read element, which is in the TestData_conn template instantiation, and which is the container of the element that produces the merge exception (the element DDS_Read::status which is an InterfaceRealization of the PortStatusListener interface).

I also noticed that in the XMI of BasicPubSub.uml of that master branch, the 'client' attribute of the InterfaceRealization, has the id of the client duplicated, but that is not the case for neither the conflicting_branch nor the feature_branch nor the very first commit (the common parent to the master and feature_branch branches):

              <interfaceRealization xmi:type="uml:InterfaceRealization" xmi:id="_IK-WwAEWEe2YDp68O4JWJA" name="status" client="_IK3CAQEWEe2YDp68O4JWJA _IK3CAQEWEe2YDp68O4JWJA">
                <supplier xmi:type="uml:Interface" href="pathmap://DDS4CCM_DCPS_CONNECTOR_LIBRARIES/DDS_DCPS.uml#_h0qncGRnEd-Wc9hRqhM12g"/>
                <contract xmi:type="uml:Interface" href="pathmap://DDS4CCM_DCPS_CONNECTOR_LIBRARIES/DDS_DCPS.uml#_h0qncGRnEd-Wc9hRqhM12g"/>
              </interfaceRealization>

This suggests that the commit pointed to by the master branch may be corrupted. So the question is how did that client field get duplicated values.

Furthermore, I tried to cherry-pick the commit pointed to by the master branch on another branch I created, and got the same exception:

The exception trace:

Unable to load the resource with the uri platform:/resource/ModelMergeReproducer/ModelFiles/BasicPubSub.uml from the storage BasicPubSub.uml
!STACK 0
org.eclipse.emf.ecore.resource.Resource$IOWrappedException: Value 'org.eclipse.uml2.uml.internal.impl.ClassImpl@5a13ad0f (name: DDS_Read, visibility: <unset>) (isLeaf: false, isAbstract: false, isFinalSpecialization: false) (isActive: false)' is not legal. (platform:/resource/ModelMergeReproducer/ModelFiles/BasicPubSub.uml, -1, -1)
    at org.eclipse.emf.ecore.xmi.impl.XMLLoadImpl.handleErrors(XMLLoadImpl.java:77)
    at org.eclipse.emf.ecore.xmi.impl.XMLLoadImpl.load(XMLLoadImpl.java:185)
    at org.eclipse.emf.ecore.xmi.impl.XMLResourceImpl.doLoad(XMLResourceImpl.java:261)
    at org.eclipse.emf.ecore.resource.impl.ResourceImpl.load(ResourceImpl.java:1563)
    at org.eclipse.emf.compare.ide.internal.utils.NotLoadingResourceSet.loadFromStorage(NotLoadingResourceSet.java:254)
    at org.eclipse.emf.compare.ide.internal.utils.NotLoadingResourceSet.load(NotLoadingResourceSet.java:427)
    at org.eclipse.emf.compare.ide.internal.utils.NotLoadingResourceSet.load(NotLoadingResourceSet.java:394)
    at org.eclipse.emf.compare.ide.internal.utils.NotLoadingResourceSet.create(NotLoadingResourceSet.java:155)
    at org.eclipse.emf.compare.ide.ui.internal.logical.ComparisonScopeBuilder.createScope(ComparisonScopeBuilder.java:519)
    at org.eclipse.emf.compare.ide.ui.internal.logical.ComparisonScopeBuilder.create(ComparisonScopeBuilder.java:298)
    at org.eclipse.emf.compare.ide.ui.internal.logical.EMFResourceMappingMerger.mergeMapping(EMFResourceMappingMerger.java:261)
    at org.eclipse.emf.compare.ide.ui.internal.logical.EMFResourceMappingMerger.merge(EMFResourceMappingMerger.java:143)
    at org.eclipse.emf.compare.egit.internal.merge.RecursiveModelMerger$ModelMerge.run(RecursiveModelMerger.java:441)
    at org.eclipse.emf.compare.egit.internal.merge.RecursiveModelMerger$ModelMerge.access$1(RecursiveModelMerger.java:436)
    at org.eclipse.emf.compare.egit.internal.merge.RecursiveModelMerger.mergeTreeWalk(RecursiveModelMerger.java:249)
    at org.eclipse.jgit.merge.ResolveMerger.mergeTrees(ResolveMerger.java:1259)
    at org.eclipse.emf.compare.egit.internal.merge.RecursiveModelMerger.mergeTrees(RecursiveModelMerger.java:135)
    at org.eclipse.jgit.merge.ResolveMerger.mergeImpl(ResolveMerger.java:357)
    at org.eclipse.jgit.merge.Merger.merge(Merger.java:233)
    at org.eclipse.jgit.merge.Merger.merge(Merger.java:186)
    at org.eclipse.jgit.merge.ThreeWayMerger.merge(ThreeWayMerger.java:96)
    at org.eclipse.jgit.api.CherryPickCommand.call(CherryPickCommand.java:129)
    at org.eclipse.egit.core.op.CherryPickOperation$1.run(CherryPickOperation.java:106)
    at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2292)
    at org.eclipse.core.internal.resources.Workspace.run(Workspace.java:2317)
    at org.eclipse.egit.core.op.CherryPickOperation.execute(CherryPickOperation.java:118)
    at org.eclipse.egit.ui.internal.commit.command.CherryPickUI$1.performJob(CherryPickUI.java:147)
    at org.eclipse.egit.ui.internal.jobs.RepositoryJob.run(RepositoryJob.java:59)
    at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63)
Caused by: org.eclipse.emf.ecore.xmi.IllegalValueException: Value 'org.eclipse.uml2.uml.internal.impl.ClassImpl@5a13ad0f (name: DDS_Read, visibility: <unset>) (isLeaf: false, isAbstract: false, isFinalSpecialization: false) (isActive: false)' is not legal. (platform:/resource/ModelMergeReproducer/ModelFiles/BasicPubSub.uml, -1, -1)
    at org.eclipse.emf.ecore.xmi.impl.XMLHandler.setFeatureValue(XMLHandler.java:2715)
    at org.eclipse.emf.ecore.xmi.impl.XMLHandler.handleForwardReferences(XMLHandler.java:1193)
    at org.eclipse.emf.ecore.xmi.impl.XMLHandler.endDocument(XMLHandler.java:1282)
    at org.eclipse.uml2.uml.internal.resource.UMLHandler.endDocument(UMLHandler.java:57)
    at org.eclipse.emf.compare.ide.internal.utils.ForwardingXMLDefaultHandler.endDocument(ForwardingXMLDefaultHandler.java:67)
    at org.eclipse.emf.compare.ide.internal.utils.NotifyingParserPool$NamespaceDeclarationNotifyingXMLDefaultHandler.endDocument(NotifyingParserPool.java:217)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endDocument(AbstractSAXParser.java:746)
    at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:539)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:888)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:824)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
    at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1216)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:635)
    at java.xml/com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:324)
    at org.eclipse.emf.ecore.xmi.impl.XMLLoadImpl.load(XMLLoadImpl.java:175)
    ... 27 more
Caused by: java.lang.IndexOutOfBoundsException: targetIndex=1, size=1
    at org.eclipse.emf.common.util.BasicEList.move(BasicEList.java:657)
    at org.eclipse.emf.common.notify.impl.NotifyingListImpl.doMove(NotifyingListImpl.java:1338)
    at org.eclipse.emf.common.notify.impl.NotifyingListImpl.move(NotifyingListImpl.java:1311)
    at org.eclipse.emf.ecore.xmi.impl.XMLHelperImpl.setValue(XMLHelperImpl.java:1189)
    at org.eclipse.emf.compare.ide.internal.utils.ForwardingXMLHelper.setValue(ForwardingXMLHelper.java:205)
    at org.eclipse.emf.compare.ide.internal.utils.NotifyingParserPool$NotifyingXMLHelper.setValue(NotifyingParserPool.java:298)
    at org.eclipse.emf.ecore.xmi.impl.XMLHandler.setFeatureValue(XMLHandler.java:2710)
    ... 41 more
emammoser commented 1 year ago

I have always had a sneaking suspicion that some of our models were partially corrupted after the migration from RSA. What the cause of this is, I don't know, but we have always seen strange behavior with our models that we can't reproduce, so we tend to not raise the issues with your team. For the most part CX and Papyrus do not have an issue with these models, but us users have no idea that something might be malformed inside the model. I would love a model validation extension that would do this sort of extra introspection.

Have you tried manually removing the extra ID specification to see if that fixes your exception?

eposse commented 1 year ago

I have tried it and indeed, that is the culprit. It turns out there where several duplicated ids in the 'client' attribute of the InterfaceRealization of several of the instantiated template elements: DDS_Read, DDS_Get, DDS_Listen and DDS_StateListen.

After removing the duplicated ids, I amended the commit for 'master' creating a new 'master1' branch, and then cherry-picked the commits for 'feature_branch' and 'conflicing_branch' into branches 'f1' and 'c1' respectively on top of the new 'master1'. (See the attached updated git repo.) Then I tried merging both ways, 'c1' into 'f1' and 'f1' into 'c1' and both work.

So we can conclude that the cause of the problem is indeed those duplicate ids. The question is how did they get duplicated? Your suggestion of this being an artifact from the RSA migration is a possibility, except that in this case, the first commit in the repo, the common parent to 'master' and 'feature_branch' does not have these duplicate ids. So somehow between the first commit and the commit for 'master', these ids were introduced. The author of these commits was David Ashenden. Perhaps he might remember what he might have done? I imagine he might not, as the repo was made 9 months ago, but perhaps he might remember if something odd happened when he did that commit.

In the meantime, my working theory is that it may have something to do with template instantiation, since the commit for 'master' added a new template instantiation for a DataSpace "NewMessage_conn", and it is precisely the elements inside this package that have the duplicate ids. I'll do some tests to see if template instantiation causes these duplicates.

As for validation, indeed, that would be ideal. Our meta-model already has some validation built-in, but surely this particular case was missing. I'm going to check if I can find validation for these duplicates. If the problem is produced by template instantiation, then it might make sense to run validation after it, and in fact after any operation modifying the model. I believe that some of this validation is done "live", but perhaps not all of it, as it may be too costly.

modelmergereproducer2.zip

eposse commented 1 year ago

Well, I've been trying unsuccessfully several things to reproduce the duplicated client ids, including:

Unfortunately I still can't find any action that would result in the duplicated ids. I'll keep trying.

I've also done a quick inspection of the code that manipulates InterfaceRealizations and nothing has come up yet. But perhaps it's something very subtle.

emammoser commented 1 year ago

I went to look on my end to see if the BPS model in any of our configured branches from RSA, older version of papyrus, most up to date version of papyurs, and none of them have this issue. I even check all of our current production models and couldn't find a single instance of this happening. That being said I believe I have seen this before a while back. I have no idea how it occurred there either. Is there any way you can write a validation step to ensure attributes that should only specify a single ID do? I know some attributes like nestedPart require two IDs, but other than those, we could check for this type of corruption in the future.

eposse commented 1 year ago

I think I can add some validation for it.

I'll also continue to see if I can figure out how it could have been duplicated.

eposse commented 1 year ago

Just to give you an update: validation turns out to be trickier than expected. It turns out that when the model is loaded, the duplicated id is removed, so the model in memory has only one reference to the target element, and therefore normal validation fails to detect the problem.

I suppose that somehow, when Git operations such as checkout and merge are performed, the model is loaded in a different way that preserved the duplicated ids causing the error. I'm looking into that.

emammoser commented 1 year ago

Thanks for the update

eposse commented 1 year ago

A new update. It seems like the duplicate ids were just a red herring and not the real cause of the exception.

The duplicated ids do indeed result in an exception when opening the model and performing certain EGit operations like checkout and cherry-picking. But my original attempt to reproduce without the duplicate ids was flawed. A more accurate reproduction without duplicate ids still yields the original exception. It does happen with the same elements in the model, which is why I thought it was the duplicate ids, but EMF Compare has a problem with a different feature, not the one with duplicate ids. It does seem to be a problem with EMF Compare itself, so I'm digging in deeply trying to understand how it works exactly.

Furthermore, while the duplicate ids raise an exception, the model does seem to be correctly loaded in memory. So validation for duplicate ids doesn't seem to be that important now, at least for this issue. Besides, with the model correctly loaded in memory, the only way to detect such duplicated ids is to do it outside of the validation framework. So for now, I think it's more important to focus on the real cause of the original exception, and leave the potential validation for later.

eposse commented 1 year ago

An update: the problem is related to InterfaceRealization elements whose supplier Interface is external (a pathmap reference to a library element) and which exist in the branch being merged but not in the receiving branch.

I still do not yet understand why this is the case as it goes deeply into the inner workings of EMF Compare, but I was able to create a minimalistic git repo that reproduces the error. See the attached zip file.

In this repo there is a model with a struct S1, a locally defined interface I1 and a port type PT1. Then there are two sets of branches: "internal" and "external". In the "internal" ones, the interface realization of the "provides" interface of PT1 is the local interface I1. In the "external" branches, the interface realization is "PortStatusListener" from the DDS_DCSP library (and therefore an external pathmap reference). Then for each set, there are two conflicting branches "-a" and "-b". The conflict is on the type of the attribute in the S1 struct. Additionally the "*-b" branches add a new port type PT2, and as before, its interface realization is I1 in the "internal-b" branch and PortStatusListener in "external-b" branch.

When merging "internal-a" onto "internal-b" it works as expected. When merging "internal-b" onto "internal-a" it works as expected. When merging "external-a" onto "external-b" it works as expected. but when merging "external-b" onto "external-a" we get the exception.

So this clearly rules out a problem with interface realizations with references to local interfaces.

It's not clear why the situation is asymmetric. But at least this should simplify the process of diagnosing the problem.

issue444min.zip

emammoser commented 1 year ago

Thank you for the update. I am glad you were able to find a much simpler reproducer

eposse commented 1 year ago

A new update: It has been very difficult to track down the cause, but I think I may have discovered it, or at least part of it. I've found a fundamental asymmetry embedded deep inside the core of EMF Compare, during a phase called "premerge", which seems to behave differently depending on the direction of the merge. I think it may be a fundamental bug in EMF Compare itself, but I need to investigate and test further to confirm this.

I'll provide more details on Monday.

eposse commented 1 year ago

More details:

I do not know why these Match objects are not created. I have asked in the EMF Compare forum here, but I'm not sure how quickly they might respond. The EMF Compare project doesn't look very active. Their last commit was 9 months ago, and they don't answer questions very often in the forum. This could be a big problem because the places where this exception happens seems to be in the very core of EMF Compare, not a part that might be customized, which means that we might be forced to fork it. But even then, I'm not sure about what the solution would be. So my plan is to try a fork and put at least some null value checks on that function to see if it works.

emammoser commented 1 year ago

Thank your for your update. We really appreciate your persistence on this issue.

eposse commented 1 year ago

By the way, under Preferences -> EMF Compare -> Merge there is a "Pre-merge" checkbox. If one unchecks it, premerge is not performed and there is no exception, at first. This is, the comparison + conflict detection will succeed and the user will be able to open the Merge Tool and see the differences. However, once you make the desired changes to resolve the conflict and try to save, the actual merge will be executed leading to the same exception. Therefore, unchecking the "Pre-merge" checkbox is not a real workaround, and the underlying problem remains.

eposse commented 1 year ago

And more info: when merging "external-a" to "external-b" (the one that works, where the new elements are on "external-b" already), after the comparison, and before the pre-merge, the list of differences is empty! This results in the pre-merge being skipped, which is why there is no exception in that case. This looks like another unexplained asymmetry, most likely a bug.

eposse commented 1 year ago

I've made some modifications to the method that causes the exception and it succeeds, but there are two caveats:

  1. My solution works when there is at most one element in the list that's being checked, which in this case is the list of "suppliers" of the InterfaceRealization. I need to test whether adding more suppliers will work, but I doubt it. Nevertheless, that shouldn't be a problem for this case, as typically an InterfaceRealization has only one Interface. But I don't know if there are other cases.
  2. The solution requires an update to a core part of EMF Compare, which may not be customizable, so we may be forced to create a fork of EMF Compare. I'm investigating if we can avoid the fork.
eposse commented 1 year ago

It looks like it is possible to override the problematic code without having to fork EMF Compare after all. I've opened a PR with the proposed solution, but as mentioned in my previous comment, it may not work in some cases, so I'm testing to see what happens in those cases.

eposse commented 1 year ago

After some testing, the case for multiple suppliers seems to work, but I'm never hitting the code that I think could cause trouble.

Nevertheless, you can test the build with the proposed solution which you can get here.

eposse commented 1 year ago

I've continued to test it and it seems to work in all cases I've tried. I still cannot reproduce a case that hits a potentially problematic piece of the code, and it looks like that code deals only with some corner cases, not typical cases, so, since the solution works for all cases that we have encountered, if it works for you, we could merge this solution and deal with the remaining corner cases in another issue, or we can keep this issue open and I'll continue to investigate when I get back on July 10th. Paul has reviewed the solution and is aware of the corner cases. He can merge this one if you'd like to. Let us know.

emammoser commented 1 year ago

Have you attempted to use the proposed solution with our original modelmergereproducer example? When I use it, it no longer throws an error, which is great. What is strange is that I do get a conflict, and attempting to resolve the conflict has an extra step I wouldn't expect. The conflict is on the remove "color" enumeration. Are you seeing the same thing?

eposse commented 1 year ago

Hi. I have, and I see the conflict, and in fact I get the same conflict in both directions. The conflict was expected because the feature_branch added, in the 'defn' package, the 'ComplicatedTestData' struct which has a 'color' attribute of type Color, which is the Enum deleted in the conflicting_branch.

emammoser commented 1 year ago

Ah, that makes some sense. I guess that is a tough thing for a modeling merging tool to understand. If we remove color in one spot but use it in another, how do we resolve that conflict easily.

In any case, the issue that this ticket was to resolve appears to be fixed, so I am happy with merging and closing this ticket.

eposse commented 1 year ago

Great! Should I merge it both to the papyrus branch and the maintenance branch?

emammoser commented 1 year ago

Yes, I think merging it to both would be best for us. Thanks!

eposse commented 1 year ago

The commit has been cherry-picked onto the streams/v2.4.x-maintenance branch and merged onto the papyrus branch.