Closed meks77 closed 1 month ago
First and foremost, thanks a ton @meks77 for putting the amount of effort in to this that you have. It is not every day that we receive a PR to fix something so deep down into the core of Axon Framework, so that deserves applause if you ask me.
Now, I do have some questions before we proceed with approving/rejecting this PR. First and foremost, I want to ensure the code path that led to this scenario in your environment.
As you point out, the problem occurred when applying events; I am fairly confident the issue occurred as part of either (1) the ForwardMatchingInstances#filterCandidates
method or (2) the AbstractChildEntityDefinition#extractCommandHandlerRoutingKey
operation.
Assuming the issue occurs around the ForwardMatchingInstances
implementation, I would like to validate the following statement:
@AggregateIdentifier
annotated field/method has a different name than the field representing the aggregate identifier in your event or the event does not carry the aggregate identifier in its payload.If this is an incorrect assumption, I would anticipate the predicament to arise from the AbstractChildEntityDefinition
.
Assuming this is the case, I want to validate a different statement:
@EntityId
annotated field of the Aggregate Member does not have the same name as any of the fields in the command being handled.So, if you could shed some light on the above for me, @meks77, I would be better suited to decide the path forward.
By the way, concerning the test case you've provided. I need to quadruple the time window for my laptop already, sadly enough.
As we run automated tests through GitHub Actions, which are even more limited (compared to my machine), I am afraid we would need to remove the @Test
annotation.
The test on it's own isn't wrong, but having it run automatically will cause the build to become flaky due to the machines that back GitHub Actions.
Hi Steven!
Many thanks for the praise ;)
In the stacks I can see ForwardMatchingInstances and AbstractChildEntityDefinition. Also I could see that both are used to apply the event, but also for snapshots.
Also I have copied the stacks:
Furthermore I can tell you that the command contains the aggregate Id with the same name as the aggregate. The events contain the entity id of a member.
I you have more questions don't hesitate to contact me.
best regards
Markus
Thanks for the additional details, @meks77!
It makes sense to me that the snapshot-path also triggers these issues.
Whenever a snapshot's generated, all the events are loaded again to source the model for the sake of constructing the snapshot.
Hence, the forwarding-instance-logic that triggers the PropertyAccessStrategy
is invoked again.
Looking into the forwarding-instance-logic, I do have a follow-up question:
routingKey
field of the @AggregateMember
annotation? If so, what is the value?@EntityId
annotation on a field or method inside the @AggregateMember
annotated object or value in case the aggregate members are part of a collection? If so, is the routingKey
value set for this annotation?Although I am mainly curious to make certain I understand how you're faced with this issue, I do think we can proceed with your PR too. For that, I'll place a review with some comments to look in to.
Thanks for the additional details, @meks77!
It makes sense to me that the snapshot-path also triggers these issues. Whenever a snapshot's generated, all the events are loaded again to source the model for the sake of constructing the snapshot. Hence, the forwarding-instance-logic that triggers the
PropertyAccessStrategy
is invoked again.Looking into the forwarding-instance-logic, I do have a follow-up question:
* Are you using the `routingKey` field of the `@AggregateMember` annotation? If so, what is the value? * Are you using the `@EntityId` annotation on a field or method inside the `@AggregateMember` annotated object or value in case the aggregate members are part of a collection? If so, is the `routingKey` value set for this annotation?
Although I am mainly curious to make certain I understand how you're faced with this issue, I do think we can proceed with your PR too. For that, I'll place a review with some comments to look in to.
We do not use the routingKey. The thrown event contains a property with the id of an entity, which is a AggregateMember. The aggregate hold a list of these members. The AggregateMember has a property annotated with EntityId withoud routingKey.
The Command contains the AggregateId. The aggregate has the handler method for the command and applies some events for the with the entity id of AggregateMember.
The annotated list with the aggregate members is annotated with AggregateMember using the eventForwardingMode ForwardMatchingInstances.
dont't hesitate to ask if you have further questions
I've left a couple of comments to look into for this PR. One pointer isn't about the code, but about the branch you want to merge it in.
As I see this as a bug fix, I would like to ask you to base the PR on the
axon-4.10.x
branch. By doing so, it will more easily become a part of the 4.10.1 release that should come in the near future.Chances are that you need to recreate the PR to have that done cleanly. But I trust you can figure that out, @meks77.
I created a the new merge request #3117. Everything you mentioned is already considered. Therefore I close this merge request.
In our project we are applying around 1,9 billion events in around 21 hours. This is a migration of an old system, which happens weekly, as long as we are in developing the product. At least one additional year. In this migration only events are applied.
While applying events, I was wondering why the client uses 50 % of the CPU, while the Axon Server only needs around 5 %. After profiling the client, I realized that about 65 % of the CPU Time was used for throwing exceptions while applying events.
This should be a fix, which decrease the CPU time and the duration of a property access by around 65 %.
One debug log message was removed in the case if the accessor method returns void. It is already filtered before, when getting the method. I hope the log message wasn't that important.
Concerning the test method for the performance: I don't know your strategy for performance and also not about the build environment. The test works on my machine. Feel free to remove or adapt it to your needs.