eclipse-aspectj / aspectj

Other
303 stars 86 forks source link

Synchronise JoinPointImpl methods dealing with Stack<AroundClosure> #129

Closed kriegaex closed 2 years ago

kriegaex commented 2 years ago

If we do not synchronise, we will run into problems with nested around-aspects in combination with proceeding asynchronously (in another thread).

Fixes #128.

Work in progress, because it needs

kriegaex commented 2 years ago

@aclement, WDYT about synchronising those 4 methods completely. Does that have to be any more fine-granular in your opinion due to performance considerations?

Continuing the manual tests from https://github.com/eclipse/org.aspectj/issues/128#issuecomment-1051500942, running the same 4 scenarios as in 1.9.2, i.e.

  1. outer around-aspect proceeds 1x, inner (asynchronous) around-aspect proceeds 1x,
  2. outer around-aspect proceeds 2x, inner (asynchronous) around-aspect proceeds 1x,
  3. outer around-aspect proceeds 1x, inner (asynchronous) around-aspect proceeds 2x,
  4. outer around-aspect proceeds 2x, inner (asynchronous) around-aspect proceeds 2x,

we see the following logs after commit https://github.com/eclipse/org.aspectj/pull/129/commits/a3e3680b1bafbfe86c115bf3d42b8e871954065a:

blue - AA before time = 1645845533286
blue - BB before time = 1645845533288
blue - BB after time = 1645845533290
blue - AA after time = 1645845533290
blue - BB 2
java.util.EmptyStackException
        at java.base/java.util.Stack.peek(Stack.java:101)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:170)
        at com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
        at java.base/java.lang.Thread.run(Thread.java:833)
blue - AA before time = 1645845580276
blue - BB before time = 1645845580278
blue - BB after time = 1645845580279
blue - BB before time = 1645845580279
blue - BB 2
blue - BB after time = 1645845580279
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - AA after time = 1645845580280
blue - AA before time = 1645845609209
blue - BB before time = 1645845609211
blue - BB after time = 1645845609213
blue - AA after time = 1645845609213
blue - BB 2
java.util.EmptyStackException
        at java.base/java.util.Stack.peek(Stack.java:101)
        at org.aspectj.runtime.reflect.JoinPointImpl.proceed(JoinPointImpl.java:170)
        at com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
        at java.base/java.lang.Thread.run(Thread.java:833)
blue - AA before time = 1645845647825
blue - BB before time = 1645845647828
blue - BB after time = 1645845647829
blue - BB before time = 1645845647829
blue - BB 2
blue - BB after time = 1645845647829
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - AA after time = 1645845647830

Before committing the changes, I had only tested scenario 4, falsely thinking that if that one works, the others would too. That tells us that the synchronisation (alone) does not solve the problem. That this build passed also tells us that we have insufficient test coverage. I am going to inspect the problem further.

kriegaex commented 2 years ago

I think I am beginning to understand the situation better now. I think the stack exhaustion problem tells us:

Local tests for all 4 scenarios after commit https://github.com/eclipse/org.aspectj/pull/129/commits/1584914314aa7ad69e08fa295984df784e204d1f are looking good, and the log output is exactly what I would expect in each case:

blue - AA before time = 1645853064979
blue - BB before time = 1645853064980
blue - BB after time = 1645853064982
blue - AA after time = 1645853064982
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - AA before time = 1645853078965
blue - BB before time = 1645853078967
blue - BB after time = 1645853078969
blue - BB before time = 1645853078969
blue - BB 2
blue - BB after time = 1645853078969
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - AA after time = 1645853078970
blue - AA before time = 1645853092967
blue - BB before time = 1645853092969
blue - BB after time = 1645853092970
blue - AA after time = 1645853092970
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - AA before time = 1645853119151
blue - BB before time = 1645853119153
blue - BB after time = 1645853119154
blue - BB before time = 1645853119155
blue - BB after time = 1645853119155
blue - AA after time = 1645853119155
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
blue - BB 2
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:21)
blue - testMethod - called by com.example.myapplication.BBAspect$1.run(BBAspect.aj:22)

But I have not analysed yet if this all works in combination with before/after advice types. As a first indicator, let us wait for the GitHub CI build to finish.


Update: OK, the build failed, so I messed something up. Let me check...

aclement commented 2 years ago

I haven't read it all thoroughly - it will take me a long while to get back in the zone with that code. So I will leave my immediate early thoughts:

Now both of those things are based on how JVMs were when it was written a hundred years ago and it may be, like the test that measures the benefit of inlining around advice, it doesn't matter anymore due to JVM efficiency.

aclement commented 2 years ago

It isn't quite clear to me if you removed the 'no stack' variant because you had to or because you were trying to introduce consistency and just want to remove the optimization it was attempting?

kriegaex commented 2 years ago

I would not even have touched that code, had I thought that you might find time to tackle it. But I am curious and it is a good challenge to try and understand it a bit better. Like I said, the code is not meant to be merged yet and maybe never will. I created the PR mainly because I want the CI build results and your feedback. I have local variants of it with and without synchronisation, with and without stack. Once we found a solution covering both the old test cases and my new ones (not incorporated into AspectJ yet), we can revert or optimise whatever didn't contribute to the fix. Today I am blocked by a maintenance power outage in my area for several hours. If in the meantime you feel like taking a shot, I would be happy. Otherwise I continue playing when I get around to it next time. I just wish someone would give me a brief introduction to the interplay of classes there. They are not well documented, and the code is not quite self-explanatory.

kriegaex commented 2 years ago

It isn't quite clear to me if you removed the 'no stack' variant because you had to or because you were trying to introduce consistency and just want to remove the optimization it was attempting?

I just wanted to see if removing it would make more of my tests pass and reduce complexity. Optimisations are often a source of corner case problems. If we find out that there is no problem, we can always revert. I am just poking at code I do not fully understand with sticks and see what happens. I am creating data here, the test logs are quite insightful. Either it helps me fix the problem in the end, or maybe it helps you fix it faster than when starting from zero. Your time for AspectJ is more limited than mine, otherwise the only sensible approach would be pair programming in order to maximise learning and yield a second person who can maintain that code in the future. But I guess, the pair programming ain't a-gonna happen, so I am experimenting by myself in public, documenting my steps, at the danger of embarrassing myself. But I am not afraid of that, if I can learn something or at least help you a tiny bit.

kriegaex commented 2 years ago

Sometimes just doing nothing for a day is helpful. This morning, I woke up and the answer just came to me - or so I am thinking. I am going to try something in a new branch. If that works, I am going to close this PR and open a new one. Stay tuned. Either way, the experiments here served a purpose. I learned something by doing and I got your valuable feedback, @aclement.

kriegaex commented 2 years ago

Superseeded by #132, therefore closing here.