eclipse-aspectj / aspectj

Other
272 stars 82 forks source link

Fix potential ThreadLocalMap.Entry accumulation in JoinPointImpl #303

Closed KimmingLau closed 2 months ago

KimmingLau commented 3 months ago

As Entry is WeakReference, when the JoinPointImpl object were collected by GC, the Entry would still be referenced by ThreadLocalMap of the thread until memory pressure. So we must proactively remove ThreadLocal when arcIndex is back to -1.

Fixes #302.

kriegaex commented 3 months ago

In #302, you mentioned that you want to prepare a reproducer. I think, it would still be good to have one which then we can turn into a test like the previous one.

KimmingLau commented 3 months ago

Yes, I actually want to make a reproducer, but I don't know how to reproduce in code style that the GC would be more frequently without this fix. Could you give some suggestions?

kriegaex commented 3 months ago

Maybe there is no need to check GC frequency but rather memory consumption, more exactly the number of weakly referenced thread-locals.

kriegaex commented 2 months ago

@KimmingLau, I took the liberty of force-pushing to your PR branch, because I

If you have changed anything locally in the meantime, please rebase on top of this, thanks.

Maybe I have an idea for an integration test, but am currently busy with another issue affecting OSGi weaving in Eclipse when using AJDT.

KimmingLau commented 2 months ago

Thank you, I think I have no more commits to push.

kriegaex commented 2 months ago

@KimmingLau, I am having problems reproducing this. I need to compare before and after the change you suggested to see any effect. I tried with 1.9.22, iterating over Thread.getAllStackTraces().keySet() and for each item checking for Thread.threadLocals, but I cannot find any non-null values. Please explain how to reproduce this. I used CTW with -Xnoinline and two nested around advices. Neither via code nor via memory dump can I find any non-null thread-locals, not even without GC. Maybe I am overlooking something. Do I need any special Java version or JVM parameters? I tried on JDKs 22 and 8.

kriegaex commented 2 months ago

@KimmingLau, I am having problems reproducing this

Forget about it, I had a simple typo in my pointcut. I can reproduce it now.

KimmingLau commented 2 months ago

aspectj-threadLocal-leak-reproduer.zip

I'm sorry that I couldn't submit the reproducer in time due to my busy work.
Using the above reproducer, if I ran it with JVM options -javaagent:aspectjweaver-1.9.21.2.jar -Xms512m -Xmx512m -XX:+UseParNewGC -XX:+UseConcMarkSweepGC, it would print threadLocalMapSize = 10004

if I ran it with the same options but other than using aspectjweaver-1.9.23-SNAPSHOT.jar which is with the fix, it would print threadLocalMapSize = 4

KimmingLau commented 2 months ago

Moreover, if you change the limited loop in the above code to a continuous loop to simulate the continuous traffic in a real production environment, and then use jconsole to observe the memory usage of CMS old gen, you will find that the growth rate will be faster when using aspectjweaver-1.9.21.2.jar than the fixed one.

kriegaex commented 2 months ago

Thanks, but I have my own reproducer now. I wish I had had yours before. Now we both spent time and work on this. Sorry for that. I only started creating a reproducer, because previously you said:

I think I have no more commits to push.

KimmingLau commented 2 months ago

Now we both spent time and work on this. Sorry for that

It's my pleasure to spend time on that. But last night I reivewed my code change again, I'm afraid the current approach is not perfect.
Assume that the thread currently executing the around code is thread A. If we put the proceed() method on another thread B to execute, when the entire call ends, we only remove the ThreadLocal of thread A, but not the thread B, because there is only Thread A executed the unlink() method. I don’t know if you can get my point.

kriegaex commented 2 months ago

I just force-pushed your PR with an IT preceding the commit with the fix. It reproduces the problem before the fix. Feel free to comment on it.

kriegaex commented 2 months ago

But last night I reivewed my code change again, I'm afraid the current approach is not perfect.

I am fully aware of that, which is why in https://github.com/eclipse-aspectj/aspectj/issues/302#issuecomment-2034373077 I wrote:

The problem with releasing the thread-locals is that a live thread could in principle defer proceeding indefinitely or not proceed at all. AspectJ has no way of knowing.

Maybe there is a way to at least remove the thread-locals in cases where the last proceed has happened already, i.e. the index is back down to zero.

The first paragraph is pointing exactly to the problem you just noticed. There is no perfect solution other than switching to native syntax. If not all threads fully proceed, there will always be leftover thread-locals.

KimmingLau commented 2 months ago

I got it, thank you for your detailed explanation.

kriegaex commented 2 months ago

Some more clarification on my previous statement: Remember that it is in principle possible to proceed multiple times from the same or multiple threads. This is the reason we have thread-locals in the first place, otherwise we would not need them. I.e., if someone had the idea to remove all thread-locals for the same JoinPointImpl instance after one of the threads reached index -1, it would be wrong to do so.

KimmingLau commented 2 months ago

Hi @kriegaex, I would like to ask here, when will 1.9.22.1 be released?

kriegaex commented 2 months ago

@KimmingLau, technically speaking, in principle 1.9.22.1 could be released anytime, if it was not for the tedious steps around releases. My personal situation is, that on top of my work I also have quite a few private issues on my plate, which is why I have not done much for AspectJ since 1.9.22, which ate up a lot of time, because I also had to adjust stuff in AJDT and the M2E connector for AJDT, both of which do not have any active maintainers. Therefore, after that was done I have shifted my focus to other things that need to be taken care of.

Today is a holiday and great weather. Let me think about it for a minute...

kriegaex commented 1 month ago

@KimmingLau, AspectJ 1.9.22.1 has been released today.