clj-commons / aleph

Asynchronous streaming communication for Clojure - web server, web client, and raw TCP/UDP
http://aleph.io
MIT License
2.54k stars 241 forks source link

Add more reliable resource leak detector for tests #708

Closed DerGuteMoritz closed 3 months ago

DerGuteMoritz commented 4 months ago

See docstrings in aleph.resource-leak-detector for details.

This patch also includes the following changes:

Ideally we'd activate it in the CI test job but we first need to fix the leak in aleph.http-test/test-idle-timeout for that (see #707).

Based on the research results from #615.

DerGuteMoritz commented 4 months ago

Hmm aleph.http-timeout-test/test-shutdown-timeout-1 fails consistently in CI for me while it works fine locally. Looks like there is a race condition there where the client has not yet established a connection before the server shuts down after the 50ms wait? I can't think of a way to make this more reliable, though... @pyr Since you are the author, do you have an idea?

FWIW I also pushed a branch with only a bogus change to trigger the job and this one also fails so it's not related to any change in this here patch.

pyr commented 4 months ago

@DerGuteMoritz Nothing jumps at me, works locally here as well

KingMob commented 4 months ago

Do you want to reevaluate after merging #709 ?

DerGuteMoritz commented 4 months ago

Well, that fixed it but now the DNS failure tests started timing out, as well (also on master). See https://github.com/clj-commons/aleph/pull/710 for an improvement.

DerGuteMoritz commented 4 months ago

OK all green now :sweat_drops: Please give it a try!

DerGuteMoritz commented 4 months ago

I'm not sure it will fly long-term (the strategy around calling the GC - since it's just a hint. And manually running the finalizations).

@arnaudgeiser Now that you mention it: running finalizers should actually not be needed, I only threw runFinalization in there for good measure. Netty does use a handful of finalizers but the leak detector uses weak references and a reference queue instead. I think leaving it in there for now doesn't hurt for now but having to remove it shouldn't be a problem either - if it ever gets removed completely, Netty will have to adjust as well, anyway. And by then Netty 5 migh thave rolled around which AFAIK gets rid of reference counting anyway :shrug:

But it improves the current situation and it's just for testing purpose. Let's see what will be the future.

Right, improving the current testing situation is the main goal here :+1:

Thanks for the approach!

You're welcome and thanks for your input on the original ticket :pray:

https://openjdk.org/jeps/421

Ah, good to know, wasn't aware of this!