Open fleboulch opened 7 months ago
@fleboulch - first of all - great to see that this is of some use for you; thanks for the feedback
can you please add
webClient.cookieManager.clearCookies()
to your second case, because this is not part of the close process.
And can you please try HtmlUnit 3.11.0....
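A minimal sketch of that suggestion, assuming HtmlUnit 3.x or later (the `org.htmlunit` package) is on the classpath. The helper name `scrapeTitle` and its `url` parameter are hypothetical and only illustrate where the cookie cleanup would sit relative to `close()`; this is not the reporter's actual code.

```kotlin
import org.htmlunit.WebClient
import org.htmlunit.html.HtmlPage

// Hypothetical helper: loads one page and returns its title.
fun scrapeTitle(url: String): String {
    val webClient = WebClient()
    try {
        val page: HtmlPage = webClient.getPage(url)
        return page.titleText
    } finally {
        // Cookies are not cleared by close(), so clear them explicitly first.
        webClient.cookieManager.clearCookies()
        webClient.close()
    }
}
```

Whether this alone brings the heap back down is exactly what the thread is trying to find out, so treat it as one cleanup step to test rather than a fix.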
The issue here is even when I'm closing the webclient instance there is still memory which is not released. Here in my example code I'm dealing with a single source but in production I'm dealing with multiple sources.
I think a lot of things are created and stored, but the point is: if you create a webClient several times and do some scraping, then after closing the client the memory should go back to the level it had after the first round....
I would like to use version 3.11.0, but my test suite has been failing since 3.10.0. I added a comment here
Yes, you are correct! Even with a single webclient instance the memory rises quite fast, and in production I don't have a huge setup (1 GB of memory).
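One way to check that expectation is to record the used heap after each round and compare the later readings with the first one. The sketch below uses only the JDK's `Runtime` API; the names `usedHeapMb` and `measureRounds` are hypothetical, and the `scrapeOnce` lambda stands in for "create a client, scrape, close it".

```kotlin
// Used heap in MB, as seen by the JVM at this moment.
fun usedHeapMb(): Long {
    val rt = Runtime.getRuntime()
    return (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024)
}

// Runs the given round several times and records the used heap after each one.
fun measureRounds(rounds: Int, scrapeOnce: () -> Unit): List<Long> {
    val readings = mutableListOf<Long>()
    repeat(rounds) {
        scrapeOnce()
        System.gc()        // only a hint; the JVM may ignore it
        Thread.sleep(200)  // give the collector a moment before reading
        readings += usedHeapMb()
    }
    return readings
}
```

If the readings keep climbing round after round, something is being retained after `close()`. If they level off after the first round, the extra memory is more likely one-time initialization (for example static caches) than a leak. Because `System.gc()` is only a hint, a heap dump is the more reliable confirmation.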
Second issue found when trying to migrate from 3.9.0 to 3.11.0 (comment)
Issue has been introduced in 3.10.0
Hello @rbri,
I see you are preparing a 4.0.0 version. That's great news! Did you have time to check the regressions I mentioned in my comments here?
I tried v4.0.0 and the regressions I mentioned earlier disappeared!
Thanks for the amazing work @rbri :tada:
Nevertheless, I still have my base issue with the memory leak (I tried some of the things you suggested above, but they did not help).
@fleboulch sorry for the long pause
The issue here is even when I'm closing the webclient instance there is still memory which is not released. Here in my example code I'm dealing with a single source but in production I'm dealing with multiple sources.
There are some internal (class based) caches that might be the reason.
I think a valid test scenario looks like this
So far the theory - will try to find some time to check the code again.
Thanks for your reply @rbri! I really appreciate your deep investigation. I will try your scenario on my code to check whether your assumptions hold. I'm using different webclients because at the beginning I was parallelizing the calls.
I checked your comment and it seems correct!
In my app I need to scrape multiple sites/external sources and I don't need any cache mechanism (even less so after a close). I'm scraping these websites once a day, and currently the memory used stays high.
What are your recommendations for my use case?
Hello @rbri, do you have any news about this issue?
Hello,
I want to thank you for your amazing work. I've been using your lib for almost a year now and it's really nice.
I'm having an issue with memory (heap memory).
Showcase 1: I'm starting my app without doing any scraping
Heap: 74 MB
Showcase 2: I'm starting my app and doing 1 scrape with close
Heap: 256 MB
Showcase 3: I'm starting my app and doing 1 scrape with close + other cleanup + gc
Heap: 166 MB
The code is the same as in showcase 2; only the finally clause changes, like below
The issue here is even when I'm closing the webclient instance there is still memory which is not released. Here in my example code I'm dealing with a single source but in production I'm dealing with multiple sources.
I also tried .use in Kotlin (try with resources) (article)
Other info
3.9.0 (the behaviour is the same on older versions)
Article read about the memory subject:
Similar issues
639