ControlSystemStudio / phoebus

A framework and set of tools to monitor and operate large scale control systems, such as the ones in the accelerator community.
http://phoebus.org/
Eclipse Public License 1.0

Memory leak issue!! #2054

Open joshikk opened 2 years ago

joshikk commented 2 years ago

Hello, I am using Phoebus at commit 27a925d0dd79c0b497c20fb176c097641d581e93 with JDK 16. After starting Phoebus, the virtual memory climbs to the maximum and all memory is eventually consumed. This happens when opening a trivial panel without any script. I suspect a memory leak. Is there any way of identifying the module where the leak is occurring?

Thank you in advance.

Regards, Kuldeep

kasemir commented 2 years ago

What operating system? Linux? Are you truly running out of memory, or is just the Linux VIRT memory high?

Use VisualVM to monitor memory usage https://visualvm.github.io Slightly better is jProfiler, it can show where the memory is actually allocated, https://www.ej-technologies.com/products/jprofiler/overview.html, but while VisualVM is free, jProfiler needs to be bought.

Try setting MALLOC_ARENA_MAX, see https://stackoverflow.com/questions/561245/virtual-memory-usage-from-java-under-linux-too-much-memory-used
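To put numbers on the VIRT-versus-real-memory question, one option on Linux is to read both values for the Phoebus process from /proc/<pid>/status. The following is a minimal sketch, not part of Phoebus; the class name ProcStatus and its helper are hypothetical, and it assumes the usual "VmSize:" and "VmRSS:" lines that the Linux kernel reports in kB.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper: compare virtual size (VIRT) with resident size (RES) of one Linux process. */
class ProcStatus {

    /** Return the value in kB of a field such as "VmRSS" or "VmSize", or -1 if the field is absent. */
    static long parseKb(List<String> statusLines, String field) {
        for (String line : statusLines) {
            if (line.startsWith(field + ":")) {
                // Lines look like "VmRSS:     123456 kB"; keep only the number.
                String[] parts = line.substring(field.length() + 1).trim().split("\\s+");
                return Long.parseLong(parts[0]);
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Pass the PID of the JVM to inspect; without an argument this process inspects itself.
        String pid = args.length > 0 ? args[0] : "self";
        List<String> lines = Files.readAllLines(Path.of("/proc", pid, "status"));
        System.out.println("VmSize (VIRT) kB: " + parseKb(lines, "VmSize"));
        System.out.println("VmRSS  (RES)  kB: " + parseKb(lines, "VmRSS"));
    }
}
```

If VmSize is large but VmRSS stays flat over time, the growth is only in reserved address space. A steadily rising VmRSS points to memory that is actually in use.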

joshikk commented 2 years ago

Hello, I am using Ubuntu Focal with openjdk 16.0.1. Both virtual and physical memory are being consumed, and after some time the PC becomes exceedingly slow. Output of java -version:

openjdk 16.0.1 2021-04-20
OpenJDK Runtime Environment (build 16.0.1+9-Ubuntu-120.04)
OpenJDK 64-Bit Server VM (build 16.0.1+9-Ubuntu-120.04, mixed mode, sharing)

I tried the following settings before starting Phoebus. They do not help; the memory consumption still grows.

export MALLOC_ARENA_MAX=2
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536

I am attaching the screenshot of visualvm for monitoring thread memory

It seems that the thread RMI TCP Connection(2)-127.0.0.1 memory consumption is counting up.

I am using the following parameters in settings.ini:

org.phoebus.pv.ca/addr_list=127.0.0.1 11.123.15.117
org.csstudio.trends.databrowser3/urls=jdbc:postgresql://localhost/archive|RDB*jdbc:postgresql://11.123.15.117/archive|RemoteRDB
org.csstudio.trends.databrowser3/archives=jdbc:postgresql://localhost/archive|RDB*jdbc:postgresql://11.123.15.117/archive|RemoteRDB

Setting for the Database url used in the archive engine

org.csstudio.archive/url=jdbc:postgresql://localhost/archive

PS: Screenshot of visualvm is attached:

Screenshot from 2021-11-12 10-17-26 Screenshot from 2021-11-12 10-24-29

kasemir commented 2 years ago

both virtual and physical memory are getting consumed.

In a way that's good, because it suggests you see a real memory leak, one that you should be able to track down. Purely virtual memory growth can be observed under Linux, but it's unclear to me what to do about it or whether it means anything.

I am attaching the screenshot of visualvm for monitoring thread memory

You may have attached the screenshot to the email, but it doesn't show up on github. Go to https://github.com/ControlSystemStudio/phoebus/issues/2054 and attach it directly

It seems that the thread RMI TCP Connection(2)-127.0.0.1 memory consumption is counting up.

That's not a normal thread used by our code. It's a thread used by VisualVM to communicate with the monitored application. It might grow in memory because it's sharing more and more data with VisualVM, but I doubt it's the actual memory leak.

You may have to get jProfiler, which you can use for about a week for free, and use that to track down which memory is growing and what code is allocating it. https://www.ej-technologies.com/resources/jprofiler/help/doc/heapWalker/memoryLeaks.html

kasemir commented 2 years ago

See also https://www.ej-technologies.com/blog/2009/08/allocation-recording-explained/ https://www.ej-technologies.com/blog/2017/03/finding-a-memory-leak-with-jprofiler/

shroffk commented 2 years ago

This is on starting a trivial panel without any script.

Are you just running a single .bob file? Does this file have a lot of images or plots? Which applications are you running?

@kasemir what version of openJDK are you using in production?

kasemir commented 2 years ago

Mostly openJDK 14.0.1 and 15.0.1. Yes, you could simply try an older one to see if that makes a difference.

joshikk commented 2 years ago

Hello, I have attached the screenshots of visualvm in the above message.

@kasemir I find that the application's memory consumption also grows without visualvm attached. I will try openjdk 15 or 14; I guess I may have to recompile the code? I will also check with jprofiler and update you.

@shroffk I am running multiple .bob files, but only one tab has a databrowser displayed. The other .bob files contain embedded displays with text controls for setting and reading values. The archive engine runs on the IOC machine. The display machine runs only Phoebus with the .bob panels.

I am not using any chart or image controls in my .bob files. I do use x-y plots in some panels, but those panels were not opened. If I start a .bob file with only one combo box, I do not notice any growth in memory consumption.

For the memory leak test I opened only the .bob file with multiple embedded panels. All the embedded panels have text controls for setting and reading values, some action and choice controls, and some simple embedded scripts that change display properties (colour, enable, ...). No .bob file with an image, databrowser or plot was opened. After opening the .bob file with multiple embedded panels, the memory consumption grows at about 0.1 GB per minute, and this is without visualvm. The moment I close this panel, the memory consumption stops growing. The same happens when each embedded .bob file is opened individually in standalone mode. One more observation: if the IOC is not connected, the consumption does not grow.
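A growth figure such as "0.1 GB per minute" is easier to compare between test runs (panel open or closed, IOC connected or not) when it is computed from a few timed samples rather than read off a monitor. The following is a minimal sketch using a least-squares slope; the class name GrowthRate is hypothetical, and it assumes the samples are resident-memory readings in MB taken at known times in minutes.

```java
/** Hypothetical helper: estimate memory growth from timed samples. */
class GrowthRate {

    /** Least-squares slope of memory (MB) against time (minutes), in MB per minute. */
    static double slopeMbPerMinute(double[] minutes, double[] megabytes) {
        int n = minutes.length;
        if (n < 2 || megabytes.length != n) {
            throw new IllegalArgumentException("need at least two samples of equal length");
        }
        double meanT = 0, meanM = 0;
        for (int i = 0; i < n; i++) {
            meanT += minutes[i] / n;
            meanM += megabytes[i] / n;
        }
        double num = 0, den = 0;
        for (int i = 0; i < n; i++) {
            double dt = minutes[i] - meanT;
            num += dt * (megabytes[i] - meanM);
            den += dt * dt;
        }
        if (den == 0) {
            throw new IllegalArgumentException("all samples share the same timestamp");
        }
        return num / den;
    }
}
```

A slope near zero with the panel closed and a clearly positive slope with it open would support the observation that the growth is tied to that panel.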

joshikk commented 2 years ago

I tried with openjdk 14 and the problem persisted. It was also present on Windows with openjdk 16.

I noticed that one of the rules had a PV attached that the rule did not actually use. An image is attached. After I removed the unused PV from the rule, the problem was resolved on Windows. Can an unused PV in a rule cause a memory leak?

On Linux, htop reports that the percentage of memory consumed keeps going up, while in jprofiler the heap telemetry shows constant committed memory. Why this anomaly? I have yet to get the hang of the jprofiler tool.

Screenshot from 2021-11-17 16-19-34

Screenshot from 2021-11-17 17-16-55

kasemir commented 2 years ago

..htop is reporting that the percentage consumption of memory is going up. ..in the jprofiler the heap telemetries is showing constant committed memory. So why is this anomaly?

No idea, but that is indeed the question. Your profiler screenshot suggests that there is no memory leak in the Java code. The JVM is simply using and then releasing memory in a sawtooth pattern. That's what Java does, it's typically quite good at it.
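The comparison made here, a steady Java heap against a growing process size, can be roughed out in code. The following is a minimal sketch under stated assumptions: the class HeapVsNative and its method are hypothetical, the JVM figures come from the standard java.lang.management API and describe only the JVM that runs this code, and the resident size must be supplied from outside (for example the VmRSS value in /proc/<pid>/status). Committed memory is not necessarily resident, so the result is only a rough indicator.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

/** Hypothetical helper: how much resident memory is not explained by the JVM's own pools. */
class HeapVsNative {

    /** Resident size (kB) minus committed heap and non-heap memory (bytes), in kB. */
    static long unaccountedKb(long rssKb, long heapCommittedBytes, long nonHeapCommittedBytes) {
        return rssKb - (heapCommittedBytes + nonHeapCommittedBytes) / 1024;
    }

    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        System.out.println("heap committed kB:     " + heap.getCommitted() / 1024);
        System.out.println("non-heap committed kB: " + nonHeap.getCommitted() / 1024);
        if (args.length > 0) {
            // Optional argument: resident size of this process in kB, measured externally.
            long rssKb = Long.parseLong(args[0]);
            System.out.println("unaccounted kB:        "
                    + unaccountedKb(rssKb, heap.getCommitted(), nonHeap.getCommitted()));
        }
    }
}
```

If the unaccounted figure keeps rising while the committed heap stays flat, the growth lies outside the Java heap, for example in native libraries or the graphics stack.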

I have seen the VIRT memory on Linux grow to unexplainable sizes. It doesn't seem to cause any harm, but looks worrisome in top. Playing with MALLOC_ARENA_MAX can help.

As for a real memory leak, I have seen that happen when running under a virtual graphics system, in our case thinlinc. We connect to some Linux hosts via the thinlinc Remote Desktop tool. It then runs the Linux desktop similar to VNC with a virtual frame buffer, and that overall used a lot of memory. The fix there was to run under 'vglrun'. So I wonder if the memory leak that you see is not in the Java code, also not in the basic JVM, but purely on the graphics side. Are you using the actual graphics card with a physical monitor, no remote X-via-ssh or VNC or ..?

On either the 'java ...' command line of the script that starts CSS, or via setting JDK_JAVA_OPTIONS, try adding

-Djdk.gtk.verbose=true -Dprism.verbose=true

to see what it's using, then maybe add -Dprism.forceGPU=false -Dprism.order=sw to disable GPU and hardware acceleration in case some fancy graphics card driver is part of the problem.
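For repeated tests it can help to build the option string in one place and hand it to the launcher through JDK_JAVA_OPTIONS, which the java launcher reads in JDK 9 and later. The following is a minimal sketch; the class PrismOptions is hypothetical, and the command passed to withOptions stands in for whatever script or java command line starts Phoebus at a given site.

```java
import java.util.List;

/** Hypothetical helper: assemble the JavaFX diagnostic flags suggested above. */
class PrismOptions {

    /** Verbose flags, optionally followed by the flags that switch to software rendering. */
    static String diagnosticOptions(boolean softwareRendering) {
        StringBuilder options = new StringBuilder("-Djdk.gtk.verbose=true -Dprism.verbose=true");
        if (softwareRendering) {
            options.append(" -Dprism.forceGPU=false -Dprism.order=sw");
        }
        return options.toString();
    }

    /** Prepare (but do not start) a process whose environment carries the flags. */
    static ProcessBuilder withOptions(List<String> command, boolean softwareRendering) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.environment().put("JDK_JAVA_OPTIONS", diagnosticOptions(softwareRendering));
        builder.inheritIO(); // show the verbose GTK and Prism output on this console
        return builder;
    }
}
```

Calling start() on the returned builder launches the command. Running once with softwareRendering set to false and once with true gives a direct comparison of memory behaviour with and without hardware acceleration.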

shroffk commented 2 years ago

I have seen the VIRT memory on Linux grow to unexplainable sizes. It doesn't seem to cause any harm, but looks worrisome in top. Playing with MALLOC_ARENA_MAX can help.

On some of our systems, I have seen other applications like Chrome, Firefox and other UI toolkits all report VIRT memory consumption in the many tens of GBs, often scaring users but not having any real impact.

So I wonder if the memory leak that you see is not in the Java code, also not in the basic JVM but purely on the graphics side.

Based on my experience so far, I am leaning towards this explanation. The same version of CS-Studio running the same OPIs behaves vastly differently on different machines. On some machines there is no memory leak and on others there is. I have also observed that this difference exists even on the same machine, between one user who is using NX or X forwarding and another who is running CS-Studio on the displays physically attached to the machine.

-Dprism.forceGPU=false

I have had to make this the default setting at nsls2, since forcing hardware acceleration on certain machines causes Phoebus to simply crash on startup.

joshikk commented 2 years ago

Neither export MALLOC_ARENA_MAX=2 nor -Dprism.forceGPU=false helps.

I am using the physical machine without VNC or X forwarding.

kasemir commented 2 years ago

In March 2022, we updated JavaFX from version 16 to 18 (and version 19 just came out with 20 available in early access). Since JavaFX updates always include some GTK-related changes, you might simply try again with the latest.

joshikk commented 2 years ago

Sure, will do it and let you know.
