Alachisoft / NCache

NCache: Highly Scalable Distributed Cache for .NET
http://www.alachisoft.com
Apache License 2.0

NCache Open Source is consuming a tremendous amount of memory; way more than what has been set in "cache-size" #45

Open DamienLaw opened 5 years ago

DamienLaw commented 5 years ago

I have 4 caches, each with a cache-size of 32 MB and a cluster size of 2, so the total capacity of each cache is 64 MB. Please refer to the screenshot below.

[Screenshot: cache configuration]

However, after some time, even while idle (nothing has been inserted into the caches since they were started), the memory consumption of the caches climbs far beyond the limit I set, averaging about 1 GB per cache. Please refer to the screenshot below.

[Screenshot: cache memory consumption]

My server constantly reports "Insufficient Memory" for my other web server activities. Why is NCache taking up so much memory? The caches are on a staging server, so there is no activity over the weekend, yet memory consumption still climbs to an alarming degree.

Kal-Alachisoft commented 5 years ago

Hi @DamienLaw, I am going to try to help you with this.

Can you please confirm which NCache version is running in your environment? You can share a screenshot of the output of the 'Get-NCacheVersion' cmdlet in PowerShell, or of running 'verifylicense.exe' located at 'C:\Program Files\NCache\bin\tools\verifylicense.exe'.

DamienLaw commented 5 years ago

Hi there, I appreciate your help. You can actually tell the NCache version from the first screenshot: it's NCache Open Source 5.0.

Thank you.

Kal-Alachisoft commented 5 years ago

Hi @DamienLaw, I am going to run some tests on my end to try to reproduce this. I will keep you updated on the progress.

DamienLaw commented 5 years ago

@Kal-Alachisoft thanks for your attention to this matter. Have you had any success so far?

Kal-Alachisoft commented 4 years ago

Hi @DamienLaw, despite my efforts I was unable to reproduce this on my end. Are you still having this issue?

DamienLaw commented 4 years ago

@Kal-Alachisoft Yes, I'm still having this problem. I set the cache size to 32 MB, but the caches are currently running at around 200 MB. Intermittently, on both replicated and partitioned topologies, some of the caches spike to around 900 MB. I have to constantly monitor the caches' memory consumption and restart them when necessary. I have even updated to the latest NCache Open Source 5.0 SP1.

You can find the memory dump file below. This cache was running at 200 MB despite a cache-size setting of 32 MB. https://mega.nz/#!QARwiCRb!Ic3zRGmGPa7wgDricWfY4T5MBtGZKYvmmBK6GH04-Es

Kal-Alachisoft commented 4 years ago

Hi @DamienLaw, please email us at support@alachisoft.com.

I will help you with this there. The link you've posted is unreachable for me. I will share instructions on where to upload the memory dumps.

DamienLaw commented 4 years ago

@Kal-Alachisoft I must not have formatted the hyperlink correctly. You can copy the link text instead of clicking on it. Anyway, I've fixed the hyperlink and it works now. Thank you for helping out.

DamienLaw commented 4 years ago

The following are the memory dumps of two of my caches whose memory consumption spiked to 900 MB despite a cache size of 32 MB. These caches are not in production and are therefore not actively used.

Hopefully these will give you some insight into the problem.

DamienLaw commented 4 years ago

[Screenshot: dotMemory analysis of the memory dump] I loaded one of the memory dumps into JetBrains dotMemory. Only 6.03 MB was in use, while 772.16 MB may be caused by heap fragmentation.

https://dotnettools-support.jetbrains.com/hc/en-us/community/posts/360000524724-What-is-retaining-NET-total-memory

Kal-Alachisoft commented 4 years ago

Hi @DamienLaw, thanks for sharing these. I was able to download them and will pass them on to engineering for analysis. In the meantime, can you please also share the following from your environment.

DamienLaw commented 4 years ago

How is this progressing? Were you able to reproduce the issue?

Ross-Alachisoft commented 4 years ago

Hi Damien,

Kal had requested some DLLs from your environment along with the config files, and we are still waiting on those. Please share the data requested below with me so I can forward it to the engineering team and reach a conclusion on this issue.

C:\Windows\Microsoft.NET\Framework64\<v4.0 framework version>\sos.dll
C:\Windows\Microsoft.NET\Framework64\<v4.0 framework version>\mscordacwks.dll
C:\Windows\Microsoft.NET\Framework64\<v4.0 framework version>\clr.dll
C:\Windows\Microsoft.NET\Framework\<v4.0 framework version>\sos.dll
C:\Windows\Microsoft.NET\Framework\<v4.0 framework version>\mscordacwks.dll
C:\Windows\Microsoft.NET\Framework\<v4.0 framework version>\clr.dll

The config files from the cache servers hosting the clustered cache, located at “C:\Program Files\NCache\config\config.ncconf”.
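
For reference, here is a minimal sketch of how the requested files might be collected into one archive per server. It is a hypothetical helper, not an Alachisoft or Microsoft tool: it assumes the framework version folders match `v4.0*` (the exact version folder is left to the glob rather than guessed) and that config.ncconf is at the path given above. The archive name `ncache_debug_files.zip` is arbitrary.

```python
import sys
import zipfile
from pathlib import Path

DLL_NAMES = ("sos.dll", "mscordacwks.dll", "clr.dll")
FRAMEWORK_DIRS = ("Framework64", "Framework")


def find_debug_files(dotnet_root: Path) -> list[Path]:
    """Find sos/mscordacwks/clr.dll under every v4.0* framework folder."""
    found: list[Path] = []
    for framework in FRAMEWORK_DIRS:
        # Assumption: version folders are named like "v4.0.<build>".
        for version_dir in sorted((dotnet_root / framework).glob("v4.0*")):
            for name in DLL_NAMES:
                candidate = version_dir / name
                if candidate.is_file():
                    found.append(candidate)
    return found


def bundle(files: list[Path], dotnet_root: Path, extra: list[Path],
           zip_path: Path) -> None:
    """Zip the DLLs (keeping Framework/version subfolders) plus extra files."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in files:
            # Keep the relative path so 32-bit and 64-bit copies don't collide.
            zf.write(f, arcname=f.relative_to(dotnet_root).as_posix())
        for f in extra:
            if f.is_file():  # silently skip files that are not present
                zf.write(f, arcname=f.name)


if __name__ == "__main__" and sys.platform == "win32":
    root = Path(r"C:\Windows\Microsoft.NET")
    config = Path(r"C:\Program Files\NCache\config\config.ncconf")
    dlls = find_debug_files(root)
    bundle(dlls, root, [config], Path("ncache_debug_files.zip"))
    print(f"bundled {len(dlls)} DLL(s)")
```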

DamienLaw commented 4 years ago

Here you go. New folder.zip

Mark-NCache commented 4 years ago

Thank you for sharing the memory dump files. Our engineering team has analyzed the memory dumps and come back with the following findings:

  1. It seems that you are inserting a very large object into the cache. The serialized size of this object is around 721,420,315 bytes (about 721 MB).
  2. This object is allocated on the large object heap (LOH); as seen in the memory dump, it has no references and is therefore collectable by the GC.
  3. When the .NET garbage collector runs, it traces live objects, including those on the LOH, and collection is performed when the system needs memory. In this case, however, the object on the LOH is not being collected, so its memory remains occupied.
  4. Both dumps are identical.
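
For background on items 2 and 3: in .NET, objects of 85,000 bytes or more are allocated on the LOH, which is collected only during generation 2 collections and is not compacted by default (compaction can be requested through GCSettings.LargeObjectHeapCompactionMode since .NET Framework 4.5.1). Free gaps can therefore remain inside LOH segments, which may relate to the fragmentation dotMemory reported. The helper below is hypothetical, not an NCache or .NET API; it only classifies allocation sizes against that threshold.

```python
# Hypothetical helper illustrating the .NET large object heap (LOH) rule:
# allocations of 85,000 bytes or more go on the LOH instead of the
# small object heap (SOH). Not an NCache or .NET API.
LOH_THRESHOLD_BYTES = 85_000


def heap_for(size_bytes: int) -> str:
    """Return which managed heap an object of this size would land on."""
    return "LOH" if size_bytes >= LOH_THRESHOLD_BYTES else "SOH"


def split_by_heap(sizes: list[int]) -> dict[str, list[int]]:
    """Group allocation sizes by the heap they would be allocated on."""
    groups: dict[str, list[int]] = {"SOH": [], "LOH": []}
    for size in sizes:
        groups[heap_for(size)].append(size)
    return groups


if __name__ == "__main__":
    # The two byte-array sizes come from the memory dump;
    # 24_000 is an arbitrary small size added for contrast.
    sizes = [24_000, 1_048_600, 721_420_315]
    for heap, members in split_by_heap(sizes).items():
        print(heap, members)
```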

LOH address space:

    The version of SOS does not match the version of CLR you are debugging. Please load the matching version of SOS for the version of CLR you are debugging.
    CLR Version: 4.7.3468.0
    SOS Version: 4.8.3815.0

    Address           MT                     Size
    xxxxxxxxxxxxxxxx  000002554695bfc0         30  Free
    xxxxxxxxxxxxxxxx  00007ffb1ebdeab0    1048600
    xxxxxxxxxxxxxxxx  000002554695bfc0         30  Free
    xxxxxxxxxxxxxxxx  00007ffb1ebdeab0    1048600
    xxxxxxxxxxxxxxxx  000002554695bfc0         30  Free
    xxxxxxxxxxxxxxxx  00007ffb1ebdeab0  721420315

Statistics:

    MT                Count  TotalSize  Class Name
    000002554695bfc0      4        114  Free
    00007ffb1ebdeab0      3  723517515  System.Byte[]
    Total 7 objects
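
The statistics can be cross-checked by hand: the three System.Byte[] sizes in the listing add up exactly to the reported TotalSize of 723,517,515 bytes, and the largest array is about 721 MB (roughly 688 MiB), consistent with the 721 MB figure engineering quotes. A small sketch of that arithmetic, with the sizes copied from the dump:

```python
# Cross-check of the heap statistics from the memory dump.
# Sizes are copied from the LOH listing; nothing here talks to a debugger.
BYTE_ARRAY_SIZES = [1_048_600, 1_048_600, 721_420_315]  # System.Byte[] objects
REPORTED_TOTAL = 723_517_515  # "TotalSize" column for System.Byte[]

MB = 1_000_000      # decimal megabyte
MIB = 1024 * 1024   # binary mebibyte


def to_mb(size_bytes: int) -> float:
    return size_bytes / MB


def to_mib(size_bytes: int) -> float:
    return size_bytes / MIB


if __name__ == "__main__":
    total = sum(BYTE_ARRAY_SIZES)
    print("sum matches reported total:", total == REPORTED_TOTAL)
    largest = max(BYTE_ARRAY_SIZES)
    print(f"largest array: {to_mb(largest):.1f} MB / {to_mib(largest):.1f} MiB")
```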

To further diagnose why the memory for this object is not being reclaimed, please share the following details:

  1. We are seeing 721 MB for a single object. Please confirm whether you are indeed adding an item this large to the cache, and share details on what this object is used for.
  2. Engineering recommends logging the “.NET CLR Memory” performance counters for your cache process while the issue is being reproduced. This will give us insight into memory allocation in the cache host process as well as GC invocations. Please log all counters in this category with a sample interval of 1 second.

DamienLaw commented 4 years ago

We are seeing 721MB size for a single object, please confirm if you are indeed adding an item in the cache which is this large. Also share details on the use for this object to be cached?

The cache is not being used in any way. We just spun up several caches and let them sit on the server for several weeks. Nothing is being added to the caches. Over time, the caches inflate.

Engineering recommended to log “.NET CLR Memory” for your cache process while issue is still being reproduced. This will give us some insights into memory allocation for cache host process as well as invocation of GC. Please log all counters for this category with sample interval of 1 second.

Any guides on how to do this?
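
One way to do this on Windows, offered as a sketch rather than official NCache guidance, is the built-in typeperf tool, which can sample every counter in the ".NET CLR Memory" category at a 1-second interval and write the samples to a CSV file. The helper below only builds (and, on Windows, runs) that command line. The output file name and the wildcard instance `(*)` are assumptions; you would narrow the instance to the NCache cache host process if you know its counter instance name.

```python
import subprocess
import sys


def typeperf_args(instance: str = "*", interval_s: int = 1,
                  out_file: str = "clr_memory.csv") -> list[str]:
    """Build a typeperf command that logs all .NET CLR Memory counters.

    instance:   counter instance (process) name; "*" samples every managed
                process. Narrowing it to the NCache host process is up to you.
    interval_s: sample interval in seconds (typeperf's -si option).
    out_file:   file to write (typeperf's -o option, CSV format via -f).
    """
    counter_path = rf"\.NET CLR Memory({instance})\*"
    return [
        "typeperf",
        counter_path,
        "-si", str(interval_s),
        "-f", "CSV",
        "-o", out_file,
    ]


if __name__ == "__main__":
    args = typeperf_args()
    print(" ".join(args))
    if sys.platform == "win32":
        # With no -sc sample count, typeperf keeps sampling until Ctrl+C.
        subprocess.run(args, check=False)
```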

Mark-NCache commented 4 years ago

Hi Damien,

Thanks for your feedback. I have discussed this again with our engineering team, and here are their comments.

The cache is not being used in any way. We just spun up several caches and let them sit on the server for several weeks. Nothing is being added to the caches. Over time, the caches inflate.

We see cache hit and miss statistics in the memory dumps, which indicate that the cache was being used when the dumps were collected. Here is a snippet that confirms it:

    MT                   Field    Offset  Type             VT  Attr      Value                Name
    xxxxxxxxxxxxxxxxxxx  4000529  8       System.String    0   instance  xxxxxxxxxxxxxxxxxxx  _className
    xxxxxxxxxxxxxxxxxxx  400052a  78      System.DateTime  1   instance  xxxxxxxxxxxxxxxxxxx  _upTime
    xxxxxxxxxxxxxxxxxxx  400052b  30      System.Int64     1   instance  0                    _count
    xxxxxxxxxxxxxxxxxxx  400052c  38      System.Int64     1   instance  0                    _sessionCount
    xxxxxxxxxxxxxxxxxxx  400052d  40      System.Int64     1   instance  15                   _hiCount
    xxxxxxxxxxxxxxxxxxx  400052e  48      System.Int64     1   instance  0                    _maxCount
    xxxxxxxxxxxxxxxxxxx  400052f  50      System.Int64     1   instance  0                    _maxSize
    xxxxxxxxxxxxxxxxxxx  4000530  58      System.Int64     1   instance  6775                 _hitCount
    xxxxxxxxxxxxxxxxxxx  4000531  60      System.Int64     1   instance  388                  _missCount
    xxxxxxxxxxxxxxxxxxx  4000532  68      System.Int64     1   instance  0                    _dataSize
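
A field dump like this is easier to read once it is split back into rows of eight columns (MT, Field, Offset, Type, VT, Attr, Value, Name). The sketch below does that and derives a hit ratio; it is an illustrative parser, not an SOS or NCache feature, and it assumes every column is a single whitespace-separated token, as in the dump quoted here (masked values included). With the dump's values, _hitCount = 6775 and _missCount = 388 give a hit ratio of about 94.6%.

```python
# Rebuild rows from a flattened SOS object dump. Assumes each of the eight
# columns is a single token, as in the dump quoted in this thread.
COLUMNS = ("MT", "Field", "Offset", "Type", "VT", "Attr", "Value", "Name")


def parse_fields(dump: str) -> dict[str, str]:
    """Map each field name to its raw Value column."""
    tokens = dump.split()
    if len(tokens) % len(COLUMNS) != 0:
        raise ValueError("dump does not split into 8-column rows")
    fields: dict[str, str] = {}
    for i in range(0, len(tokens), len(COLUMNS)):
        row = dict(zip(COLUMNS, tokens[i:i + len(COLUMNS)]))
        fields[row["Name"]] = row["Value"]
    return fields


def hit_ratio(fields: dict[str, str]) -> float:
    """Hits / (hits + misses), or 0.0 if there were no lookups."""
    hits = int(fields["_hitCount"])
    misses = int(fields["_missCount"])
    total = hits + misses
    return hits / total if total else 0.0


if __name__ == "__main__":
    sample = """
    x 400052b 30 System.Int64 1 instance 0 _count
    x 4000530 58 System.Int64 1 instance 6775 _hitCount
    x 4000531 60 System.Int64 1 instance 388 _missCount
    """
    fields = parse_fields(sample)
    print(fields["_count"], f"{hit_ratio(fields):.1%}")
```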

I have some more information about the object being cached and would like to know whether I should share it here (a public forum) or send it to you privately. Memory dump analysis confirms that the 721 MB object is not one of our internal NCache data structures, so we need to verify where this data is coming from. Please let me know and I will share the object details accordingly.

andrevoltolini commented 4 years ago

Hello, I'm having the same problem, but in my case using Events and Pub/Sub Messaging. Over time the cache's memory consumption grows even though the cache is not being used, which looks like a memory leak.

Version: 5.0 SP2

Thanks for any help.

DamienLaw commented 4 years ago

You may email me at damienlaw@live.com. Thanks a lot.