fandrei / AppMetrics

Apache License 2.0

Identify location of & fix memory leak #117

Closed mrdavidlaing closed 12 years ago

mrdavidlaing commented 12 years ago

All service data for CiapiLatencyCollector & CiapiLatencyCollector.AllServiceMonitor.Builtin was collected successfully in production from 2012-08-10 to 2012-08-25.

At 2012-08-25 13:57:59 the following error appears in the logs:

2012-08-25 13:57:59.0809434 Exception   System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
   at System.Threading.Thread.StartInternal(IPrincipal principal, StackCrawlMark& stackMark)
   at System.Threading.Thread.Start(StackCrawlMark& stackMark)
   at System.Threading.Thread.Start()
   at CIAPI.StreamingClient.FaultTolerantLsClientAdapter.Stop()
   at CIAPI.StreamingClient.FaultTolerantLsClientAdapter.Dispose()
   at CIAPI.Streaming.LightstreamerClient.Dispose(Boolean disposing)
   at CIAPI.Streaming.LightstreamerClient.Dispose()
   at LatencyCollectorCore.Monitors.AllServiceMonitor.GetPrice(Client client)
   at LatencyCollectorCore.Monitors.AllServiceMonitor.Execute()

after which only General.DefaultPage measures were collected until the LatencyCollector services were restarted on 2012-09-03.

It would appear that there is a memory leak somewhere in the CIAPILatencyCollector service. We need to identify where this is, and remove it.

mrdavidlaing commented 12 years ago

This RStudio script plots memory & CPU usage.

It seems pretty clear that we have a memory leak in the CIAPILatencyCollector service.

mrdavidlaing commented 12 years ago

@fandrei - I've put the production latency logs onto metrics.labs.cityindex.com (under {ApplicationName}.Production). The metrics.cityindex.com error log didn't contain anything useful.

Is there anything else you need to help debug this issue?

fandrei commented 12 years ago

This info should be enough. Strange that this never surfaced in preprod monitoring.

mrdavidlaing commented 12 years ago

The labs environment machines get rebooted every night when they are backed up; the production environment doesn't (yet). This is probably why we didn't notice it (and it suggests a possible short-term solution).

fandrei commented 12 years ago

UPD: Can I also have the binary files from "C:\ProgramData\City Index\CIAPI Latency Collector\" and the config from the config server?

mrdavidlaing commented 12 years ago

On metrics.labs.cityindex.com, see:

D:\issue117\AWS_Ireland_79.125.25.36\C-ProgramData\City Index\CIAPI Latency Collector
D:\issue117\metrics.cityindex.com_79.125.25.30\D-Websites\config.metrics.cityindex.com

fandrei commented 12 years ago

I've found some clues; the service process contains a lot of assemblies auto-generated by the .NET Framework. I'm investigating what piece of code makes them appear.
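
A quick way to confirm this from inside the process is to count the assemblies in the current AppDomain that were generated at runtime rather than loaded from disk; the helper below is purely illustrative and is not part of the collector code:

using System;
using System.Linq;

static class LeakDiagnostics
{
    // Counts assemblies emitted at runtime (such as XmlSerializer helper
    // assemblies) rather than loaded from a file on disk. A count that keeps
    // growing while the service runs points at runtime code generation.
    public static int CountGeneratedAssemblies()
    {
        return AppDomain.CurrentDomain.GetAssemblies()
            .Count(a => a.IsDynamic || string.IsNullOrEmpty(a.Location));
    }
}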

fandrei commented 12 years ago

The problem was due to the XmlSerializer being re-created multiple times (each instantiation of this class generates a new serialization assembly, which is never unloaded).
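
A minimal sketch of the usual mitigation, assuming the settings type is MonitorSettings as in the code quoted below (the MonitorSettingsXml class and Parse method are illustrative names, not the actual AppMetrics change): build the serializer once and reuse it, so its helper assembly is generated at most once per process.

using System.IO;
using System.Xml.Serialization;

static class MonitorSettingsXml
{
    // Built once per process and reused for every poll, so the dynamically
    // generated serialization assembly is created at most once.
    static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(MonitorSettings));

    public static MonitorSettings Parse(string text)
    {
        using (var reader = new StringReader(text))
        {
            return (MonitorSettings)Serializer.Deserialize(reader);
        }
    }
}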

mrdavidlaing commented 12 years ago

Weird - so you're saying that in this code

void ApplyRemoteSettings(string text)
{
    var serializer = new XmlSerializer(typeof(MonitorSettings));

    using (var rd = new StringReader(text))
    {
        _monitorSettings = (MonitorSettings)serializer.Deserialize(rd);
    }

    _monitorSettings.PollingDisabled = false;
}

the serializer (or some resources that it creates) is never collected / cleaned up by the GC?

fandrei commented 12 years ago

Not the serializer itself, but the helper assembly created by it.

fandrei commented 12 years ago

PS: http://blogs.msdn.com/b/tess/archive/2006/02/15/532804.aspx
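
For context on the helper-assembly behaviour: the framework caches the generated serialization assembly only for the XmlSerializer(Type) and XmlSerializer(Type, String) constructor overloads; the other overloads emit a new assembly on every call, and those assemblies are never unloaded. A small illustration, reusing the MonitorSettings type from the code above (the SerializerUsage class itself is hypothetical):

using System.Xml.Serialization;

class SerializerUsage
{
    // Cached by the framework: repeated calls reuse one generated assembly.
    XmlSerializer MakeCached()
    {
        return new XmlSerializer(typeof(MonitorSettings));
    }

    // Not cached: every call compiles and loads a brand-new helper assembly
    // that stays in the process until it exits.
    XmlSerializer MakeLeaky()
    {
        return new XmlSerializer(typeof(MonitorSettings),
            new XmlRootAttribute("MonitorSettings"));
    }
}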