Closed mrdavidlaing closed 12 years ago
This RStudio script plots memory & CPU usage
It seems pretty clear that we have a memory leak in the CIAPILatencyCollector service
@fandrei - I've put the production latency logs onto metrics.labs.cityindex.com (under {ApplicationName}.Production). The metrics.cityindex.com error log didn't contain anything useful.
Is there anything else you need to help debug this issue?
This info should be enough. Strange that this never surfaced in preprod monitoring.
The labs environment machines get rebooted every night when they are backed up. The production environment doesn't (yet). This is probably why we didn't notice it. (and suggests a possible short term solution)
UPD: can I also have the binary files from "C:\ProgramData\City Index\CIAPI Latency Collector\" and the config from the config server?
On metrics.labs.cityindex.com
see
D:\issue117\AWS_Ireland_79.125.25.36\C-ProgramData\City Index\CIAPI Latency Collector
D:\issue117\metrics.cityindex.com_79.125.25.30\D-Websites\config.metrics.cityindex.com
I've found some clues; service process contains a lot of assemblies auto-generated by .NET Framework; investigating what piece of code makes them appear.
The problem was due to the XmlSerializer being re-created multiple times (each instantiation of this class generates a unique serialization assembly, which is never unloaded).
Weird - so you're saying that in this code
void ApplyRemoteSettings(string text)
{
    var serializer = new XmlSerializer(typeof(MonitorSettings));
    using (var rd = new StringReader(text))
    {
        _monitorSettings = (MonitorSettings)serializer.Deserialize(rd);
    }
    _monitorSettings.PollingDisabled = false;
}
the serializer (or some resources it creates) is never collected / cleaned up by the GC?
Not serializer itself, but helper assembly created by it.
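A common mitigation for this leak is to create the XmlSerializer once and reuse it, so the dynamically generated helper assembly is emitted only a single time per process. A minimal sketch of that approach (the surrounding class name here is hypothetical, not the actual fix committed to the service):

```csharp
using System.IO;
using System.Xml.Serialization;

public class RemoteSettingsLoader
{
    // Created once per process, so the dynamically generated serialization
    // assembly is emitted only once instead of on every call.
    // Note: .NET caches the generated assembly for the XmlSerializer(Type)
    // overload, but not for most other overloads, so caching the instance
    // explicitly is the safe habit regardless of which constructor is used.
    private static readonly XmlSerializer Serializer =
        new XmlSerializer(typeof(MonitorSettings));

    private MonitorSettings _monitorSettings;

    void ApplyRemoteSettings(string text)
    {
        using (var rd = new StringReader(text))
        {
            _monitorSettings = (MonitorSettings)Serializer.Deserialize(rd);
        }
        _monitorSettings.PollingDisabled = false;
    }
}
```

Sharing a single static instance is safe here because XmlSerializer's Serialize and Deserialize methods are documented as thread-safe.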
All service data for CiapiLatencyCollector & CiapiLatencyCollector.AllServiceMonitor.Builtin was collected successfully in production from 2012-08-10 to 2012-08-25.
At 2012-08-25 13:57:59 the following error appears in the logs:
after which only
General.DefaultPage
measures are collected until the LatencyCollector services were restarted on 2012-09-03.

It would appear that there is a memory leak somewhere in the CIAPILatencyCollector service. We need to identify where this is, and remove it.