dotnet / sdk

Core functionality needed to create .NET Core projects, that is shared between Visual Studio and CLI
https://dot.net/core
MIT License
2.66k stars 1.06k forks source link

Potential telemetry cleanup logic error - unsent telemetry events buildup #41796

Open baronfel opened 2 months ago

baronfel commented 2 months ago

Describe the bug

I think there's a scenario where if we can't send telemetry events they will accrue over time until they flip our short-circuiting detections, and because our telemetry-event-storage cleanup function has a bug in it we'll never flip out of that short-circuit mode.

Our telemetry system works on a producer/buffered consumer model. dotnet commands queue events, which are quickly written in an unfinished form to the telemetry storage directory with a random guid filename ($HOME/.dotnet/TelemetryStorageService by default) by a processor, then it is 'Move'd into a final name of the form <timestamp>_<previousguid>.trn. After that buffering, another consumer reads those .trn files into memory in batches and prepares them for final sending, and then a separate consumer pushes those read entries to AppInsights. Once pushed, the individual intermediate files are deleted.

Separately, there's an 'outdated event' cleanup processor that scans the Telemetry Storage Directory for all *.tmp files and deletes them if they are older than 5 minutes.

During this process, no files are written of the form *.tmp, so none are cleaned up by the cleanup processor. Should the processor be changed to clean up .trn files, or all files in the TelemetryStorageService folder?

Forgind commented 2 months ago

During this process, no files are written of the form *.tmp, so none are cleaned up by the cleanup processor. Should the processor be changed to clean up .trn files, or all files in the TelemetryStorageService folder?

Would this get us out of the short-circuit part? If so, this sounds like a very small/easy change, so I think it would make sense to just do it. This sounds like a rare problem, so if that isn't enough, we can probably backlog this.

marcpopMSFT commented 2 months ago

Fix the three letter typo for now.