Open baronfel opened 2 months ago
During this process, no files are written of the form *.tmp, so none are cleaned up by the cleanup processor. Should the processor be changed to clean up .trn files, or all files in the TelemetryStorageService folder?
Would this get us out of the short-circuit part? If so, this sounds like a very small/easy change, so I think it would make sense to just do it. This sounds like a rare problem, so if that isn't enough, we can probably backlog this.
Fix the three letter typo for now.
Describe the bug
I think there's a scenario where if we can't send telemetry events they will accrue over time until they flip our short-circuiting detections, and because our telemetry-event-storage cleanup function has a bug in it we'll never flip out of that short-circuit mode.
Our telemetry system works on a producer/buffered consumer model.
dotnet
commands queue events, which are quickly written in an unfinished form to the telemetry storage directory with a random guid filename ($HOME/.dotnet/TelemetryStorageService
by default) by a processor, then it is 'Move'd into a final name of the form<timestamp>_<previousguid>.trn
. After that buffering, another consumer reads those.trn
files into memory in batches and prepares them for final sending, and then a separate consumer pushes those read entries to AppInsights. Once pushed, the individual intermediate files are deleted.Separately, there's an 'outdated event' cleanup processor that scans the Telemetry Storage Directory for all
*.tmp
files and deletes them if they are older than 5 minutes.During this process, no files are written of the form
*.tmp
, so none are cleaned up by the cleanup processor. Should the processor be changed to clean up .trn files, or all files in the TelemetryStorageService folder?