Open ConnorMcMahon opened 4 years ago
@kashimiz please take a look at the LinuxAppServiceFileLogger here for an example of how to port your existing file logger to use a POSIX-compliant syscall.rename
utility. That should suffice to notify fluent-bit / td-agent about the file roll.
It may be worth pulling this out into a separate library, as there are plenty of projects in the App Service Linux dedicated eco-system that could benefit from a thread-safe file logger that works with our version of td-agent.
I just realized that the ProcessLogQueue
method of the LinuxAppServiceFileLogger
should have a await this.InternalProcessLogQueue();
right after its while loop to make sure the buffer flushes when Stop()
is called.
Otherwise, there's a chance logs will be lost there as well.
So currently we have:
private async Task ProcessLogQueue(object state)
{
while (!this.cancellationTokenSource.IsCancellationRequested)
{
await this.InternalProcessLogQueue();
await Task.Delay(TimeSpan.FromSeconds(this.FlushFrequencySeconds), this.cancellationTokenSource.Token).ContinueWith(task => { });
}
// ReSharper disable once FunctionNeverReturns
}
but we should have
private async Task ProcessLogQueue(object state)
{
while (!this.cancellationTokenSource.IsCancellationRequested)
{
await this.InternalProcessLogQueue();
await Task.Delay(TimeSpan.FromSeconds(this.FlushFrequencySeconds), this.cancellationTokenSource.Token).ContinueWith(task => { });
// if token is cancelled right here, and we have a non-empty buffer, we won't log that data,
}
await this.InternalProcessLogQueue(); // to make sure buffer gets flushed after token is cancelled
// ReSharper disable once FunctionNeverReturns
}
We've also fixed this in our version of the logger
Hi @ConnorMcMahon , Since its an year old issue, can you confirm if this issue needs to be closed or requires a Triage.
This issue requires a triage.
I don't think this issue has been prioritized yet. I'm happy to help with the coding so long as we get the opinion of some Functions Host owners. @kashimiz, could we get your thoughts on this :) ?
Thank you @davidmrdavid , I will be following with the team on this issue
Hi @davidmrdavid , I checked on this issue internally and they suggested you to submit a PR.
As suggested tagging @balag0 / @divyagandhisethi for further inputs
Sounds good. In that case, I'll assign this ticket to myself so I can track it and follow-up with a PR, thank you @v-anvari
Functions logs are written to file by LWAS to guarantee fault tolerance, and the files are periodically rotated so that we do not reach excessive sizes. Our log forwarding service - which is responsible for reading the log files and forwarding the data to our telemetry pipeline - listens for a specific inotify event associated with rotations. When it receives this event, it registers the log file as rotated and searches for the new log file. The File.Move API that is being used to rotate our functions logs is not emitting the correct inotify event, resulting in the log forwarder not being signaled to look for the new logs. By default, our platform initiates periodic restarts of the log forwarding service, which mitigates this issue. However, it can still cause log loss in high volume workloads.
Problem method is LinuxAppServiceFileLogger.RollFiles().
Solution: Use system calls to make sure that the inotify event is emitted.