corentinaltepe / nlog.loki

NLog target for Loki using an HTTP client
BSD 3-Clause "New" or "Revised" License
22 stars 7 forks source link

"Failed pushing logs to Loki" - InternalLogger does not seem to log the real error into the internal lo #92

Closed Wolfsblvt closed 1 year ago

Wolfsblvt commented 1 year ago

Hey.
We are using NLog.Targets.Loki in the latest version 2.1.0 In a .NET Framework 4.8 environment. The NLog version itself is the latest stable release 5.2.3.

NLog is configured successfully and does log into file and console targets.
We have set up the internal logging of NLog the following way:

<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      autoReload="true"
      throwExceptions="false"
      throwConfigExceptions="true"
      internalLogLevel="Info" internalLogFile="C:\Temp\nlog-internal.log">

The internal logfile exists and NLog writes into it.

Now the problem. Our configured Loki target does not seem to work. One entry in the internal log shows a WARN with Failed pushing logs to Loki..

The full internal log with the relevant section is as follows:

2023-08-24 13:40:25.4433 Info Configuration file change detected! Reloading in 1000ms...
2023-08-24 13:40:25.4512 Info Configuration file change detected! Reloading in 1000ms...
2023-08-24 13:40:26.4578 Info Reloading configuration...
2023-08-24 13:40:26.4578 Info Loading NLog config from XML file: C:\xxxxxxxx\xxxxxx\xxxxx\NLog.config
2023-08-24 13:40:26.4578 Info Registered target NLog.Targets.FileTarget(Name=jsonFile)
2023-08-24 13:40:26.4578 Info Registered target NLog.Loki.LokiTarget(Name=loki)
2023-08-24 13:40:26.4578 Info Registered target NLog.Targets.ConsoleTarget(Name=logconsole)
2023-08-24 13:40:26.4578 Info Closing old configuration.
2023-08-24 13:40:26.4578 Info Validating config: TargetNames=jsonFile, loki, ConfigItems=78, FilePath=C:\xxxxxxxx\xxxxxx\xxxxx\NLog.config
2023-08-24 13:40:26.4578 Warn Unused target detected. Add a rule for this target to the configuration. TargetName: logconsole
2023-08-24 13:40:26.4734 Info Configuration initialized.
2023-08-24 13:40:33.3567 Warn lokiTarget(Name=loki): WriteAsyncTask failed on completion. Sleep 250 ms Exception: System.Net.Http.HttpRequestException: Failed pushing logs to Loki.
   bei NLog.Loki.HttpLokiTransport.<ValidateHttpResponse>d__7.MoveNext()
--- Ende der Stapelüberwachung vom vorhergehenden Ort, an dem die Ausnahme ausgelöst wurde ---
   bei System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   bei System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   bei NLog.Loki.HttpLokiTransport.<WriteLogEventsAsync>d__4.MoveNext()
2023-08-24 13:40:37.0216 Warn lokiTarget(Name=loki): WriteAsyncTask failed on completion. Sleep 250 ms Exception: System.Net.Http.HttpRequestException: Failed pushing logs to Loki.
   bei NLog.Loki.HttpLokiTransport.<ValidateHttpResponse>d__7.MoveNext()
--- Ende der Stapelüberwachung vom vorhergehenden Ort, an dem die Ausnahme ausgelöst wurde ---
   bei System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   bei System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   bei NLog.Loki.HttpLokiTransport.<WriteLogEventsAsync>d__4.MoveNext()
2023-08-24 13:40:41.3041 Warn lokiTarget(Name=loki): WriteAsyncTask failed on completion. Sleep 250 ms Exception: System.Net.Http.HttpRequestException: Failed pushing logs to Loki.
   bei NLog.Loki.HttpLokiTransport.<ValidateHttpResponse>d__7.MoveNext()
--- Ende der Stapelüberwachung vom vorhergehenden Ort, an dem die Ausnahme ausgelöst wurde ---
   bei System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   bei System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   bei NLog.Loki.HttpLokiTransport.<WriteLogEventsAsync>d__4.MoveNext()

I tried different delay and retry settings, but it does not seem to work, same errors. I used Fiddler to try and intercept HTTP traffic, trying to see if the call is being made and runs into some kind of issue, but I don't see any HTTP request being made at all.

I checked manually against the used configuration for loki, and via Postman I can make successful calls against the /loki/api/v1/push endpoint. So it does not seem to be a network issue.

What confuses me the most is that I checked the repository code for this specific error, and there should be some internal logging giving me a better and more detailed error message for the issue from the http client that is used by this here: https://github.com/corentinaltepe/nlog.loki/blob/master/src/NLog.Loki/HttpLokiTransport.cs#L76
But this error line does not appear, even though it's above the only relevant place for the exception message that is being thrown and written into the log.

Any help would be appreciated!

Just so I don't miss anything I will attach the full NLog.config that we have set up

<?xml version="1.0" encoding="utf-8"?>
<nlog xmlns="http://www.nlog-project.org/schemas/NLog.xsd"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      autoReload="true"
      throwExceptions="false"
      throwConfigExceptions="true"
      internalLogLevel="Info" internalLogFile="C:\Temp\nlog-internal.log">

    <!-- *********************************** -->
    <!-- ***     Configuration Area      *** -->
    <!-- *********************************** -->
    <!-- Only change if you now what you are doing -->

    <variable name="environment" value="staging" />
    <variable name="app-name" value="Api" />
    <variable name="sub-app-name" value="Api" />

    <variable name="loki-endpoint" value="http://loki-instance.subdomain.ourdomain.de" />
    <variable name="loki-username" value="loki-xxxxxxxx" />
    <variable name="loki-password" value="xxxxxxxxxxxxx" />

    <!-- *********************************** -->
    <!-- ***  END of Configuration Area  *** -->
    <!-- *********************************** -->

    <extensions>
        <add assembly="NLog.Loki" />
    </extensions>
    <targets>
        <target name="jsonFile" xsi:type="File" fileName="C:\xxxxxxxx\Logs\${sub-app-name}.json"
                archiveFileName="C:\xxxxxxxx\Logs\Archives\${sub-app-name}\{#}.${sub-app-name}.zip"
                archiveNumbering="Date"
                archiveEvery="Day"
                archiveDateFormat="yyyy-MM-dd"
                enableArchiveFileCompression="true">

            <layout xsi:type="JsonLayout" includeEventProperties="true">
                <attribute name="time" layout="${longdate}" />
                <attribute name="level" layout="${level:upperCase=true}" />
                <attribute name="logger" layout="${logger}" />
                <attribute name="message" layout="${message}" />
                <attribute name="exception" layout="${exception:format=@}" encode="false" />
            </layout>
        </target>
        <target name="loki" xsi:type="loki"
                batchSize="200"
                retryCount="0"
                taskDelayMilliseconds="0"
                endpoint="${loki-endpoint}"
                username="${loki-username}"
                password="${loki-password}"
                orderWrites="false"
                compressionLevel="optimal"
                eventPropertiesAsLabels="false"
                sendLastFormatParameter="false">

            <layout xsi:type="JsonLayout" includeEventProperties="true">
                <attribute name="time" layout="${longdate}" />
                <attribute name="level" layout="${level:upperCase=true}" />
                <attribute name="logger" layout="${logger}" />
                <attribute name="message" layout="${message}" />
                <attribute name="exception" layout="${exception:format=@}" encode="false" />
            </layout>

            <label name="environment" layout="${lowercase:${environment}}" />
            <label name="app" layout="${lowercase:${app-name}}" />
            <label name="sub-app" layout="${lowercase:${sub-app-name}}" />
            <label name="server" layout="${hostname:lowercase=true}" />
        </target>
        <target name="logconsole" xsi:type="Console" />
    </targets>

    <rules>
        <logger name="*" minlevel="Debug" writeTo="jsonFile" />
        <logger name="*" minlevel="Debug" writeTo="loki" />
    </rules>
</nlog>
Wolfsblvt commented 1 year ago

Okay. I tracked down the issue.

Grafana does not support hyphens in label names. As specified in their documentation here, only [a-zA-Z_:][a-zA-Z0-9_:]* is valid.

So how did I find out? I cloned this respository and build it myself, adding new logging statements.
Because I was kinda right, while this custom target does work and does get loaded and its code gets executed (which you can't really verify with no success loggings), the specific log line linked above giving me the error does not print. Maybe because of the argument formatting?
I changed it to an inline formatted string, and it does show up correctly in my internal log, which gave me the following error message:

2023-08-24 18:46:25.0820 Error Failed pushing logs to Loki. Code: BadRequest. Reason: Bad Request. Message: 1:57: parse error: unexpected character inside braces: '-'
.

To lock down my issue further, I added another output logging the given events, which showed what my own issue was.

2023-08-24 18:46:25.0820 Error Unhandled error in transport object on writing loki events.
. Events: {"streams":[{"stream":{"environment":"staging","app":"api","sub-app":"api","server":"xxxxxxx"},"values":[["1692895584834647800","{ \u0022time\u0022: \u00222023-08-24 18:46:24.8346\u0022, \u0022level\u0022: \u0022INFO\u0022, \u0022logger\u0022: \u0022Api.Session\u0022, \u0022message\u0022: \u0022No active session. Creating one...\u0022 }"]]}]} Exception: System.Net.Http.HttpRequestException: Failed pushing logs to Loki. Code: BadRequest. Reason: Bad Request. Message: 1:57: parse error: unexpected character inside braces: '-'
.

So, the biggest issue was that the original error that would've helped me tracking down my own fault (wrong configuration for Grafana) was not printed in the internal log, where you'd expect it to be.

Because I've already started working on this, would you accept pull requests adding additional logging that I'd think makes sense in context of this plugin?
Or would you prefer to do that yourself, or just fix that one error logging line?


Edit: Maybe it would also make sense to pre-validate the contents specified in the config. Like cases with wrongly named labels or no labels at all, which are both not allowed for Grafana.

Would make it possible to print more readable error, or even throw it as a configuration issue if throwConfigExceptions="true" is set, as one would expect. Because otherwise you might miss a misconfigured loki target.

snakefoot commented 1 year ago

Would make it possible to print more readable error, or even throw it as a configuration issue if throwConfigExceptions="true" is set, as one would expect. Because otherwise you might miss a misconfigured loki target.

Sounds like a good idea. Pull-request are wellcome to validate the label-names by overriding InitializeTarget().

By default writing InternalLogger.Warn when issue with label-names, but if LogManager.ThrowConfigExceptions = true then also throw NLogConfigurationException

snakefoot commented 1 year ago

Created #93 to fix the InternalLogger-output

Wolfsblvt commented 1 year ago

Thanks for the quick fix!

I'm gonna grab this one for validation of the labels, likely on the weekend.
I think when your PR is through we can mark this ticket itself as resolved, as the real issue I had was that the error was not logged at all, which is solved then.

corentinaltepe commented 1 year ago

Hi, thank you both very much for reporting and fixing the issue. I've merged the PR and published v2.1.1.