Cimpress-MCP / serilog-sinks-awscloudwatch

A Serilog sink that logs to AWS CloudWatch
Apache License 2.0
67 stars 54 forks source link

ArgumentOutOfRangeException when trying to truncate long message that contains non-ascii chars #100

Closed gagabu closed 3 years ago

gagabu commented 3 years ago

Sometimes I get exception from console output:

Error sending logs. No logs will be sent to AWS CloudWatch. Error was System.ArgumentOutOfRangeException: Index and length must refer to a location within the string. (Parameter 'length')
   at System.String.Substring(Int32 startIndex, Int32 length)
   at Serilog.Sinks.AwsCloudWatch.CloudWatchLogSink.<EmitBatchAsync>b__22_1(LogEvent event)
   at System.Linq.Enumerable.SelectIPartitionIterator`2.MoveNext()
   at System.Collections.Generic.EnumerableHelpers.ToArray[T](IEnumerable`1 source, Int32& length)
   at System.Collections.Generic.Queue`1..ctor(IEnumerable`1 collection)
   at Serilog.Sinks.AwsCloudWatch.CloudWatchLogSink.EmitBatchAsync(IEnumerable`1 events)

That is code that throws this exception:

var messageLength = System.Text.Encoding.UTF8.GetByteCount(message);
if (messageLength > MaxLogEventSize)
{
    // truncate event message
    Debugging.SelfLog.WriteLine("Truncating log event with length of {0}", messageLength);
    message = message.Substring(0, MaxLogEventSize);
}

If message contains non-ascii chars, then Encoding.UTF8.GetByteCount() > message.Length (for example: "паляниця" has 8 chars length, but in UTF8 encoding has 16 bytes in memory). Second parameter of message.Substring means chars not bytes and that is why we get exception. And we are losing whole batch.

Test code to show it:

const int MaxLogEventSize = 15;
var message = "паляниця"; // 8 chars, but 16 bytes 
var messageLength = System.Text.Encoding.UTF8.GetByteCount(message);

if (messageLength > MaxLogEventSize)
{
    // truncate event message
    Debugging.SelfLog.WriteLine("Truncating log event with length of {0}", messageLength);
    message = message.Substring(0, MaxLogEventSize);
}

I've not found cheap and memory tolerant method, that can truncate string using bytes count, not chars. Maybe message.Substring(0, MaxLogEventSize/2); could be possible here or split message into two separate parts if we dont want to lost something important?

thoean commented 3 years ago

That's a good point. We're correctly finding the message length in bytes, but then do character-based substring.

I think we need to change:

message = message.Substring(0, MaxLogEventSize);

to something like this:

let bytes = Encoding.UTF8.GetBytes("Your string with some interesting data").Take(MaxLogEventSize).ToArray();
message = Encoding.UTF8.GetString(bytes, 0, bytes.Length);

Would you mind trying this out, and if it works submitting a pull request?

gagabu commented 3 years ago

I've made pull request https://github.com/Cimpress-MCP/serilog-sinks-awscloudwatch/pull/101