We recently encountered a case where the Azure Functions scale controller was failing to make a scaling decision due to the following error:
System.FormatException: The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
at System.Convert.FromBase64CharPtr(Char* inputPtr, Int32 inputLength)
at System.Convert.FromBase64String(String s)
at Microsoft.WindowsAzure.Storage.Queue.CloudQueueMessage.get_AsString()
at DurableTask.AzureStorage.Storage.QueueMessage..ctor(CloudQueueMessage cloudQueueMessage) in /_/src/DurableTask.AzureStorage/Storage/QueueMessage.cs:line 26
at DurableTask.AzureStorage.Storage.Queue.PeekMessageAsync() in /_/src/DurableTask.AzureStorage/Storage/Queue.cs:line 193
at DurableTask.AzureStorage.Monitoring.DisconnectedPerformanceMonitor.GetQueueLatencyAsync(Queue queue) in /_/src/DurableTask.AzureStorage/Monitoring/DisconnectedPerformanceMonitor.cs:line 243
As can be seen in the stack trace, this error is thrown when trying to Peek a given queue, which we use to observe the message age amongst other metadata.
The error occurs because, after receiving an azure queue message (of type CloudQueueMessage), we wrap it in our own abstraction called QueueMessage. When constructing this object, we call the AsString method of CloudQueueMessage, which obtains the message contents, and store them in a class property named Message. This method can throw if the message cannot be decoded, yielding the error above.
This is a kind of poison message scenario that can affect the Scale Controller. In the Scale Controller's case, we don't really need to decode the message's contents, as all we need is message and queue metadata such as the average message Age, the number of messages in the queue, and so on.
As a result - this PR moves the call to AsString from the CloudQueueMessage constructor to instead occur in the implementation of it's Message property. This will prevent the ScaleController from error'ing out, as it never needs to access the message's actual contents.
As an aside - I meant to write a test for this, but the CloudQueueMessage type is sealed, so it cannot be mock'ed, and newer versions of the Azure Storage SDK are more robust against undecodable errors. So I decided against it.
We recently encountered a case where the Azure Functions scale controller was failing to make a scaling decision due to the following error:
As can be seen in the stack trace, this error is thrown when trying to Peek a given queue, which we use to observe the message age amongst other metadata.
The error occurs because, after receiving an azure queue message (of type
CloudQueueMessage
), we wrap it in our own abstraction calledQueueMessage
. When constructing this object, we call theAsString
method ofCloudQueueMessage
, which obtains the message contents, and store them in a class property namedMessage
. This method can throw if the message cannot be decoded, yielding the error above.This is a kind of poison message scenario that can affect the Scale Controller. In the Scale Controller's case, we don't really need to decode the message's contents, as all we need is message and queue metadata such as the average message Age, the number of messages in the queue, and so on.
As a result - this PR moves the call to
AsString
from theCloudQueueMessage
constructor to instead occur in the implementation of it'sMessage
property. This will prevent the ScaleController from error'ing out, as it never needs to access the message's actual contents.As an aside - I meant to write a test for this, but the
CloudQueueMessage
type is sealed, so it cannot be mock'ed, and newer versions of the Azure Storage SDK are more robust against undecodable errors. So I decided against it.