Open mathewc opened 3 years ago
In the upgrade docs here it says you should "Use a separate storage account for each function app" and then links to this GitHub Issue. But in the docs here it says, "It's possible for multiple function apps to share the same storage account without any issues". It does go on to say that "... this approach isn't always a good idea in a production environment", but it doesn't say not to do it. Can this be clarified in the docs?
Its okay from the point of functionality as function apps will not step over each other if multiple apps sharing the same storage account in general (except for the issue mentioned above). But it is not recommended due concerns around best practices like availability (single point of failure), security (better isolation/access controls when each app has its own secret)
We ran into problems with this, using Functions v4, dotnet-isolated; however, the only symptom was that the job host constantly restarted, there were no errors nor warnings logged anywhere (while the constant stop-start was logged properly) (this was true for multiple functions, separately). The root cause was indeed the too long name, the issue was actually caused by the function and its staging slot. Actually not both the functions were restarting, only one, sometimes it was the staging slot but sometimes the live one.
The workaround we used is a slot-level config override for AzureFunctionsWebHost__hostId
.
Could you please check how is it possible that there were no errors logged at all?
That's all the logs we see:
2022-04-26T12:30:59.278 [Information] Host lock lease acquired by instance ID 'eb635ef47d9374f47ee1dde305a3c9fd'.
2022-04-26T12:31:21.059 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:java
2022-04-26T12:31:21.063 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:node
2022-04-26T12:31:21.063 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:powershell
2022-04-26T12:31:21.103 [Warning] The Dashboard setting is no longer supported. See https://aka.ms/functions-dashboard for details.
2022-04-26T12:31:21.190 [Information] Initializing Warmup Extension.
2022-04-26T12:31:21.236 [Information] Initializing Host. OperationId: '1c31c0b9-8022-4828-8d8a-404dea961f31'.
2022-04-26T12:31:21.240 [Information] Host initialization: ConsecutiveErrors=0, StartupCount=1, OperationId=1c31c0b9-8022-4828-8d8a-404dea961f31
2022-04-26T12:31:21.279 [Information] ApplicationInsightsLoggerOptions{"SamplingSettings": {"EvaluationInterval": "00:00:15","InitialSamplingPercentage": 100.0,"MaxSamplingPercentage": 100.0,"MaxTelemetryItemsPerSecond": 20.0,"MinSamplingPercentage": 0.1,"MovingAverageRatio": 0.25,"SamplingPercentageDecreaseTimeout": "00:02:00","SamplingPercentageIncreaseTimeout": "00:15:00"},"SamplingExcludedTypes": "Request","SamplingIncludedTypes": null,"SnapshotConfiguration": null,"EnablePerformanceCountersCollection": true,"HttpAutoCollectionOptions": {"EnableHttpTriggerExtendedInfoCollection": true,"EnableW3CDistributedTracing": true,"EnableResponseHeaderInjection": true},"LiveMetricsInitializationDelay": "00:00:15","EnableLiveMetrics": true,"EnableDependencyTracking": true,"DependencyTrackingOptions": null}
2022-04-26T12:31:21.280 [Information] LoggerFilterOptions{"MinLevel": "None","Rules": [{"ProviderName": null,"CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": "None","Filter": null},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider","CategoryName": null,"LogLevel": "Trace","Filter": null}]}
2022-04-26T12:31:21.280 [Information] LoggerFilterOptions{"MinLevel": "None","Rules": [{"ProviderName": null,"CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": "None","Filter": null},{"ProviderName": "Microsoft.Azure.WebJobs.Script.WebHost.Diagnostics.SystemLoggerProvider","CategoryName": null,"LogLevel": null,"Filter": "<AddFilter>b__0"},{"ProviderName": "Microsoft.Azure.WebJobs.Logging.ApplicationInsights.ApplicationInsightsLoggerProvider","CategoryName": null,"LogLevel": "Trace","Filter": null}]}
2022-04-26T12:31:21.280 [Information] ConcurrencyOptions{"DynamicConcurrencyEnabled": false,"MaximumFunctionConcurrency": 500,"CPUThreshold": 0.8,"SnapshotPersistenceEnabled": true}
2022-04-26T12:31:21.280 [Information] FunctionResultAggregatorOptions{"BatchSize": 1000,"FlushTimeout": "00:00:30","IsEnabled": true}
2022-04-26T12:31:21.280 [Information] SingletonOptions{"LockPeriod": "00:00:15","ListenerLockPeriod": "00:01:00","LockAcquisitionTimeout": "10675199.02:48:05.4775807","LockAcquisitionPollingInterval": "00:00:05","ListenerLockRecoveryPollingInterval": "00:01:00"}
2022-04-26T12:31:21.280 [Information] ServiceBusOptions{"PrefetchCount": 0,"MessageHandlerOptions": {"AutoComplete": true,"MaxAutoRenewDuration": "00:05:00","MaxConcurrentCalls": 32},"SessionHandlerOptions": {"AutoComplete": true,"MaxAutoRenewDuration": "00:05:00","MaxConcurrentSessions": 2000,"MessageWaitTimeout": "00:01:00"},"BatchOptions": {"MaxMessageCount": 1000,"OperationTimeout": "00:01:00","AutoComplete": true}}
2022-04-26T12:31:21.281 [Information] Starting JobHost
2022-04-26T12:31:21.282 [Information] Starting Host (HostId=[too-long-functionname-trimmed-], InstanceId=8dd736e0-24c3-44b8-958c-073ca23ee788, Version=4.3.0.17971, ProcessId=15204, AppDomainId=1, InDebugMode=True, InDiagnosticMode=False, FunctionsExtensionVersion=~4)
2022-04-26T12:31:21.290 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:java
2022-04-26T12:31:21.290 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:node
2022-04-26T12:31:21.291 [Information] FUNCTIONS_WORKER_RUNTIME set to dotnet-isolated. Skipping WorkerConfig for language:powershell
2022-04-26T12:31:21.295 [Information] Loading functions metadata
2022-04-26T12:31:21.333 [Information] Reading functions metadata
2022-04-26T12:31:21.384 [Information] 1 functions found
2022-04-26T12:31:21.394 [Information] 1 functions loaded
2022-04-26T12:31:21.408 [Information] Generating 1 job function(s)
2022-04-26T12:31:21.444 [Information] Found the following functions:Host.Functions.HubListenerFunction
2022-04-26T12:31:21.475 [Information] Initializing function HTTP routesNo HTTP routes mapped
2022-04-26T12:31:21.479 [Information] Host initialized (192ms)
2022-04-26T12:31:21.486 [Information] HttpOptions{"DynamicThrottlesEnabled": false,"EnableChunkedRequestBinding": false,"MaxConcurrentRequests": -1,"MaxOutstandingRequests": -1,"RoutePrefix": "api"}
2022-04-26T12:31:21.579 [Information] Host started (296ms)
2022-04-26T12:31:21.579 [Information] Job host started
2022-04-26T12:31:22.277 [Information] Host Status: {"id": "[too-long-functionname-trimmed-]","state": "Running","version": "4.3.0.17971","versionDetails": "4.3.0+64614b7392f36e3871862d22d21e72304df901e9","platformVersion": "97.0.10.675","instanceId": "eb635ef47d9374f47ee1dde305a3c9fdb204691970dda38684f67a86ac3e26fb","computerName": "wn0sdwk0007EC","processUptime": 17534,"functionAppContentEditingState": "Unknown"}
2022-04-26T12:31:22.280 [Information] Host Status: {"id": "[too-long-functionname-trimmed-]","state": "Running","version": "4.3.0.17971","versionDetails": "4.3.0+64614b7392f36e3871862d22d21e72304df901e9","platformVersion": "97.0.10.675","instanceId": "eb635ef47d9374f47ee1dde305a3c9fdb204691970dda38684f67a86ac3e26fb","computerName": "wn0sdwk0007EC","processUptime": 17545,"functionAppContentEditingState": "Unknown"}
2022-04-26T12:31:22.353 [Information] Stopping JobHost
2022-04-26T12:31:22.354 [Information] Stopping the listener 'Microsoft.Azure.WebJobs.ServiceBus.Listeners.ServiceBusListener' for function 'HubListenerFunction'
2022-04-26T12:31:22.438 [Information] Worker process started and initialized.
2022-04-26T12:31:26.317 [Information] Host lock lease acquired by instance ID 'eb635ef47d9374f47ee1dde305a3c9fd'.
Our v3 Functions instances are configured to use unique hubName
s but share a single storageProvider.trackingStoreNamePrefix
in a single storage account. This setup allows each Functions instance to fetch the orchestration statuses of all the other running instances from a single source of truth. Using this technique, it is possible to apply a side-by-side deployment scenario of our Durable Functions based asynchronous API wherein the superseded "legacy" function backends of our API gateway controlled service can complete their running orchestrations while the "current" functions backend version can execute own orchestrations while still being able to retrieve the processing status of the legacy backend.
I understand that this setup would be impossible when using separate storage accounts, correct?
We are still facing the same challenge... Any ideas regarding my context?
@mathewc
I do not fully understand why we are geting this error from all of a sudden.
we have several functionapps which share one storage account and some of the functionapp names are longer than 32 characters.
But it worked for several years.
Furthermore I do not understand where to find and check the host ids. IN the storage account under azure-webjobs-hosts/ids/usage
there are several entries but only one contains the name of one of our functions apps.
The others are like this:
functionsv3x64placeholdertemplat functionsv4x64dotnetisolated6pla functionsv4x64dotnetisolated7pla functionsv4x64dotnetisolated8pla functionsv4x64node14placeholdert
etc. etc.
So for me it looks like you do not use the functionapp name to generate a host id but some other strange and weird method which really could not work ?
AND why is this suddenly changed?
Motivation
This change is meant to address longstanding issue https://github.com/Azure/azure-functions-host/issues/2015. The underlying issue is caused by the following things being true:
If these are all true, then there is a collision and you can end up with unexpected outcomes as the two apps share state.
When this issue is detected for an app a log will be written. In v4 this will be an Error. In v3 this will be a Warning:
"A collision for Host ID '{hostid}' was detected in the configured storage account. For more information, see https://aka.ms/functions-hostid-collision."
Impact
At the point in time when this issue was create, about 28% of v3 apps have a name subject to truncation. Of those, only apps that are sharing storage accounts would be affected, which is likely a much lower percentage.
Compat-mode support
There will be an app setting FUNCTIONS_HOSTID_CHECK_LEVEL that supports values Warning/Error. In v3 the default will be Warning, and in v4 it will be Error. By default when this issue is detected in v4 it will be a hard failure. As an escape hatch this app setting can be used to change back to warning if needed.
Alternatives
Yes we discussed alternatives, but none of those were acceptable.
Detection
The v3 warning can be used to detect this.
Support
The error will have an aka.ms link with documentation on the issue and the recommended course of action for affected customers to take.
Documentation
Documentation impact
Components impacted
Functions Host
Performance
No.