Open mathewc opened 7 years ago
This is a dangerous change to make for v1 but we should fix it in v2. This would require some updates to the scale controller as I believe it has this logic duplicated.
I've confirmed the scale controller will need an update, but it should be straightforward once we know what the new logic for generating the host ID is.
We had to punt this due to higher priority issues and the need to coordinate the change across multiple components. Revisit again in V3.
As a workaround for customers running into this issue, in Functions v2, you can set an explicit HostID in app settings, using a different ID for each environment. The app setting name to use is AzureFunctionsWebHost:hostId. For Functions v1 you can specify the ID in host.json via the hostId property. The host ID value should be unique for all apps/slots you're running. The important thing is that the IDs are no longer than 32 characters. The restrictions for HostIds that the value must satisfy are here. Another way to generate an ID would be to take a GUID, remove the dashes and make it lower case, e.g. 1835D7B5-5C98-4790-815D-072CC94C6F71 => 1835d7b55c984790815d072cc94c6f71
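The GUID conversion described above can be sketched in a few lines of Python. The validation rules in `is_valid_host_id` are an assumption based on my reading of the host ID restrictions (1 to 32 characters, lowercase letters, digits and dashes, no leading, trailing or consecutive dashes); the function names are hypothetical and not part of any Azure SDK.

```python
import re
import uuid

# Assumed restrictions (not confirmed in this thread): lowercase letters,
# digits and single dashes only, no leading or trailing dash.
_HOST_ID_PATTERN = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
MAX_HOST_ID_LENGTH = 32


def guid_to_host_id(guid: str) -> str:
    """Remove the dashes from a GUID string and lower-case it."""
    return guid.replace("-", "").lower()


def is_valid_host_id(host_id: str) -> bool:
    """Check a candidate host ID against the assumed restrictions."""
    if not 1 <= len(host_id) <= MAX_HOST_ID_LENGTH:
        return False
    return _HOST_ID_PATTERN.fullmatch(host_id) is not None


def new_host_id() -> str:
    """Generate a fresh random ID: uuid4().hex is 32 lowercase hex characters."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    print(guid_to_host_id("1835D7B5-5C98-4790-815D-072CC94C6F71"))
    # prints 1835d7b55c984790815d072cc94c6f71
```

A freshly generated value from `new_host_id()` would then be stored once in the app setting rather than regenerated on every run.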
Had a customer issue last week, where this truncation caused collision between multiple function apps sharing the same storage account.
@fabiocav, is this already planned for V3?
Deferring this work as it would have scale controller dependencies and does not align with the timing.
@paulbatum @fabiocav Per #1904, regarding the same detail in the architecture center doc, we communicate the Function App name length as 1-60. Could you advise what the appropriate length is that we should update it to?
@mike-urnun-msft good catch. staying in the range of 1-32 will avoid this bug.
Though there is no specific mention in this issue, I assume that the solution of explicitly setting the host ID via the 'AzureFunctionsWebHost:hostId' App Setting is still valid for v3 Function Apps?
If so, does the host ID have to remain static, or can it change after every deployment? I ask because our CD pipeline updates App Settings through an ARM template, so if the host ID has to remain static we'd first need to query the App Settings to get the current value of 'AzureFunctionsWebHost:hostId' for the slot we are deploying to so that it can be set back with the same value.
Also, does this only apply to a Function App with a slot, or does the value of 'AzureFunctionsWebHost:hostId' have to be unique across Function Apps? For example, if you have these two Function Apps with no slots...
...could both have the same truncated value of 'my-really-really-long-functionap'?
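The thread does not confirm whether the host ID has to stay the same across deployments. Assuming a stable value is wanted, one way to avoid querying the current setting from a CD pipeline is to derive the ID deterministically from the app name and slot. This is only a sketch: `stable_host_id` is a hypothetical helper, and hashing is my own suggestion, not something the Functions host does.

```python
import hashlib


def stable_host_id(app_name: str, slot: str = "production") -> str:
    """Derive a 32-character lowercase hex ID from the app name and slot.

    The same inputs always give the same ID, so a pipeline can recompute
    it on every deployment instead of reading the existing app setting.
    """
    key = f"{app_name.lower()}/{slot.lower()}".encode("utf-8")
    # A SHA-256 hex digest is 64 lowercase hex characters; keep the first 32.
    return hashlib.sha256(key).hexdigest()[:32]


if __name__ == "__main__":
    for slot in ("production", "staging"):
        print(slot, stable_host_id("my-really-really-long-functionapp-one", slot))
```

Because the full name goes into the hash before shortening, two names that share their first 32 characters still get different IDs, with the usual caveat that a truncated hash is unique only with very high probability.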
Why is this not mentioned in the docs or warned upon creation? This has been known since 2017, but no obvious mention of it anywhere that I can find. ( I very well could be blind, so if so please correct me. )
+1 I just recently got hit with a case that appears to be, at least partially, caused by this limitation.
@mathewc Is the app setting still a viable approach for v3 function apps? I was also told via an open MS ticket that, when running deployment and swap operations, Azure is unable to differentiate between your deployment slots.
We are very interested in this approach because we use a specific naming convention for our resources and this limit does not give a lot of wiggle room.
@mpaul31 yes, the app setting is still an option, so setting the host ID using AzureFunctionsWebHost__hostId (Windows and Linux) or AzureFunctionsWebHost:hostId (Windows only) is supported.
@fabiocav Just to confirm, does the casing of the app setting matter? When looking at the Diagnose and Solve blade for my function app, it mentions this app setting (note it's all lowercase): AzureFunctionsWebHost__hostid
Also, after I added the setting AzureFunctionsWebHost__hostId, Diagnose and Solve still shows a warning about the function name collision. Should I just ignore this?
Assigning to @fabiocav for a proposal for v4
@fabiocav
We added the AzureFunctionsWebHost__hostid (lowercase) setting and we still get a critical error in the portal:
Also the timer trigger stops working randomly from time to time.
Could you please provide correct, unambiguous documentation on how to solve this issue?
Is there also a procedure for detecting whether our timer trigger issue comes from the hostId issue? Where do we need to check the logs? Because everything reports as successful.
It just stops firing.
Many thanks.
@cmenzi thanks for highlighting this behavior, we are modifying the logic to check if the HostId is set.
Any updates on this one?
We added code to the host to check for this issue and log a warning in Functions v3. In Functions v4 when detected it's an error and we prevent the host from starting. See https://github.com/Azure/Azure-Functions/issues/2049 for details.
@mathewc: Should the AzureFunctionsWebHost:hostId setting be slot-sticky ("deployment slot setting")?
Important information: AzureFunctionsWebHost:hostId should be lower case. Otherwise, in some cases it can crash your function host.
Can we confirm this please. This is very important.
It's only the case when you override the host ID, and I think it's related to a specific trigger. I got the error below when I used an upper-cased hostid. It does not happen when the hostid is not overridden or when it is lower-cased.
The specifed resource name contains invalid characters.
RequestId:d154be16-6003-006a-7eff-0d7701000000
Time:2022-01-20T13:14:27.4462948Z
Status: 400 (The specifed resource name contains invalid characters.)
ErrorCode: InvalidResourceName
Content:
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidResourceName</Code><Message>The specifed resource name contains invalid characters.
RequestId:d154be16-6003-006a-7eff-0d7701000000
Time:2022-01-20T13:14:27.4462948Z</Message></Error>
Headers:
Server: Windows-Azure-Queue/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: d154be16-6003-006a-7eff-0d7701000000
x-ms-version: 2018-11-09
x-ms-error-code: InvalidResourceName
Date: Thu, 20 Jan 2022 13:14:27 GMT
Content-Length: 243
Content-Type: application/xml
@mathewc: Should the AzureFunctionsWebHost:hostId setting be slot-sticky ("deployment slot setting")?
We've added this property for each slot (as a deployment slot setting) and the issues disappeared (Windows function).
{
"name": "AzureFunctionsWebHost:hostId",
"value": "my-function-slotname",
"slotSetting": true
}
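As a rough illustration of the per-slot setting above, the following Python sketch builds one such entry per slot and refuses values longer than 32 characters. The `host_id_setting` helper and the `my-function` prefix are hypothetical; only the setting name and the `name`/`value`/`slotSetting` shape come from the comment above.

```python
import json

MAX_HOST_ID_LENGTH = 32


def host_id_setting(app_prefix: str, slot: str) -> dict:
    """Build a slot-sticky host ID app setting entry for one slot."""
    host_id = f"{app_prefix}-{slot}".lower()
    if len(host_id) > MAX_HOST_ID_LENGTH:
        raise ValueError(
            f"host ID '{host_id}' is {len(host_id)} characters; "
            f"the maximum is {MAX_HOST_ID_LENGTH}"
        )
    return {
        "name": "AzureFunctionsWebHost:hostId",
        "value": host_id,
        "slotSetting": True,
    }


if __name__ == "__main__":
    for slot in ("production", "staging"):
        print(json.dumps(host_id_setting("my-function", slot), indent=2))
```

Failing early on an over-long value keeps the explicit ID from running into the same 32-character limit that causes the truncation in the first place.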
I think the issue still exists in the v4 SDK: https://github.com/Azure/azure-functions-host/wiki/Host-IDs#host-id-collisions
I was under the impression this was actually resolved in V4. Can @fabiocav or @mathewc confirm? It may help to close this thread, to communicate this behavior has been established, if that's the case.
@AlphaWong is correct - the restriction still exists, however in v4 we added detection and prevent the host from starting up in this state as described here.
@mathewc we just ran into this host ID truncation issue out of nowhere; it was working up until last Friday, July 1, 2022. Our v4 application did start. It also read and processed messages from an incoming queue. However, it would fail when writing to an output queue. We have resolved it by changing our site names.
Running a v4 function, we're still facing the Function App Name Collision Found error when running the function configuration diagnostic, even though hostid is set up at slot level (Linux consumption plan) using a random lowercase GUID without dashes. Should we just ignore this error message? Should I set FUNCTIONS_HOSTID_CHECK_LEVEL to Warning level?
It had also already been asked whether we must keep a static hostId at each deployment or whether we can generate a new one, but I didn't see any answer to this question.
Thanks in advance for the clarifications.
@AlphaWong is correct - the restriction still exists, however in v4 we added detection and prevent the host from starting up in this state as described here.
This error message appeared for me, even though there was no conflict. It seems this gets triggered if there is any truncation. It does not seem to be preventing my apps from running, though. I think it may be more appropriate for this to appear as a warning when there is truncation, but no conflict detected.
On a related note, why is the limit only 32 characters? That seems rather short when basing things on names assigned by humans. I guess this is really all a result of the fatal flaw in Azure where the name of a resource doubles as the ID of the resource. This decision has created a lot of inconveniences within Azure.
Running a v4 function, we're still facing the Function App Name Collision Found error when running function configuration diagnostic even though hostid is setup
Same problem here. Is that expected behavior? Should we ignore the message, or is some other action required?
Should it be hostId, or is hostid also accepted?
According to the App settings reference for Azure Functions documentation, hostid is written in lower case. I highly recommend following that casing to prevent any ambiguity.
I still face the Function App Name Collision Found error message, but my functions are running well with only the following simple warning at startup: Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior. (Host.Startup category)
@thibautbrard
This is so confusing. The behavior suggests it should be hostId to make the warning go away.
Here's what I did:
I repeated it a few times, and it always works this way: AzureFunctionsWebHost__hostId is recognized by Diagnostics, but AzureFunctionsWebHost__hostid is not.
I'm not sure if it's only Diagnostics that's using hostId with capital I or is it Azure itself? It would be even more confusing if Azure infrastructure is using hostid (as written in the article), but Diagnostics is checking for hostId.
Who could tell what's actually going on and what should be the name of the setting to make it both work properly and also be recognized by Diagnostics?
@progmars that's really interesting information that you've provided!
From my observation, hostid (lower-case) is interpreted correctly, as it triggers the warning message at startup, but we still get the error in the Diagnose and solve problems component. We assumed it was only doing a static check of the function name length and nothing else. I wish we could get rid of this message...
I've just done a test with hostId and it actually removes the error message in diagnostics, as you said. I still get the warning Host id explicitly set in configuration at function startup, but that does not mean that it works (and unfortunately I can't test it right now).
All of this is really confusing...
@mathewc can you confirm that both hostid and hostId are working?
Would it be possible to clarify the casing in the documentation here and here, as only hostId prevents the error message from being displayed in diagnostics?
Many thanks
Having a conflicting hostId due to the name being truncated also caused me this error: https://github.com/Azure/azure-functions-dotnet-worker/issues/747
2022-09-18T10:24:19.474 [Information] Host initialized (208ms)
2022-09-18T10:24:19.481 [Information] Host started (219ms)
2022-09-18T10:24:19.481 [Information] Job host started
2022-09-18T10:24:19.594 [Information] HttpOptions{"DynamicThrottlesEnabled": false,"EnableChunkedRequestBinding": false,"MaxConcurrentRequests": -1,"MaxOutstandingRequests": -1,"RoutePrefix": "api"}
2022-09-18T10:24:19.810 [Information] Stopping JobHost
2022-09-18T10:24:19.812 [Information] Job host stopped
2022-09-18T10:24:19.844 [Error] Failed to start a new language worker for runtime: dotnet-isolated.System.Threading.Tasks.TaskCanceledException : A task was canceled.
at async Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.StartWorkerProcessAsync(CancellationToken cancellationToken)
at /_/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 159
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??)
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 154
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??)
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 146
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??)
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 137
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.<>c__DisplayClass56_0.<StartWorkerProcesses>b__0(??)
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 229
Setting a unique (and shorter than 32 characters) hostid allowed the application to start.
This is frustrating because, as noted above, the default app names generated by Visual Studio mean that apps can't be successfully deployed to any slot without making this change.
Pretty much the same as above. Seeing the job host started then immediately stopped again, leading to TaskCanceledException in our functions.
Finding the root cause was an absolute pain, because there is supposed to be an error logged about host id collisions, but there never was anything logged about it. It just shuts down and starts up again immediately in a loop, with short lived functions managing to complete a small amount of work before the host is killed again.
I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run
I also need clarification on the case for host ID.
AzureFunctionsWebHost__hostid
or
AzureFunctionsWebHost__hostId
The documentation says all lower case, but then the App Service diagnostics flag it as a risk; after changing it to hostId, the alert clears.
We are deploying a Python Application using a custom Docker Container to Azure Functions. I can confirm with absolute certainty, that AzureFunctionsWebHost__hostId (uppercase I) caused our function to fail with logs similar to this comment (except mentioning python).
As soon as we changed to AzureFunctionsWebHost__hostid (lower case i) our function app started working.
With either hostId or hostid, we get the log message "Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior." as expected. However, with hostId, we never got an error logged that there was a hostId collision. Yet a few log lines later, it says Starting Host (HostId=company1234-trm-prod01-un1-fa-ft (rest of log line omitted), which is a truncated name.
With hostId we do NOT get the diagnostics error. With hostid we do get the diagnostics error. This leads me to believe that the logic for the diagnostics error is incorrect (case insensitive when it needs to be case sensitive).
Same issue with not being able to find the error. As a customer, all I see in my logs is the function host restarting. Can we surface this error in the standard host trace output?
Same here, no error in logs, our readiness probe pinging admin/host/ping just stopped responding after ~10 minutes and that caused AKS to restart the pod. An error, on a critical level, would be nice.
I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run
100% agree - I seem to be able to get the function runtime in a real panic because of this coupled with a timer trigger. I understand that fixing the host Id is the right thing to do, but can we make this check happen before functions kick in.
@fabiocav could we please bump this one up the list? @eamonoreilly fyi
Spent an entire week trying to track down the root cause of this problem. Very disappointed that nothing pops up in the logs about collisions. This is a known 6-year-old issue; I expect more, even if it's simply a log entry listing potential causes.
This is a PITA and it's been 6 years now since the issue was identified. Any update on the ETA / Roadmap?
We just got hit by this issue. Do you have any updates?
I have suddenly started facing this issue since yesterday. We have v4 function apps deployed on a Windows Elastic plan.
I have function apps with names 43 characters long that differ only in the trailing part of the name,
like abc-defg-abcdefghijklmnopqrstuv-v1-eus & abc-defg-abcdefghijklmnopqrstuv-v2-eus
I already had the app setting "AzureFunctionsWebHost__hostId" added to the Function App configuration, with a GUID-generated host ID. The same configuration had been working for the last 6 months, and since yesterday it has started failing (function apps stop & start intermittently).
I have tried the following app setting changes:
- AzureFunctionsWebHost__hostId -> AzureFunctionsWebHost__hostid
- Removed hyphens (-) from GUID
- Added AzureFunctionsWebHost:hostId (upper I)
- Added AzureFunctionsWebHost:hostid (lower i)
- Added FUNCTIONS_HOSTID_CHECK_LEVEL with value 'Warning'
The Host ID collision error is removed from the diagnose & solve problems screen, but the issue isn't resolved. Function apps still stop & start intermittently.
The only solution that worked for me was to deploy a different function app with a truncated name (as we cannot rename the function app). However, this is not the solution I was expecting, because I have 26 function apps & now I need to remove, rename & redeploy those again. I raised a ticket with Microsoft support but have had no response since yesterday.
Any chance it is a scaling/instance issue? I.e. you actually have multiple of the same hostid in use?
@madmahii24 any response from microsoft support? Running into the same issue
Currently when generating a default host ID we use the host name (slot host name) and we truncate to 32 characters max (code here). This ensures that the generated ID conforms to the core SDK length restrictions (code here).
This truncation can of course open the possibility for naming collisions, particularly in the case of slots. In slot scenarios, if the site name is over 32 characters long and a slot is created that starts with the same 32 characters and is only disambiguated in later characters, both the production and the slot site will be using the same host ID. This can lead to issues. For example, TimerTrigger uses the host ID as a component of the blob lease path. In this case the timer function will only be able to run in one of the sites because they're competing for the same lock. Similarly, customers often have apps deployed to different regions using the same long naming path, varying only in later name components (e.g. region/environment), and can run into this.
More information can be found in Host ID Collisions.
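To make the truncation problem concrete, here is a minimal Python sketch that mimics the default behaviour described above and reports which site names would end up sharing a host ID. It is a simplification, not the host's actual code: `default_host_id` only truncates to 32 characters and lower-cases (the lower-casing is my assumption), and both function names are hypothetical.

```python
from collections import defaultdict

MAX_HOST_ID_LENGTH = 32


def default_host_id(site_name: str) -> str:
    """Simplified stand-in for the default ID: first 32 characters, lower-cased."""
    return site_name[:MAX_HOST_ID_LENGTH].lower()


def find_collisions(site_names: list[str]) -> dict[str, list[str]]:
    """Group site names by truncated ID and keep only IDs used more than once."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in site_names:
        groups[default_host_id(name)].append(name)
    return {host_id: names for host_id, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    sites = [
        "abc-defg-abcdefghijklmnopqrstuv-v1-eus",
        "abc-defg-abcdefghijklmnopqrstuv-v2-eus",
        "short-app",
    ]
    for host_id, names in find_collisions(sites).items():
        print(f"{host_id!r} is shared by {names}")
```

Running a check like this over the planned app and slot names before deployment shows which of them need a shorter name or an explicit host ID.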