Azure / azure-functions-host

The host/runtime that powers Azure Functions
https://functions.azure.com
MIT License

HostID Truncation can cause collisions #2015

Open mathewc opened 7 years ago

mathewc commented 7 years ago

Currently when generating a default host ID we use the host name (slot host name) and we truncate to 32 characters max (code here). This ensures that the generated ID conforms to the core SDK length restrictions (code here).

This truncation can of course open the possibility for naming collisions, particularly in the case of slots. In slot scenarios, if the site name is over 32 characters long and a slot is created that starts with the same 32 characters and is only disambiguated in later characters, both the production and the slot site will be using the same host ID. This can lead to issues. For example, TimerTrigger uses the host ID as a component of the blob lease path. In this case the timer function will only be able to run in one of the sites because they're competing for the same lock. Similarly, customers often have apps deployed to different regions using the same long naming path, varying only in later name components (e.g. region/environment), and can run into this.

More information can be found in Host ID Collisions.
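A minimal sketch of the collision, assuming the default ID is simply the lower-cased host name cut to 32 characters (the real host may normalize the name further). The function and both site names are hypothetical:

```python
MAX_HOST_ID_LENGTH = 32  # limit described above


def default_host_id(site_name):
    """Illustrative only: lower-case the name and keep the first 32 characters."""
    return site_name.lower()[:MAX_HOST_ID_LENGTH]


# Two hypothetical apps that differ only after the 32nd character.
east = "contoso-orders-processing-functions-eastus"
west = "contoso-orders-processing-functions-westus"

# Both names collapse to the same ID, so they would compete for the same lock.
collides = default_host_id(east) == default_host_id(west)
print(default_host_id(east), collides)
```

Names of 32 characters or fewer pass through this sketch unchanged, which is why short names avoid the problem.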

paulbatum commented 7 years ago

This is a dangerous change to make for v1 but we should fix it in v2. This would require some updates to the scale controller as I believe it has this logic duplicated.

paulbatum commented 6 years ago

I've confirmed the scale controller will need an update, but it should be straightforward once we know what the new logic for generating the host ID is.

paulbatum commented 6 years ago

We had to punt this due to higher priority issues and the need to coordinate the change across multiple components. Revisit again in V3.

mathewc commented 5 years ago

As a workaround for customers running into this issue, in Functions v2 you can set an explicit host ID in app settings, using a different ID for each environment. The app setting name to use is AzureFunctionsWebHost:hostId. For Functions v1 you can specify the ID in host.json via the hostId property. The host ID value should be unique across all apps/slots you're running. The important thing is that the IDs are no longer than 32 characters. The restrictions for HostIds that the value must satisfy are here. Another way to generate an ID would be to take a GUID, remove the dashes and make it lower case, e.g. 1835D7B5-5C98-4790-815D-072CC94C6F71 => 1835d7b55c984790815d072cc94c6f71
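The GUID conversion described above can be sketched with Python's standard uuid module. The helper names are hypothetical; the only behavior relied on is that `UUID.hex` returns the 32 hexadecimal digits in lower case without dashes:

```python
import uuid


def guid_to_host_id(guid_text):
    """Remove the dashes from a GUID and lower-case it (32 hex characters)."""
    return uuid.UUID(guid_text).hex


def new_host_id():
    """Generate a fresh random ID in the same 32-character format."""
    return uuid.uuid4().hex


print(guid_to_host_id("1835D7B5-5C98-4790-815D-072CC94C6F71"))
```

A value produced this way is exactly 32 characters long, so it sits at the length limit rather than under it.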

ankitkumarr commented 5 years ago

Had a customer issue last week, where this truncation caused collision between multiple function apps sharing the same storage account.

@fabiocav, is this already planned for V3?

fabiocav commented 4 years ago

Deferring this work as it would have scale controller dependencies and does not align with the timing.

mikeurnun commented 4 years ago

@paulbatum @fabiocav Per #1904, regarding the same detail in the architecture center doc: we currently state that the Function App name length can be 1-60. Could you advise what length we should update it to?

paulbatum commented 4 years ago

@mike-urnun-msft good catch. staying in the range of 1-32 will avoid this bug.

dgard1981 commented 4 years ago

Though there is no specific mention in this issue, I assume that the solution of explicitly setting the host ID via the 'AzureFunctionsWebHost:hostId' App Setting is still valid for v3 Function Apps?

If so, does the host ID have to remain static, or can it change after every deployment? I ask because our CD pipeline updates App Settings through an ARM template, so if the host ID has to remain static we'd first need to query the App Settings to get the current value of 'AzureFunctionsWebHost:hostId' for the slot we are deploying to so that it can be set back with the same value.

Also, does this only apply to a Function App with a slot, or does the value of 'AzureFunctionsWebHost:hostId' have to be unique across Function Apps? For example, if you have these two Function Apps with no slots...

...could both have the same truncated value of 'my-really-really-long-functionap'?
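Whether the ID has to stay the same across deployments is not answered at this point in the thread. If a pipeline wants to write the same value on every run without first reading the current setting, one option (an assumption for illustration, not something the host is known to require) is to derive the ID from the app and slot names with a hash. The helper and the sample names are hypothetical:

```python
import hashlib


def stable_host_id(app_name, slot_name="production"):
    """Derive a repeatable 32-character lower-case hex ID from app and slot names."""
    key = f"{app_name.lower()}/{slot_name.lower()}".encode("utf-8")
    # A SHA-256 hex digest has 64 characters; keep the first 32 to fit the limit.
    return hashlib.sha256(key).hexdigest()[:32]


# The same inputs always give the same ID; a different slot gives a different one.
print(stable_host_id("my-really-really-long-functionapp-one", "staging"))
```

Because the full name is hashed, two names that share their first 32 characters still map to different IDs.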

tyler555g commented 4 years ago

Why is this not mentioned in the docs or warned upon creation? This has been known since 2017, but no obvious mention of it anywhere that I can find. ( I very well could be blind, so if so please correct me. )

davidmrdavid commented 4 years ago

+1 I just recently got hit with a case that appears to be, at least partially, caused by this limitation.

mpaul31 commented 3 years ago

@mathewc Is the app setting still a viable approach for v3 function apps? I was also told, via an open MS ticket, that when running deployment and swap operations, Azure is unable to differentiate between your deployment slots.

We are very interested in this approach because we use a specific naming convention for our resources and this limit does not give a lot of wiggle room.

fabiocav commented 3 years ago

@mpaul31 yes, the app setting is still an option, so setting the host ID using AzureFunctionsWebHost__hostId (Windows and Linux) or AzureFunctionsWebHost:hostId (Windows only) is supported.

mpaul31 commented 3 years ago

> @mpaul31 yes, the app setting is still an option, so setting the host ID using AzureFunctionsWebHost__hostId (Windows and Linux) or AzureFunctionsWebHost:hostId (Windows only) is supported.

@fabiocav Just to confirm, does the casing of the app setting matter? When looking at the Diagnose and Solve blade for my function app, it mentions this app setting (note it's all lowercase): AzureFunctionsWebHost__hostid

Also, after I added the setting AzureFunctionsWebHost__hostId Diagnose and Solve still shows a warning about the function name collision. Should I just ignore this?

brettsam commented 3 years ago

Assigning to @fabiocav for a proposal for v4

cmenzi commented 3 years ago

@fabiocav We added the AzureFunctionsWebHost__hostid (lowercase) and we still get a critical error in the portal:

image

Also the timer trigger stops working randomly from time to time.

Could you please provide correct, unambiguous documentation on how to solve this issue?

Is there also a procedure for detecting whether our timer trigger issue comes from the hostId issue? Where do we need to check the logs? Because everything is reported as successful.

It just stops firing.

Many thanks.

sidkri commented 3 years ago

@cmenzi thanks for highlighting this behavior, we are modifying the logic to check if the HostId is set.

nilshjalmarson commented 2 years ago

Any updates on this one?

mathewc commented 2 years ago

We added code to the host to check for this issue and log a warning in Functions v3. In Functions v4 when detected it's an error and we prevent the host from starting. See https://github.com/Azure/Azure-Functions/issues/2049 for details.

ghost commented 2 years ago

@mathewc: Should the AzureFunctionsWebHost:hostId setting be slot-sticky ("deployment slot setting")?

bhugot commented 2 years ago

Important information: AzureFunctionsWebHost:hostId should be lower case. Otherwise, in some cases, it can crash your function host.

tyler555g commented 2 years ago

> Important information: AzureFunctionsWebHost:hostId should be lower case. Otherwise, in some cases, it can crash your function host.

Can we confirm this, please? This is very important.

bhugot commented 2 years ago

It's only the case when you override the host ID, and I think it's related to a specific trigger. I got the error below when I used an uppercase hostid. It does not happen when the hostid is not overridden or when it's lowercase.

The specifed resource name contains invalid characters.
RequestId:d154be16-6003-006a-7eff-0d7701000000
Time:2022-01-20T13:14:27.4462948Z
Status: 400 (The specifed resource name contains invalid characters.)
ErrorCode: InvalidResourceName

Content:
<?xml version="1.0" encoding="utf-8"?><Error><Code>InvalidResourceName</Code><Message>The specifed resource name contains invalid characters.
RequestId:d154be16-6003-006a-7eff-0d7701000000
Time:2022-01-20T13:14:27.4462948Z</Message></Error>

Headers:
Server: Windows-Azure-Queue/1.0,Microsoft-HTTPAPI/2.0
x-ms-request-id: d154be16-6003-006a-7eff-0d7701000000
x-ms-version: 2018-11-09
x-ms-error-code: InvalidResourceName
Date: Thu, 20 Jan 2022 13:14:27 GMT
Content-Length: 243
Content-Type: application/xml
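The 400 above comes from the queue service, whose resource names must be lower case. A small pre-deployment check can catch an upper-case ID before it reaches storage. The rules encoded below (1 to 32 characters; lower-case letters, digits and dashes; no leading, trailing or consecutive dashes) are assumed from the host ID restrictions mentioned earlier in the thread and should be verified against them. The function name is hypothetical:

```python
import re

# Assumed rule set: runs of lower-case letters/digits separated by single dashes.
_HOST_ID_PATTERN = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def is_valid_host_id(host_id):
    """Return True if host_id satisfies the assumed length and character rules."""
    if not 1 <= len(host_id) <= 32:
        return False
    return _HOST_ID_PATTERN.fullmatch(host_id) is not None


for candidate in ("my-function-slotname", "My-Function-SlotName"):
    print(candidate, is_valid_host_id(candidate))
```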

TraderMoe commented 2 years ago

> @mathewc: Should the AzureFunctionsWebHost:hostId setting be slot-sticky ("deployment slot setting")?

We added this property to each slot (as a deployment slot setting) and the issues disappeared (Windows function).

{
    "name": "AzureFunctionsWebHost:hostId",
    "value": "my-function-slotname",
    "slotSetting": true
}
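A sketch of how such entries could be generated for several slots, failing early instead of silently truncating. It reuses the setting name and shape from the snippet above; `slot_host_id_settings`, `short_name` and the slot names are hypothetical, and the colon form of the setting name is the one reported earlier as Windows only:

```python
import json

SETTING_NAME = "AzureFunctionsWebHost:hostId"  # name used in the snippet above


def slot_host_id_settings(short_name, slots):
    """Build one slot-sticky host ID entry per slot, rejecting over-long values."""
    entries = {}
    for slot in slots:
        value = f"{short_name}-{slot}".lower()
        if len(value) > 32:
            raise ValueError(f"host ID {value!r} exceeds 32 characters")
        entries[slot] = {"name": SETTING_NAME, "value": value, "slotSetting": True}
    return entries


print(json.dumps(slot_host_id_settings("my-function", ["production", "staging"]), indent=4))
```

Raising on an over-long value is a deliberate choice here: truncating it would reintroduce the collision this setting is meant to avoid.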

AlphaWong commented 2 years ago

I think the issue still exists in the v4 SDK: https://github.com/Azure/azure-functions-host/wiki/Host-IDs#host-id-collisions

davidmrdavid commented 2 years ago

I was under the impression this was actually resolved in V4. Can @fabiocav or @mathewc confirm? It may help to close this thread, to communicate this behavior has been established, if that's the case.

mathewc commented 2 years ago

@AlphaWong is correct - the restriction still exists, however in v4 we added detection and prevent the host from starting up in this state as described here.

walterstypula commented 2 years ago

@mathewc we just ran into this host ID truncation issue out of nowhere; it was working up until last Friday, July 1, 2022. Our v4 application did start. It also read and processed messages from an incoming queue. However, it would fail when writing to an output queue. We have resolved it by changing our site names.

thibautbrard commented 2 years ago

Running a v4 function, we're still facing the Function App Name Collision Found error when running the function configuration diagnostic, even though hostid is set at slot level (Linux consumption plan) using a random lowercase GUID without dashes. Should we just ignore this error message? Should I set FUNCTIONS_HOSTID_CHECK_LEVEL to Warning? It was also asked earlier whether we must keep a static hostId across deployments or can generate a new one each time, but I didn't see an answer to that question. Thanks in advance for the clarifications.

evandcombs commented 2 years ago

> @AlphaWong is correct - the restriction still exists, however in v4 we added detection and prevent the host from starting up in this state as described here.

This error message appeared for me, even though there was no conflict. It seems this gets triggered if there is any truncation. It does not seem to be preventing my apps from running, though. I think it may be more appropriate for this to appear as a warning when there is truncation, but no conflict detected.

On a related note, why is the limit only 32 characters? That seems rather short when basing things on names assigned by humans. I guess this is really all a result of the fatal flaw in Azure where the name of a resource doubles as the ID of the resource. This decision has created a lot of inconveniences within Azure.

SerlokPK commented 2 years ago

> Running a v4 function, we're still facing the Function App Name Collision Found error when running function configuration diagnostic even though hostid is setup

Same problem here. Is that expected behavior? Should we ignore the message, or is some other action required?

progmars commented 2 years ago

Should it be hostId, or is hostid also accepted?

thibautbrard commented 2 years ago

> Should it be hostId, or is hostid also accepted?

In the App settings reference for Azure Functions documentation, hostid is written in lower case. I highly recommend following that casing to prevent any ambiguity.

I still face the Function App Name Collision Found error message, but my functions are running well with only the following simple warning at startup: Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior. (Host.Startup category)

progmars commented 2 years ago

@thibautbrard

This is so confusing. The behavior suggests it should be hostId to make the warning go away.

Here's what I did:

I repeated it a few times, and it always works this way: AzureFunctionsWebHost__hostId is recognized by Diagnostics, but AzureFunctionsWebHost__hostid is not.

I'm not sure whether it's only Diagnostics that uses hostId with a capital I, or Azure itself as well. It would be even more confusing if the Azure infrastructure uses hostid (as written in the article) while Diagnostics checks for hostId.

Who could tell what's actually going on and what should be the name of the setting to make it both work properly and also be recognized by Diagnostics?

thibautbrard commented 2 years ago

@progmars that's really interesting information! From my observation, hostid (lower case) is interpreted correctly, as it triggers the warning message at startup, but we still get the error in the Diagnose and solve problems component. We assumed it was only doing a static check of the function name length and nothing else. I wish we could get rid of this message...

I've just done a test with hostId and it actually removes the error message in diagnostics, as you said. I still get the warning Host id explicitly set in configuration at function startup, but that does not prove that it works (and unfortunately I can't test it right now). All of this is really confusing...

@mathewc can you confirm that both hostid and hostId work? Would it be possible to clarify the casing in the documentation here and here, as only hostId prevents the error message from being displayed in diagnostics?

Many thanks

jassent commented 2 years ago

Having a conflicting hostId due to the name being truncated also caused me this error: https://github.com/Azure/azure-functions-dotnet-worker/issues/747

2022-09-18T10:24:19.474 [Information] Host initialized (208ms)
2022-09-18T10:24:19.481 [Information] Host started (219ms)
2022-09-18T10:24:19.481 [Information] Job host started
2022-09-18T10:24:19.594 [Information] HttpOptions{"DynamicThrottlesEnabled": false,"EnableChunkedRequestBinding": false,"MaxConcurrentRequests": -1,"MaxOutstandingRequests": -1,"RoutePrefix": "api"}
2022-09-18T10:24:19.810 [Information] Stopping JobHost
2022-09-18T10:24:19.812 [Information] Job host stopped
2022-09-18T10:24:19.844 [Error] Failed to start a new language worker for runtime: dotnet-isolated.System.Threading.Tasks.TaskCanceledException : A task was canceled.
at async Microsoft.Azure.WebJobs.Script.Grpc.GrpcWorkerChannel.StartWorkerProcessAsync(CancellationToken cancellationToken)
at /_/src/WebJobs.Script.Grpc/Channel/GrpcWorkerChannel.cs : 159
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 154
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 146
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.InitializeJobhostLanguageWorkerChannelAsync(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 137
at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
at async Microsoft.Azure.WebJobs.Script.Workers.Rpc.RpcFunctionInvocationDispatcher.<>c__DisplayClass56_0.<StartWorkerProcesses>b__0(??) 
at /_/src/WebJobs.Script/Workers/Rpc/FunctionRegistration/RpcFunctionInvocationDispatcher.cs : 229

Setting a unique (and shorter than 32 characters) hostid allowed the application to start.

This is frustrating because, as noted above, the default app names generated by Visual Studio mean that apps can't be successfully deployed to any slot without making this change.

scp-mb commented 1 year ago

Pretty much the same as above. Seeing the job host started then immediately stopped again, leading to TaskCanceledException in our functions.

Finding the root cause was an absolute pain, because there is supposed to be an error logged about host id collisions, but there never was anything logged about it. It just shuts down and starts up again immediately in a loop, with short lived functions managing to complete a small amount of work before the host is killed again.

scp-mb commented 1 year ago

I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run

moneygit commented 1 year ago

I also need clarification on the case for host ID. AzureFunctionsWebHost__hostid or AzureFunctionsWebHost__hostId

The documentation says all lower case, but then the App Service diagnostics flag it as a risk; after changing it to hostId, the alert clears.

asalvo commented 1 year ago

We are deploying a Python Application using a custom Docker Container to Azure Functions. I can confirm with absolute certainty, that AzureFunctionsWebHost__hostId (uppercase I) caused our function to fail with logs similar to this comment (except mentioning python).

As soon as we changed to AzureFunctionsWebHost__hostid (lower case i) our function app started working.

With either hostId or hostid, we get the log message "Host id explicitly set in configuration. This is not a recommended configuration and may lead to unexpected behavior." as expected. With hostId, however, we never got an error logged about a hostId collision, yet a few log lines later it says Starting Host (HostId=company1234-trm-prod01-un1-fa-ft (rest of log line omitted), which is a truncated name.

With hostId we do NOT get the diagnostics error. With hostid we do get the diagnostics error. This leads me to believe that the logic for the diagnostics error is incorrect (case insensitive when it needs to be case sensitive).
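Since reports in this thread differ by casing, it can help to first establish which exact spelling an app actually has configured. A minimal sketch, assuming the app settings are available as a plain name-to-value dictionary; the function name and sample values are hypothetical, and nothing here asserts how the host or the diagnostics compare names:

```python
def find_host_id_settings(app_settings):
    """Return every setting name that spells the host ID key, ignoring case."""
    candidates = {"azurefunctionswebhost__hostid", "azurefunctionswebhost:hostid"}
    return [name for name in app_settings if name.lower() in candidates]


settings = {
    "AzureFunctionsWebHost__hostId": "1835d7b55c984790815d072cc94c6f71",
    "FUNCTIONS_WORKER_RUNTIME": "python",
}
# Reports the exact casing present, so it can be compared with the documentation.
print(find_host_id_settings(settings))
```

More than one match would mean the app carries two spellings of the same setting, which is worth cleaning up before investigating further.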

graemefoster commented 1 year ago

Same issue with not being able to find the error. As a customer, all I see in my logs is the function host restarting. Can we surface this error in the standard host trace output?

miqm commented 1 year ago

Same here, no error in logs, our readiness probe pinging admin/host/ping just stopped responding after ~10 minutes and that caused AKS to restart the pod. An error, on a critical level, would be nice.

graemefoster commented 1 year ago

> I'll also point out that if the function host is going to be shut down it should be done before functions start to run, instead of cancelling them mid-run

100% agree - I seem to be able to get the function runtime in a real panic because of this coupled with a timer trigger. I understand that fixing the host Id is the right thing to do, but can we make this check happen before functions kick in.

paulyuk commented 1 year ago

@fabiocav could we please bump this one up the list? @eamonoreilly fyi

slampunk commented 1 year ago

spent an entire week trying to track down the root cause for this problem. Very disappointed that nothing pops up in the logs about collisions. This is a known 6 year old issue, I expect more even if it's simply a log of potential reasons for the cause.

ddaniels-andmore commented 1 year ago

This is a PITA and it's been 6 years now since the issue was identified. Any update on the ETA / Roadmap?

akirayamamoto commented 10 months ago

We just got hit by this issue. Do you have any updates?

madmahii24 commented 9 months ago

I suddenly started facing this issue yesterday. We have v4 function apps deployed on a Windows Elastic plan.

I have function apps whose names are 43 characters long and differ only in the trailing part of the name

like abc-defg-abcdefghijklmnopqrstuv-v1-eus & abc-defg-abcdefghijklmnopqrstuv-v2-eus

I already had the app setting "AzureFunctionsWebHost__hostId" in the Function App configuration, with a GUID-generated host ID. The same configuration had been working for the last 6 months, and since yesterday it has started failing (function apps stop and start intermittently).

I have tried the following changes:

  1. AzureFunctionsWebHost__hostId -> AzureFunctionsWebHost__hostid
  2. Removed hyphens (-) from GUID
  3. Added AzureFunctionsWebHost:hostId (upper I)
  4. Added AzureFunctionsWebHost:hostid (lower i)
  5. Added FUNCTIONS_HOSTID_CHECK_LEVEL with value 'Warning'

The host ID collision error is gone from the Diagnose and solve problems screen, but the issue isn't resolved. Function apps still stop and start intermittently.

The only solution that worked for me was to deploy a different function app with a shortened name (as we cannot rename a function app). However, this is not the solution I was expecting, because I have 26 function apps and now I need to remove, rename and redeploy them all. I raised a ticket with Microsoft support but have had no response since yesterday.

jassent commented 9 months ago

Any chance it is a scaling/instance issue? I.e. you actually have multiple of the same hostid in use?

marcelaction commented 7 months ago

@madmahii24 any response from Microsoft support? Running into the same issue.