Closed pmatthews05 closed 3 years ago
Hi @pmatthews05,
Thanks for reaching out. Just wanted to let you know what I'll be adding this as an investigation item to our next release sprint, which should be starting in a couple of days. I'll keep this thread updated with our progress.
Hi @pmatthews05! Just attempted to reproduce this with the following code:
const df = require("durable-functions");
module.exports = async function(context, req) {
const client = df.getClient(context);
const runtimeStatus = [
df.OrchestrationRuntimeStatus.Completed,
df.OrchestrationRuntimeStatus.Running,
];
const instances = await client.getStatusBy(
new Date(2018, 3, 10, 10, 1, 0),
new Date(2021, 3, 10, 10, 23, 59),
runtimeStatus
);
instances.forEach((instance) => {
context.log(JSON.stringify(instance));
});
};
and got back and instances
array of 130 orchestrators when running on my local storage emulator. So, at least locally, I can't seem to reproduce this yet.
Just to make sure I understand, is this happening locally, on an azure storage account, or both? And would you mind trying this out locally (on an azure storage emulator), and assuming you believe you might have over 100 local orchestration histories? It would be good to confirm if this is an issue that only occurs while deployed or if it's an issue particular to your setup.
Thanks! ⚡ ⚡
Hi @davidmrdavid Thank you for looking into this.
Originally when running I was doing it all in Azure. I also was debugging on my local machine still connecting to Azure Storage.
On your request, I have now gone full local, with Azure Storage Emulator version 5.10.0.0.
I ran my code, and from looking in the Azure Table "[hostname]Instances" there were 4100 orchestration. I have exported the results. TableInstances.xlsx
I then ran code similar to what you put:
const timerTrigger: AzureFunction = async function (
context: Context,
myTimer: any
): Promise<void> {
const client = df.getClient(context);
const runtimeStatus = [
df.OrchestrationRuntimeStatus.Canceled,
df.OrchestrationRuntimeStatus.Completed,
df.OrchestrationRuntimeStatus.ContinuedAsNew,
df.OrchestrationRuntimeStatus.Failed,
df.OrchestrationRuntimeStatus.Pending,
df.OrchestrationRuntimeStatus.Running,
df.OrchestrationRuntimeStatus.Terminated,
];
const instances = await client.getStatusBy(
new Date(2018, 3, 10, 10, 1, 0),
new Date(2021, 3, 10, 10, 23, 59),
runtimeStatus
);
context.log(instances.length) // Always equals 100
lastRunTime = new Date(
Math.max(
...instances.map((instance) =>
new Date(instance.lastUpdatedTime).getTime()
)
)
);
//LastUpdatedTime: Thu Jan 28 2021 17:19:10 GMT+0000 (Greenwich Mean Time)
context.log(`Max LastUpdatedTime: ${lastRunTime}`)
context.log("InstanceId,Name,LastUpdatedTime,CreatedTime,RuntimeStatus")
instances.forEach((instance) => {
context.log(`${instance.instanceId},${instance.name},${instance.lastUpdatedTime},${instance.createdTime},${instance.runtimeStatus}`)
})
context.log('Complete....')
};
export default timerTrigger;
This only returns 100 items. If I run it again from scratch, it always returns the same 100 items. AzureFunctionRunOutput.xlsx There appears to be no order of what is returned, the CreatedTime or the LastUpdatedTime is not in an order.
After a little more investigation, it is returning the first 100 from the Azure Table, you can see this if you open both attached spreadsheets, and compare the InstanceId from AzureFunctionRunOutput.xlsx and partitionKey from TableInstances.xlsx
I'm using Windows Azure Storage Emulator 5.10.0.0.
Paul.
Thanks @pmatthews05,
Just to make sure I have the best chance at replicating this, a question and a request.
question: when you say "I ran my code, and from looking in the Azure Table "[hostname]Instances" there were 4100 orchestration. I have exported the results.". What is "my code" in this case? It's not code calling getStatusBy
, right? I suspect it's code that generates orchestrations, but I'm unsure.
request. I realize you've provided a rather clean snippet of code for a reproducer, but I did run something similar and did not get the same behavior. Just to make sure we are getting as similar of a reproducer as possible, do you mind attaching a runnable zip
link or linking me to a repo I can clone and run with a func host start
? I just want to make sure there's no hidden configuration that might be causing this difference in behavior.
Thank you!
Hi @davidmrdavid, When I said "I ran my code" - I meant I ran code that kicked off the orchestrator multiples times. It is not the code calling getStatusBy.
The code I provided in the last snippet, was the actual code I ran that only returned the first 100 statuses, although it was stripped down, it was what I used. It was a typescript Azure Node function, and I had it as a timer function which in my dev environment fired every few minutes for me to test. The actual code that creates and runs the orchestrator multiple times I'm afraid I couldn't share the code with you, as it belongs to the client, and also you would need to configure app only registrations to SharePoint and a CosmosDb.
I could probably create an dummy version of the code, that doesn't connect to anything, and runs with fake data instead of making calls and share that repo with you.
I'm really baffled how you are getting more than 100 orchestration back in one call to getStatusBy. Looking at the call that getStatusBy https://github.com/Azure/azure-functions-durable-js/blob/c6e1844a3959fc37447c2a364ea0f8833a070852/src/durableorchestrationclient.ts#L873 I am right in thinking, that the GetStatusBy call is doing a REST query to http://
On my localhost instance, if I call the REST API below it too only returns 100 results and view the network trace in the browser, it gives me a continuation token.
Hi @pmatthews05,
Thank you for clarifying and for going as far as to replicating the request on your Chrome DevTools. Now I'm starting to second-guess the length of results I saw when reproducing your issue. I'll continue investigating and provide in-progress updates as they come. Thanks!
Just a partial update, this is what I'm seeing locally: 130 results
Hi @pmatthews05, are you using extension bundles V2?
I also just hit the endpoint directly via PostMan
and got back the same 130 results 🤔
Well this is very frustrating...
Extension bundle is V2. Although I've just noticed the Extension bundle was written with a ) bracket instead of a ] at the end. I will change that and test that tomorrow. I am unable to test today.
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0]"
}
I could kick myself if that is the reason.
@pmatthews05, I don't think that's the reason, the )
is expected as it denotes "up-until-but-not-including" in your version range. In other words it reads: "use extension bundles in the range V2" whereas if you add the ]
it would mean "include V3.0.0 as well"
Now, taking a step back...
I'm looking at the URL endpoint you shared:
http://localhost:7071/runtime/webhooks/durabletask/instances?taskHub=...
whereas mine looks like:
so I think you and I are taking different paths in the if-else-statement where I'm taking the if-
and you're taking the else-
branch.
I know one of the two paths is an older implementation that exists only for backwards compatibility, but I'll need to ask around to figure out which that is. I'll get back to you on that ⚡ ⚡
I will check using /durabletasks/instances. Is this the same URL that is being used inside the GetByStatus call?
Would you be willing to have a MS Teams call with me at some point if we are still unable to solve it? It's the only way i can show you the code, without giving you the code.
@pmatthews05, not sure I understand your question but it is the same URL being built in the if-branch of that method. If your code is not taking that path automatically, I can't guarantee it will work if you try to hit that endpoint directly, but it can't hurt to try.
I also don't mind jumping on a call with you to help out, but let's hold on to that just a bit while I gather context on this difference in execution.
@ConnorMcMahon, do you remember when rpcBaseUrl
in this if-else statement is not undefined
? I remember something about supporting an old-path but I can't remember the details.
I thought it was the difference between extension bundles V1 versus V2 but I always got an rpcBaseUrl
value when switching bundles locally.
Hi @davidmrdavid
Firstly changing my host from "version": "[2.*, 3.0.0)" to "version": "[2.*, 3.0.0]"
didn't fix the issue.
I attempted to use the URL http://localhost:7071/durabletask/instances instead of http://localhost:7071/runtime/webhooks/durabletasks/instances and it just gave me a page can't be found message.
Hi @pmatthews05,
So I got confirmation about why some execution would construct the URL you're seeing. Apparently, you'd take it if you set localRpcEndpointEnabled
to False
in your host.json
. Is that the case? Can you see if you set localRpcEndpointEnabled
anywhere in your app? Thanks!
@davidmrdavid this is my entire host.json file.
{
"version": "2.0",
"extensions": {
"durableTask":{
"hubName": "%TaskHubName%"
},
"http": {
"maxConcurrentRequests": 50
}
},
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"excludedTypes": "Request"
}
}
},
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0]"
}
}
@pmatthews05
I see, I definitely don't see anything suspicious in there. Could you help me with three follow-up requests?:
(1) If you create a brand new Durable JS app (only copy-paste the JS files, let the config be automatically generated) and try to run this, do you still see that URL-format being constructed or do you see mine?
(2) Back in your original app, can you modify your host.json
to be as follows:
{
"version": "2.0",
"logging": {
"applicationInsights": {
"samplingSettings": {
"isEnabled": true,
"excludedTypes": "Request"
}
}
},
"extensions": {
"durableTask": {
"localRpcEndpointEnabled": true
}
},
"extensionBundle": {
"id": "Microsoft.Azure.Functions.ExtensionBundle",
"version": "[2.*, 3.0.0)"
}
}
And let me know if you still see this error. This should definitely set the localRpcEndpointEnabled
property to True
. Please also let me know what URL format you see constructed!
(3) If you run this on Azure, I should be able to look at our internal logs and determine if your "localRpcEnabled" value is being set or not. If you opt for this, I would need the following information:
Timeframe issue observed:
Function App name:
Function name(s):
Azure region:
Orchestration instance ID(s):
Azure storage account name:
Hi @davidmrdavid
For point 1. I have created this repo https://github.com/pmatthews05/DurableFunctionExample in the readme file is the exacts steps I went through to create the project. I did this on a clean laptop. Again when running it only brings back 100 maximum for the GetStatusBy. Hopefully from reading through my steps you can see if there is something I'm missing.
The URL is still the same, with the localRpcEndpointEnabled
{
id: "822922067ea44e1ba30661db62af3c03",
statusQueryGetUri: "http://localhost:7071/runtime/webhooks/durabletask/instances/822922067ea44e1ba30661db62af3c03?taskHub=TestHubName&connection=Storage&code=/NuM5eQn6t8kE4T7KeeXBzUTSMpChouN4BHqcGYgoAe7LVc8D39gHg==",
sendEventPostUri: "http://localhost:7071/runtime/webhooks/durabletask/instances/822922067ea44e1ba30661db62af3c03/raiseEvent/{eventName}?taskHub=TestHubName&connection=Storage&code=/NuM5eQn6t8kE4T7KeeXBzUTSMpChouN4BHqcGYgoAe7LVc8D39gHg==",
terminatePostUri: "http://localhost:7071/runtime/webhooks/durabletask/instances/822922067ea44e1ba30661db62af3c03/terminate?reason={text}&taskHub=TestHubName&connection=Storage&code=/NuM5eQn6t8kE4T7KeeXBzUTSMpChouN4BHqcGYgoAe7LVc8D39gHg==",
rewindPostUri: "http://localhost:7071/runtime/webhooks/durabletask/instances/822922067ea44e1ba30661db62af3c03/rewind?reason={text}&taskHub=TestHubName&connection=Storage&code=/NuM5eQn6t8kE4T7KeeXBzUTSMpChouN4BHqcGYgoAe7LVc8D39gHg==",
purgeHistoryDeleteUri: "http://localhost:7071/runtime/webhooks/durabletask/instances/822922067ea44e1ba30661db62af3c03?taskHub=TestHubName&connection=Storage&code=/NuM5eQn6t8kE4T7KeeXBzUTSMpChouN4BHqcGYgoAe7LVc8D39gHg=="
}
For point 2. I set the localRpcEndpointEnabled to true, I also did this in the project above, and it made no different to the URL. Also the word localRpcEndpointEnabled is highlighed in yellow in my environment saying it's not allowed.
{
"id": "8dc47c3340d9492c93e0af3e1dafb6c6",
"statusQueryGetUri": "https://func-beisdevpme5-azureadmin-00.azurewebsites.net/runtime/webhooks/durabletask/instances/8dc47c3340d9492c93e0af3e1dafb6c6?taskHub=azureadmin&connection=Storage&code=xxx",
"sendEventPostUri": "https://func-beisdevpme5-azureadmin-00.azurewebsites.net/runtime/webhooks/durabletask/instances/8dc47c3340d9492c93e0af3e1dafb6c6/raiseEvent/{eventName}?taskHub=azureadmin&connection=Storage&code=xxx",
"terminatePostUri": "https://func-beisdevpme5-azureadmin-00.azurewebsites.net/runtime/webhooks/durabletask/instances/8dc47c3340d9492c93e0af3e1dafb6c6/terminate?reason={text}&taskHub=azureadmin&connection=Storage&code=xxx",
"rewindPostUri": "https://func-beisdevpme5-azureadmin-00.azurewebsites.net/runtime/webhooks/durabletask/instances/8dc47c3340d9492c93e0af3e1dafb6c6/rewind?reason={text}&taskHub=azureadmin&connection=Storage&code=xxx",
"purgeHistoryDeleteUri": "https://func-beisdevpme5-azureadmin-00.azurewebsites.net/runtime/webhooks/durabletask/instances/8dc47c3340d9492c93e0af3e1dafb6c6?taskHub=azureadmin&connection=Storage&code=xxx"
}
For Point 3 This is running the original code, not the code I've provided in the repo above.
Hi @pmatthews05 ,
I'll be taking a look at your logs to find your durableTask-options object (which you configured in your host.json
). In the meantime, could you please confirm that when you say "the URL is the same", we're talking about the URL that is used in the axios request and not the URLs returned by starting the orchestration? I worried there might a misunderstanding since you posted the latter. Thanks!
Also, is the timeframe you provided around the time the app was created? We emit your options object close to App-initialization time and so far I haven't been able to find it. If not, could you please give me an estimate of when your app was created?
Nevermind, I just found it.
I'm reading "localRpcEndpointEnabled: null
" on your Azure deployment, which according to this is the default value which is what we would want.
I'll check with the rest of the durable team to see if any of them is able to reproducer your observations. I'll get back to you.
Hi @pmatthews05 , I'll be taking a look at your logs to find your durableTask-options object (which you configured in your
host.json
). In the meantime, could you please confirm that when you say "the URL is the same", we're talking about the URL that is used in the axios request and not the URLs returned by starting the orchestration? I worried there might a misunderstanding since you posted the latter. Thanks!
When I say the URL is the same, I mean same for me, that it begins with "http://localhost:7071/runtime/webhooks/durabletask/instances/". Further research shows that the URL matches Microsoft Documentation for version 2 of the Function runtime. https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-http-api#request
Hi @pmatthews05,
Just to clarify some confusion: The URL in the docs you linked is for starting a new orchestration via an HTTP client, which is different from getting the status of all orchestrations during some time-range. So unfortunately, that URL prefix is not expected for getStatusBy
when localRpcEnabled
is true
, which should be the new default and appears to be on my local machine.
I'm setting up a few reproducers over different environments just as a sanity check, will get back to you as soon as possible.
@davidmrdavid Sorry the bookmark link was slightly wrong. https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-http-api#request-2
Get All Instance status.
GET /runtime/webhooks/durableTask/instances?
taskHub={taskHub}
&connection={connectionName}
&code={systemKey}
&createdTimeFrom={timestamp}
&createdTimeTo={timestamp}
&runtimeStatus={runtimeStatus1,runtimeStatus2,...}
&showInput=[true|false]
&top={integer}
Did you try the repo I provided? https://github.com/pmatthews05/DurableFunctionExample
Yeah I did try it without success, at first, but somehow my system is capping the return value at 100 results now, even with an old reproducer. It's strange, to say the least, as you've seen my screenshots of return values that returned ~130 results. Maybe this is all due to a platform update that we received at different times? Just a theory...
Regardless, with a reproducer at hand, I can now work to unblock this for you. Working on a patch as we speak! ⚡ ⚡
Hi @pmatthews05,
I just released durable-functions 1.4.6 which should include a patch for this problem. Could you please give it a try? Thanks! ⚡ ⚡
Hi @davidmrdavid, Thank you for this. I have tested it this morning with my demo project locally and can confirm I'm bringing back more than 100 items. I will test this further today with a deployment and test within Azure. Once I can confirm this is working too, I will then close this ticket.
Regards
Paul
@davidmrdavid Have been able to test this in Azure instead of using local emulators. It was very quick and returned way more than 100. Thank you for your work on this.
Describe the bug When calling client.getStatusBy the maximum number of results it calls back is 100 items. There doesn't seem to be a way to page in the node code.
Investigative information
To Reproduce Steps to reproduce the behavior: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-instance-management?tabs=javascript#query-instances-with-filters
Following Microsoft example above, whenthe code is run against a large environment that has many orchestrations, only 100 items come back. Looking at the node code /src/durableorchestrationclient.ts it seems that you are using the Rest API call to get the statuses within the function getStatusInternal.
According to the documentation of the REST API,
https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-http-api#response-2
Cannot see any headers captured or used in the code.
Expected behavior Either the x-ms-continuation-token is returned when you make a call, and the getStatusBy can have the x-ms-continuation-token passed in so that the developer can perform the paging themselves.
Or internally in the getStatusInternal you handle the paging and keep calling until you have all the results.
Actual behavior Only a maximum of 100 items are being returned.
Known workarounds (unsuccessful)
We tried the below workaround, and although it does bring back lots more results, it doesn't seem to be capturing all orchestration statuses. We are not convinced that the results are being returned in the lastUpdatedTime order of earliest first.
Additional context Using Visual Studio Code.