apache / openwhisk-runtime-dotnet

Apache OpenWhisk Runtime .Net supports Apache OpenWhisk functions written in .Net languages
https://openwhisk.apache.org/
Apache License 2.0

improve cold start times #34

Closed kamyker closed 2 months ago

kamyker commented 4 years ago

> You’d need to ask ibm :)
>
> As an Apache project we’d love the contribution, which you could then build and push to dockerhub and use independent of the ibm lifecycle.

Originally posted by @rabbah in https://github.com/apache/openwhisk-runtime-dotnet/issues/22#issuecomment-556197318

I didn't know it was possible, but the 3.1 image from Docker Hub works fine with IBM Cloud Functions.

Instead of:

ibmcloud fn action update {methodName} out.zip --kind dotnet:2.2 --main {namespaceName}::{namespaceName}.{className}::{methodName} --web true

Simply run:

ibmcloud fn action update {methodName} out.zip --docker openwhisk/action-dotnet-v3.1 --main {namespaceName}::{namespaceName}.{className}::{methodName} --web true

Just letting you know :)

rabbah commented 4 years ago

That's right. You can even push your own images with your own dependencies and use them as "docker actions". The main difference is performance. The platform built-in images will load faster whereas the docker actions will have occasional slow starts because the image is pulled.

kamyker commented 4 years ago

Interesting!

I finally managed to get OpenWhisk working on my Windows machine with the Kubernetes guide. I also made a Docker image of 3.1 (my first :) ).

I created a simple .bat file to make iteration a little bit faster:

docker build -t kamyker/openwhisk-action-dotnet-v3.1:latest .
docker push kamyker/openwhisk-action-dotnet-v3.1:latest
pause

It takes only 20-30 sec but I guess there's a better way.

Is there any way to use a local image? Something like this https://github.com/apache/openwhisk-deploy-kube#deploying-a-locally-built-docker-image but for the runtime --kind 3.1

kamyker commented 4 years ago

I'm changing the issue name and will write up what I've found at some point.

This should speed up cold start times: https://github.com/kamyker/openwhisk-runtime-dotnet/commit/f9a739779887b4262c987c0be8bf2b77071c651b#diff-6bc49d26f451cad6d8312deca5a0eadeL89-L96

Instead of writing the zip to disk and deleting it, we simply extract it from memory. There's probably a way to do it directly from the stream.
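To illustrate the idea outside the runtime's C# code, here is a minimal Python sketch of extracting an archive straight from an in-memory buffer, with no intermediate .zip file. The function name `extract_zip_from_memory` is hypothetical, and the base64 input is an assumption about how the action code arrives; this is not the code from the commit linked above.

```python
import base64
import io
import zipfile
from typing import List


def extract_zip_from_memory(encoded_zip: str, target_dir: str) -> List[str]:
    """Decode a base64 zip payload and extract it without a temporary .zip file.

    Assumption: the action code arrives as a base64 string, as it does for
    binary actions in an OpenWhisk init request.
    """
    raw = base64.b64decode(encoded_zip)
    # BytesIO gives zipfile a seekable, file-like object backed by memory,
    # so nothing is written to disk except the extracted entries.
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        archive.extractall(target_dir)
        return archive.namelist()
```

The saving comes from skipping one write and one delete of the whole archive on every cold start; the extracted files still land on disk.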

kamyker commented 4 years ago

After some changes, my cold start time almost halved compared to openwhisk/action-dotnet-v3.1 and --kind 2.2 on IBM: from ~1.45 sec to ~750ms!

> That's right. You can even push your own images with your own dependencies and use them as "docker actions". The main difference is performance. The platform built-in images will load faster whereas the docker actions will have occasional slow starts because the image is pulled.

Opposite :)

What's also interesting is that the real response time dropped by more than 2 sec (two screenshots attached).

kamyker commented 4 years ago

I removed Newtonsoft.Json and used System.Text.Json, which has fewer dependencies overall. I also used a few more features to strip down the C# build.

Cold start of empty function ~350ms

The one in the post above (it's 6mb zip) ~570ms

That's almost 1 sec lower than --kind dotnet:3.1 and 2.2 :)

Great results, much better than Azure Functions or even AWS Lambda.

rabbah commented 3 years ago

@kamyker are you interested in opening a PR for your improvements to the .NET 3.1 runtime? Sorry we missed this, I think the speedups are great.

kamyker commented 3 years ago

I've switched to AWS Lambda due to unexpected crashes mentioned in https://github.com/apache/openwhisk-runtime-dotnet/issues/35

I'm not using OpenWhisk anymore, so I can't provide more info, sorry :/

mattwelke commented 3 years ago

> > That's right. You can even push your own images with your own dependencies and use them as "docker actions". The main difference is performance. The platform built-in images will load faster whereas the docker actions will have occasional slow starts because the image is pulled.
>
> Opposite :)

Sorry to resurrect an old issue, but I'm curious about this. Why is it the opposite? It sounds intuitive that using platform images would result in quicker cold starts because the node tasked with running the container would be more likely to have the image already.

mattwelke commented 3 years ago

> This should speed up cold start times: kamyker@f9a7397#diff-6bc49d26f451cad6d8312deca5a0eadeL89-L96
>
> Instead of writing the zip to disk and deleting it, we simply extract it from memory. There's probably a way to do it directly from the stream.

@kamyker

That's a clever approach to speed it up. Do you think that contributed most to the cold start times?

Btw I'll pick this up if you aren't working on it anymore. I'm using the Node.js runtime on ICF right now and I want to switch to a compiled language runtime for better DX. I like C#, but the cold start times are all 2+ seconds for me on ICF with the 2.2 runtime right now. Plus, 2.2 is EOL. D: This would be a fun way to improve my performance in prod and learn more about OpenWhisk at the same time. Mind if I ask you questions about what you've done so far if I have any?

shawnallen85 commented 3 years ago

@mattwelke Can you try the 3.1 runtime and see if you are seeing any improvements there? I've made some changes to the runtime but I don't think I've kicked off an official release of it recently -- I'll try to get on that soon.

mattwelke commented 3 years ago

@shawnallen85 Sure I'll add the 3.1 runtime to my A/B test. I've been using nodejs:12 for a while and I added dotnet:2.2 last night. I have a setup where the URLs are chosen randomly by a client that's making about one request per hour on average. It works great for A/B testing cold starts. For example, so far, I've seen about 500ms for nodejs:12 cold starts but 2s for dotnet:2.2 cold starts.
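A rough Python sketch of that kind of A/B sampling is below. The names `sample_once` and `run_samples` and the injected `fetch` callable are assumptions for illustration, not the actual client used here.

```python
import random
import time
from typing import Callable, Dict, List, Optional, Tuple


def sample_once(urls: Dict[str, str], fetch: Callable[[str], object],
                rng: random.Random) -> Tuple[str, float]:
    """Pick one runtime at random, call its URL, return (name, seconds)."""
    name = rng.choice(sorted(urls))
    start = time.perf_counter()
    fetch(urls[name])
    return name, time.perf_counter() - start


def run_samples(urls: Dict[str, str], fetch: Callable[[str], object],
                count: int,
                rng: Optional[random.Random] = None) -> Dict[str, List[float]]:
    """Collect `count` timings, grouped by runtime name."""
    rng = rng or random.Random()
    timings: Dict[str, List[float]] = {name: [] for name in urls}
    for _ in range(count):
        name, elapsed = sample_once(urls, fetch, rng)
        timings[name].append(elapsed)
    return timings
```

In real use, `fetch` could be something like `urllib.request.urlopen`, with the caller sleeping between samples long enough for idle containers to be reclaimed, so that most requests are cold starts.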

I just deployed a third action using dotnet:3.1. I'm using IBM Cloud Functions so I had to make this a Docker action since dotnet:3.1 isn't a registered kind there:

image

This means my very first request to it was really slow. I think this is because of it having to pull the runtime image:

~ > time curl -X POST https://us-south.functions.appdomain.cloud/api/v1/web/REDACTED/default/google-analytics-privacy-proxy-dotnet-3.1
{
  "code": "REDACTED",
  "error": "There was an error processing your request."
}
real    0m22.122s
user    0m0.017s
sys     0m0.000s

(don't mind the error, this is a bad curl on purpose, I just wanted to test the time)

If I let this run for another day, the dotnet:3.1 action will probably get quite a few requests, and I might be able to see some in the activation logs where the time is less than 22 seconds, which would imply that a cold start happened to be assigned to a node that already had the image. I believe that's the info you're looking for.

mattwelke commented 3 years ago

Also, once we get an idea of the improvements to make to 3.1, we can move on to 5.0. I was able to get a 5.0 runtime working pretty easily today by basing it off the 3.1 runtime. :)

https://hub.docker.com/repository/docker/mwelke/openwhisk-action-dotnet-v5.0

image

shawnallen85 commented 3 years ago

@mattwelke We made a decision to skip the 5.0 release since it is not an LTS release. We plan on supporting 6.0 once it is released (see https://dotnet.microsoft.com/platform/support/policy for more information on Current vs LTS releases).

Once 6.0 is out, we will be deprecating the 2.2 release since it is no longer supported.

mattwelke commented 3 years ago

@shawnallen85 Ah that's fair. But wouldn't that mean the OpenWhisk project should deprecate 2.2 right now then, since 3.1 is an LTS release and we have the runtime for it?

shawnallen85 commented 3 years ago

Technically, yes. I was holding off on doing that until the next .NET release because getting it into the OpenWhisk ecosystem requires changing more than just this repository.

shawnallen85 commented 3 years ago

@mattwelke I just created a pull request to move the decompression to in memory as well as updated the Newtonsoft.Json library to the latest version. Hopefully those changes will help with the cold start times.

mattwelke commented 3 years ago

@shawnallen85 Ah ok. I agree it makes sense to wait to deprecate one version of a runtime until you release another version of the runtime.

One thing about updating Newtonsoft.Json though... I think we might have to stick to version 12. I posted details here but I decided to move the details to a comment on your PR to keep it focused.

shawnallen85 commented 3 years ago

When it comes to the referenced assembly, upgrading it hasn't caused any issues AFAIK yet.

If you look at the test code, it actually references 12.0.1, not 12.0.2 or 13.0.1.

The issue with migrating to a different serialization library is that it will break the existing contract (the contract being JObject args in the function definition).

We support backwards compatibility as long as functions are built as a .NET Standard library (so if you wrote it for the dotnet2.2 runtime, it should work with the .NET 3.1 runtime -- we have a basic test for this and it works, see https://github.com/apache/openwhisk-runtime-dotnet/blob/master/tests/src/test/scala/actionContainers/DotNet3_1ActionContainerTests_2_2.scala).

Because we want to support that, transitioning to a new contract wouldn't be easy. It is certainly something we can look at, but it might require a new runtime separate from this one.

mattwelke commented 3 years ago

> Because we want to support that, transitioning to a new contract wouldn't be easy. It is certainly something we can look at, but it might require a new runtime separate from this one.

Yeah, my details are in the PR comment, but I agree that if it's pursued, it would make sense to wait for the 6.0 runtime to do it. If it's successful, the same idea could be used for future versions of runtimes for languages like Java, where the same problem exists (a dependency on https://github.com/google/gson).

> When it comes to the referenced assembly, upgrading it hasn't caused any issues AFAIK yet. If you look at the test code, it actually references 12.0.1, not 12.0.2 or 13.0.1.

Does this mean we can go ahead and upgrade to v13 of Newtonsoft.Json in the 2.2 and 3.1 runtimes right now, without breaking any actions?

shawnallen85 commented 3 years ago

On the runtimes, I believe so. My most recent PR does this and our tests (again using 12.0.1) ran as expected. But if you were to upgrade it on the function side and the runtime was a lower version, you might run into a problem (I haven't tested this out).

mattwelke commented 3 years ago

> But if you were to upgrade it on the function side and the runtime was a lower version, you might run into a problem (I haven't tested this out).

I think that behavior would be fine, because it would be up to the operator to convey to their users which runtimes they support. So for example if they choose not to update to the new version of the 2.2 and 3.1 runtimes, they would just keep telling their users to use v12 of Newtonsoft, and perhaps add a warning like "even though the official docs for OpenWhisk show using v13, you must use v12 due to the current versions of the runtimes on our platform".

mattwelke commented 3 years ago

Moved my idea into a dedicated issue so we can drop the subject for now until it's time for the 6.0 runtime, and we can just focus on improving 3.1.

mattwelke commented 3 years ago

@shawnallen85 I've got some interesting results for those cold starts with 3.1 by the way:

image

The action performs an operation that consistently takes about 50ms, so we can subtract 50ms from each of those activation times. What remains is the cold start time. It's too quick to be a docker pull, so I think what we're left with is the cold start time attributed to starting containers, and it's consistently less than 1.5s. The most recent activation is a non-cold start.
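The subtraction can be written down as a small Python sketch. The function names are hypothetical, and the durations in the usage line are made-up placeholders, not the values from the screenshot.

```python
from statistics import mean
from typing import Dict, Iterable

FIXED_WORK_MS = 50.0  # the ~50ms operation the action performs on every call


def estimate_cold_start_ms(duration_ms: float,
                           fixed_work_ms: float = FIXED_WORK_MS) -> float:
    """Activation duration minus the action's known fixed work, floored at 0."""
    return max(duration_ms - fixed_work_ms, 0.0)


def summarize(durations_ms: Iterable[float]) -> Dict[str, float]:
    """Min, max and mean cold start overhead for a set of cold activations."""
    overheads = [estimate_cold_start_ms(d) for d in durations_ms]
    return {"min": min(overheads), "max": max(overheads), "mean": mean(overheads)}


# Placeholder durations in milliseconds, for illustration only.
print(summarize([1050.0, 1250.0]))
```

Only cold activations should be passed in; a warm activation would show up as an overhead near zero and skew the summary.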

mattwelke commented 3 years ago

> But if you were to upgrade it on the function side and the runtime was a lower version, you might run into a problem (I haven't tested this out).

Tested this tonight and confirmed it doesn't work. Made the following action:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <TargetFramework>netstandard2.0</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Newtonsoft.Json" Version="13.0.1" />
  </ItemGroup>

</Project>

And deployed it to IBM Cloud's dotnet:2.2 runtime. This shows up in the logs:

{
  "error": "Could not load file or assembly 'Newtonsoft.Json, Version=13.0.0.0, Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed'. Could not find or load a specific file. (Exception from HRESULT: 0x80131621)"
}
[
  "2021-04-15T01:27:12.441085Z    stderr: at System.Signature.GetSignature(Void* pCorSig, Int32 cCorSig, RuntimeFieldHandleInternal fieldHandle, IRuntimeMethodInfo methodHandle, RuntimeType declaringType)",
  "2021-04-15T01:27:12.441121Z    stderr: at System.Reflection.RuntimeMethodInfo.get_ReturnType()",
  "2021-04-15T01:27:12.441189Z    stderr: at Apache.OpenWhisk.Runtime.Common.Init.HandleRequest(HttpContext httpContext) in /app/Apache.OpenWhisk.Runtime.Common/Init.cs:line 156",
  "2021-04-15T01:27:12.556Z       stderr: The action did not initialize or run as expected. Log data might be missing."
]
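The error names the assembly version the action asked for, so a mismatch like this can be spotted from the log message alone. Below is a hedged Python sketch that parses the assembly reference out of such a message and compares it with the version a runtime ships. The helper names are hypothetical, the bundled version `12.0.0.0` is an assumption about the 2.2 runtime, and the "bundled must be at least the requested version" rule is inferred from the behaviour reported in this thread rather than taken from the .NET binder specification.

```python
import re
from typing import Tuple

Version = Tuple[int, ...]

# Matches e.g. 'Newtonsoft.Json, Version=13.0.0.0, Culture=neutral, ...'
_ASSEMBLY_RE = re.compile(
    r"'(?P<name>[^,']+), Version=(?P<version>\d+(?:\.\d+){3})"
)


def parse_assembly_reference(message: str) -> Tuple[str, Version]:
    """Extract (assembly name, version tuple) from a load-failure message."""
    match = _ASSEMBLY_RE.search(message)
    if match is None:
        raise ValueError("no assembly reference found in message")
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("name"), version


def runtime_can_satisfy(requested: Version, bundled: Version) -> bool:
    """Assumed rule: the runtime's copy must be at least the requested version."""
    return bundled >= requested


message = ("Could not load file or assembly 'Newtonsoft.Json, Version=13.0.0.0, "
           "Culture=neutral, PublicKeyToken=30ad4fe6b2a6aeed'.")
name, requested = parse_assembly_reference(message)
bundled = (12, 0, 0, 0)  # assumption: what the 2.2 runtime image ships
print(name, requested, runtime_can_satisfy(requested, bundled))
```

The reverse case, an action built against v12 running on a runtime that ships v13, passes this check, which matches the test results described earlier in the thread.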

dgrove-oss commented 2 months ago

Closing as this issue was about dotnet 3.1 which is EOL.