dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans

Orleans client fails to load contract dll randomly #9214

Open mastoj opened 1 week ago

mastoj commented 1 week ago

I have a small example/POC I'm working on where I want to demonstrate how Orleans can make life easier and also more robust.

If you want to go straight to the code it is here: https://github.com/mastoj/monostore/tree/random-error

The setup is that I have an API acting as an Orleans client, and two different workers acting as silos, one for cart and one for product. I also have an API project each for cart and product, which the main API project references, to keep the cart API definitions close to the rest of the cart implementation. The cart and product API projects each reference their own contract project, which defines the contracts for the API and the grains.

So basically I have the below for cart (same for product):

   API (Client) -> Cart API -> Contract
   Cart Worker (Silo) -> Contract

When I make a call to the API to create a cart, https://github.com/mastoj/monostore/blob/a9abe53968a5ee17acc929c86a36ea29e8fc7cfd/src/cart/requests/requests.http#L5, it fails randomly with this exception:

System.TypeLoadException: Unable to load MonoStore.Cart.Contracts.Grains.ICartGrain,MonoStore.Cart.Contracts from assembly MonoStore.Cart.Contracts ---> System.IO.FileNotFoundException: Could not load file or assembly 'MonoStore.Cart.Contracts, Culture=neutral, PublicKeyToken=null'. The system cannot find the file specified.

The exception is thrown in the API project, so the request never reaches the silo.

Everything is set up with Aspire, but I don't think that should affect how DLLs are loaded.

ReubenBond commented 1 week ago

I can reproduce this. Thank you for putting it together. This is currently a limitation of heterogeneous clusters. The workaround is to add all contract assemblies to all silos (that is, to all gateways, which in this case means all silos).

The limitation is at the RPC layer. I have a branch to fix this, but it's not in a mergeable state just yet. After adding the Cart contract reference to the Product service, the request completes successfully. [screenshot]

The reason it sometimes works even without this is that the client might happen to send the request to a compatible gateway. cc @benjaminpetit: we could change client routing to pick compatible gateways while we prepare the true fix.
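
As a rough illustration of why the failure looks random, here is a toy model in Python. It is not Orleans code and says nothing about how Orleans actually selects gateways: it assumes the client picks a gateway uniformly at random, and the silo names and the `MonoStore.Product.Contracts` assembly name are made up for the sketch. It compares the original setup, the "all contracts on all silos" workaround, and the suggested routing to compatible gateways only.

```python
import random

# Hypothetical model: each gateway (silo) and the contract assemblies it can load.
# Silo names and the Product assembly name are assumptions for illustration.
ORIGINAL = {
    "cart-silo": {"MonoStore.Cart.Contracts"},
    "product-silo": {"MonoStore.Product.Contracts"},
}

# Workaround from the thread: every gateway references every contract assembly.
ALL_CONTRACTS = set().union(*ORIGINAL.values())
WORKAROUND = {name: set(ALL_CONTRACTS) for name in ORIGINAL}


def send(contract, gateways, rng, compatible_only=False):
    """Pick a gateway and report whether it can load the contract assembly."""
    candidates = list(gateways)
    if compatible_only:
        # Suggested routing change: only consider gateways that know the contract.
        candidates = [g for g in candidates if contract in gateways[g]]
    chosen = rng.choice(candidates)  # assumed: uniform random selection
    return contract in gateways[chosen]


def failure_rate(contract, gateways, trials=10_000, seed=0, compatible_only=False):
    """Fraction of simulated requests that land on an incompatible gateway."""
    rng = random.Random(seed)
    failures = sum(
        not send(contract, gateways, rng, compatible_only) for _ in range(trials)
    )
    return failures / trials


if __name__ == "__main__":
    contract = "MonoStore.Cart.Contracts"
    print("original:", failure_rate(contract, ORIGINAL))
    print("workaround:", failure_rate(contract, WORKAROUND))
    print("compatible routing:", failure_rate(contract, ORIGINAL, compatible_only=True))
```

Under these assumptions, roughly half of the cart requests fail in the original setup, and none fail with either the workaround or compatible-only routing. The model covers only the gateway-selection explanation given above.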

mastoj commented 1 week ago

Thanks for the quick response.

To clarify, I only need to reference the contract, not the grain implementation?

ReubenBond commented 1 week ago

Yes, that's correct.

mastoj commented 1 week ago

I also had some issues with OrleansDashboard; my guess is that they are related.

mastoj commented 1 week ago

I actually still have the issue from time to time, even after adding the references. I have added a reference to every contract project in all my silos and in the API. Sometimes it works, then all of a sudden it fails. The changes can be seen in this PR: https://github.com/mastoj/monostore/pull/1/files

mastoj commented 1 week ago

@ReubenBond, do you know why I might still be seeing it? I find it very confusing: after starting up the cluster, the first couple of requests can fail, but after I change the id of the cart I'm trying to create a couple of times, it starts working.