Closed JaktensTid closed 11 months ago
Are you able to provide any more details? The error message should have included more than that
@ReubenBond here it is
warn: Orleans.Runtime.CallbackData[100157] 2023-01-25T00:00:56.871716+00:00 app[web.1]: Response did not arrive on time in 00:00:30 for message: Request [S172.17.48.38:11111:33609578 symactor/666aeb257e2b47c0ae01dab451f3e10d]->[S172.17.48.38:11111:33609578 virtualmachinespoolactor/virtualMachineActorGrain] Assembly.Orca.Actors.Actors.Interfaces.IVirtualMachinesPoolActor[(Assembly.Orca.Actors.Actors.Interfaces.IVirtualMachinesPoolActor)Assembly.Orca.Actors.Actors.VirtualMachinesPoolActor].GetFreeVM(666aeb25-7e2b-47c0-ae01-dab451f3e10d, 3a23cfa3-4216-43fd-a49f-a6cb68870da2) #53. . About to break its promise. 2023-01-25T00:00:56.872776+00:00 app[web.1]: fail: Orleans.Runtime.GrainTimer[101413] 2023-01-25T00:00:56.872780+00:00 app[web.1]: Caught and ignored exception thrown from timer callback for timer GrainTimer. TimerCallbackHandler:Assembly.Orca.Actors.Actors.SymActor->System.Threading.Tasks.Task MainLoop(System.Object) 2023-01-25T00:00:56.872794+00:00 app[web.1]: System.TimeoutException: Response did not arrive on time in 00:00:30 for message: Request [S172.17.48.38:11111:33609578 symactor/666aeb257e2b47c0ae01dab451f3e10d]->[S172.17.48.38:11111:33609578 virtualmachinespoolactor/virtualMachineActorGrain] Assembly.Orca.Actors.Actors.Interfaces.IVirtualMachinesPoolActor[(Assembly.Orca.Actors.Actors.Interfaces.IVirtualMachinesPoolActor)Assembly.Orca.Actors.Actors.VirtualMachinesPoolActor].GetFreeVM(666aeb25-7e2b-47c0-ae01-dab451f3e10d, 3a23cfa3-4216-43fd-a49f-a6cb68870da2) #53. 2023-01-25T00:00:56.872809+00:00 app[web.1]: at Orleans.Serialization.Invocation.ResponseCompletionSource
1.GetResult(Int16 token) in /_/src/Orleans.Serialization/Invocation/ResponseCompletionSource.cs:line 230
2023-01-25T00:00:56.872812+00:00 app[web.1]: at System.Threading.Tasks.ValueTask1.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state) 2023-01-25T00:00:56.872813+00:00 app[web.1]: --- End of stack trace from previous location --- 2023-01-25T00:00:56.872813+00:00 app[web.1]: at Assembly.Orca.Actors.States.SymLogicHandler.CreatingAttaching(Guid symId, ISymActor symActor) in /src/Assembly.Orca.Actors/Assembly.Orca.Actors/States/SymLogicHandler.cs:line 54 2023-01-25T00:00:56.872816+00:00 app[web.1]: at Assembly.Orca.Actors.Actors.SymActor.MainLoop(Object parameter) in /src/Assembly.Orca.Actors/Assembly.Orca.Actors/Actors/SymActor.cs:line 54 2023-01-25T00:00:56.872818+00:00 app[web.1]: at Orleans.Runtime.GrainTimer.ForwardToAsyncCallback(Object state) in /_/src/Orleans.Runtime/Timers/GrainTimer.cs:line 139
It's hard to say what is causing this. It's probable that your code is blocking on something, but it's hard to say what. I expected to see a status response to appear in the timeout message, but I'm not seeing that here. Is the system under very high load?
No, but I call Task.Factory.StartNew at some point. Can this cause the issue?
Maybe, what does your call look like?
public async Task<string> ExecuteCommand(IHostCommand command, string ipAddress)
{
var sshKey = await _sshKeyReader.GetSshKey();
var commandText = await command.GetCommandContent();
return await Task.Factory.StartNew(() =>
{
// blocking calls here
});
}
Something like this
Oh, you perform blocking calls in there. Do you pass other arguments to Task.Factory.StartNew, like TaskScheduler.Default
?
You should use Task.Run if this is what you're going to do, or consider something like @Kritner's Orleans.SyncWork
package: https://github.com/OrleansContrib/Orleans.SyncWork
No, just plan Task.Factory.StartNew Ok, I'll try. Can you keep this thread opened? I will try this a little bit later
Can you keep this thread opened?
Sure thing, let us know how it goes.
Changed Task.Factory.StartNew to Task.Run, but still it is throwing me the same error. Should the EVERY method of grain take no more than 30 sec? If so why it is working locally?
Is there a way to increase this timeout to a couple of minutes maybe?
You must never block in grain code. You can increase the ClientMessagingOptions.ResponseTimeout and SiloMessagingOptions.ResponseTimeout to increase the duration. What are you doing in these tasks, how long should they take?
Ok, so I just increased ResponseTimeout to 10 mins, since almost all of my methods in the grains will be taking a lot of time and I don't want to bloat the code with state machine pattern
Ok, so I just increased ResponseTimeout to 10 mins, since almost all of my methods in the grains will be taking a lot of time and I don't want to bloat the code with state machine pattern
I think if you just increase the timeout across the board, you'll theoretically eventually run into a point where orleans is starved and can no longer do acks.
The .NET ThreadPool itself will be becoming starved. You should move this work onto a dedicated pool.
@ReubenBond how to do this? Task.Run?
Task.Run will be better than Task.Factory.StartNew - it won't block your grain's scheduler, at least.
Both will contribute to thread pool starvation, though, so it would be ideal to isolate your work by putting it on an isolated thread pool.
Of course, you will find that at some level of load, the system stops responding in an acceptable period. I don't know what your target request rate & concurrent requests is, or how long each request takes to service, so it's hard to provide any concrete advice other than telling you what not to do.
fwiw the code at https://github.com/usnistgov/acvp-server effectively uses the package from https://github.com/OrleansContrib/Orleans.SyncWork that @ReubenBond pointed out earlier.
The package Orleans.SyncWork
came from learnings in implementing the abstraction for the acvp-server.
This project queues up thousands of blocking grain calls more or less "per request" and handles it fine w/o overloading the cluster (though careful configuration is necessary).
Don't know if this will be a possibility for you, but the package should work for (at least) the specific circumstances we needed.
Hi,
I am getting
System.TimeoutException: Response did not arrive on time in 00:00:30 for message: Request [S172.17.48.38:11111:33609578 symactor/666aeb257e2b47c0ae01dab451f3e10d]
I am using ado.net clustering as such
builder.Host.UseOrleans(silo => { silo .UseAdoNetClustering(options => { options.Invariant = "Npgsql"; options.ConnectionString = builder.Configuration.GetConnectionString("postgres"); }) .ConfigureLogging(options => options.SetMinimumLevel(LogLevel.Information)) .AddIncomingGrainCallFilter<LoggingInterceptor>() .AddAdoNetGrainStorage(StateStorageConstants.StorageName, options => { options.Invariant = "Npgsql"; options.ConnectionString = builder.Configuration.GetConnectionString("postgres"); }); });
Can someone help with this?