dotnet / orleans

Cloud Native application framework for .NET
https://docs.microsoft.com/dotnet/orleans
MIT License

Thoughts or experiences to share about running Orleans in a separate process other than the entrypoint host? #252

Closed: veikkoeeva closed this 9 years ago

veikkoeeva commented 9 years ago

As the heading says; some more specific questions are scattered through the text.

I was thinking about a situation in which Orleans is running on a server, say loaded in a Windows Service, and then the silo crashes. I suppose neither the silo nor its host can protect the silo from crashing due to something that happens in the grains, and certainly not due to something that happens outside the silo. One cause I can think of is a grain receiving input that exhausts all the reservable memory (it looks like the silos don't operate on a fixed amount of memory). The Orleans watchdog detects that a silo has crashed, but it can't do much to the host process on another machine. Does anyone have experience with Orleans host process crash situations?

IIS handles this by loading a process per application pool and restarting the application pool upon a crash (this means the healthy applications in the pool are also taken down and restarted). I took a look at AsynchAgent and the OrleansRuntime classes, and I don't see that much is done, or even can be done, if the Orleans silo/host process crashes.

I was wondering: what if I loaded the Orleans application domain in a separate process, perhaps even using the infamous/notorious System.Addin/MAF functionality? Does anyone have experience using System.Addin and processes to host Orleans silos? I didn't see whether this exists, and I wonder whether it would be a good idea, to set aside a few bytes of extra memory for logging a message to stderr upon an imminent crash if the reason is something recognizable, such as OutOfMemoryException. Then the host process could be a very thin shell, say, a Windows Service that pings the Orleans silo host in the process and restarts the process if it doesn't get a reply. As I'm just toying with this idea and hoping to save some time: is there a function one can use to ping the SiloHost? Perhaps getting either true or false from SiloHost.IsStarted?

yevhen commented 9 years ago

You can check a silo's status by using the Orleans Management Grain.

veikkoeeva commented 9 years ago

@yevhen I see! You're referring to ManagementGrain here, which could be called like var mg = Orleans.Runtime.ManagementGrainFactory.GetGrain(Management.SYSTEM_MANAGEMENT_ID), or as described in the OrleansManager.exe instructions.

I'm currently poking around the system management aspects, and I was wondering whether it would be useful to have a system error stream, now that there are streams, or whether it would be better to check the persistent store periodically. The OrleansManager instructions seem to be very Azure-centric; should they work on-premises? Considering on-premises, or non-PaaS in general, I may want to log a lot of data to EventLog or, more generally, plug in something like Serilog, and I wonder how to do that.

In any event, this stderr idea is a last-ditch effort to have a fixed space and a rather fixed message to put something on the disk if everything else has failed. If I trust the platform to restart Windows Services when they fail, the Windows Service could in turn detect that the silo process it started has gone down and restart it. But can that be done before the Orleans system notices that a silo has gone down? Naturally, it'd be nice if the system could be made resilient and monitored with "standard event tools" that watch, say, the Windows Event Log, with the personnel (service providers and their subcontractors) instructed to automatically monitor some catastrophic system events in the event log, or, when integrating with other systems, cases such as a batch run to SAP that didn't finish on time.

I suppose I'd like to see some more documentation on management and running this system.

As an aside, I see that ManagementGrain on line 187 creates a StringWriter which isn't disposed or closed. Maybe I'll submit a PR today to wrap it in a using statement (XmlDocument doesn't close it).

sergeybykov commented 9 years ago

@veikkoeeva The pattern we've recommended and that has been employed by many people is to start a silo process (using OrleansHost.exe or a custom version of it) and wait on the process handle. If the silo process crashes or exits, the wait will be over. Would this not work for your case?
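A minimal sketch of the "start the silo process and wait on its handle" pattern described above, using only System.Diagnostics.Process. The SiloSupervisor class, the RunSupervised method, and the restart count and delay parameters are hypothetical names introduced for illustration; none of them are part of Orleans, and a real Windows Service would likely add logging and back-off on top of this.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;

// Hypothetical supervisor sketch; not an Orleans API.
public static class SiloSupervisor
{
    // Starts the given executable, blocks until it exits, and restarts it
    // up to maxRestarts times. Returns the exit code of every run.
    public static List<int> RunSupervised(
        string fileName, string arguments, int maxRestarts, TimeSpan restartDelay)
    {
        var exitCodes = new List<int>();

        for (int attempt = 0; attempt <= maxRestarts; attempt++)
        {
            var startInfo = new ProcessStartInfo
            {
                FileName = fileName,
                Arguments = arguments,
                UseShellExecute = false
            };

            using (Process silo = Process.Start(startInfo))
            {
                // Waits on the process handle; this returns whether the
                // silo process crashed or exited cleanly.
                silo.WaitForExit();
                exitCodes.Add(silo.ExitCode);
                Console.Error.WriteLine(
                    "Silo process exited with code {0} (run {1}).",
                    silo.ExitCode, attempt + 1);
            }

            // Pause before restarting, except after the final run.
            if (attempt < maxRestarts)
            {
                Thread.Sleep(restartDelay);
            }
        }

        return exitCodes;
    }
}
```

A thin host could then call something like SiloSupervisor.RunSupervised("OrleansHost.exe", "", 3, TimeSpan.FromSeconds(5)); the empty argument string and the restart policy here are assumptions, not recommended values.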

veikkoeeva commented 9 years ago

@sergeybykov Ah, thanks! In fact it does.

There were some tangents already, so I'll tuck in some more... Looking at this more closely, I see there's WindowsServerHost that can be used. It looks like both HelloWorld and HelloWorldNuget would benefit from it, as it is nearly identical.

As another tangent from looking at the new pragma in HelloWorld Main (much appreciated, I'll make use of this information), it feels like there would be a need for programmatic configuration, and perhaps with dependency injection too. But this has been discussed rather extensively elsewhere. I had to mention it, as this feels so much in need of "composition", like having an API to provide a few Fun objects that return the necessary configuration values when called. :)

gabikliot commented 9 years ago

@veikkoeeva, programmatic configuration is very important to us. You can see the evidence in the latest series of PRs we accepted and did ourselves around programmatic configuration of providers (#237, #242, #243). So any PR in that area, even one that is not polished but outlines the main ideas, is very welcome!

veikkoeeva commented 9 years ago

@gabikliot Thanks, those are good points. I think I've got the information I need. I think I have one more question, but I need to check the docs first.