HangfireIO / Hangfire

An easy way to perform background job processing in .NET and .NET Core applications. No Windows Service or separate process required
https://www.hangfire.io

Running a job from a different AppDomain #753

Open KeithBarrows opened 8 years ago

KeithBarrows commented 8 years ago

The approach we are taking is to treat job solutions as separate from the server (think Plug-Ins). Let's see if I can explain:

The server is a self-hosted OWIN app within a Windows Service project that uses a RESTful controller to take job requests (JSON) and hosts Hangfire. When the controller gets a good POST, the app spins up a new AppDomain, loads the target job (which resides in a directory underneath the main app), then queues it to Hangfire.

When Hangfire tries to run the job, it crashes because it can no longer find the references the job was compiled with. I am assuming this is because Hangfire is running in the AppDomain of the main app and knows nothing about the (temporary) AppDomain the job was initialized in.

Is there a way to run the job itself in its own AppDomain inside of Hangfire?

Maybe something like this in another (overloaded) Enqueue() method?

    public class BaseHandler : MarshalByRefObject
    {
        protected internal AppDomain JobAppDomain;
        protected internal BaseHandler JobHandler;

        protected internal void SetupInstance(string jobClassName, string jobName)
        {
            var ads = new AppDomainSetup
            {
                ApplicationBase = new FileInfo(Assembly.GetExecutingAssembly().Location).DirectoryName,
                DisallowBindingRedirects = false,
                DisallowCodeDownload = true,
                PrivateBinPath = jobClassName,
                ApplicationName = jobName,
            };
            JobAppDomain = AppDomain.CreateDomain(jobName, null, ads);
            JobHandler = (BaseHandler)JobAppDomain.CreateInstanceAndUnwrap(typeof(BaseHandler).Assembly.FullName, typeof(BaseHandler).FullName);
        }

        protected internal IJob GetJob(string jobClassName)
        {
            // Load the plug-in assembly from its sub-directory and instantiate the job type.
            var assembly = Assembly.LoadFrom(jobClassName + @"\" + jobClassName + ".dll");
            var assemblyType = assembly.GetType(jobClassName); // was "info.AssemblyName", which is not defined in this scope
            return Activator.CreateInstance(assemblyType) as IJob;
        }
    }

NOTE: IJob is an interface in our grand scheme!

We use the Base Class in our handlers like:

    public class JobAdHocHandler : BaseHandler, IJobHandler
    {
        public MinimumResultModel Handle(MinimumCommandModel message)
        {
            var result = new MinimumResultModel {Id = "-1", PayloadAsString = message.FullPayloadString};
            try
            {
                var info = message.MinimumPayload.JobInfo;

                SetupInstance(info.JobClassName, info.JobName);
                var job = JobHandler.GetJob(info.JobClassName);

                result.Id = BackgroundJob.Enqueue(() => job.Execute(null, message.FullPayloadString, JobCancellationToken.Null));
            }
            catch (Exception ex)
            {
                Log.Logger.Fatal(ex, ex.Message);
                result.Exception = ex;
            }

            AppDomain.Unload(JobAppDomain);
            return result;
        }
        public bool AppliesTo(JobType jobType) => jobType == JobType.AdHoc;
    }

Thoughts?

EProd-Rhansen commented 7 years ago

Phew, this one is a doozy for sure. The problems you will be facing here are multi-faceted.

First, you're going to need to deal with the differences between job scheduling and job execution. Whenever you call "Enqueue" the job is still technically scheduled with an execution time of "right this instant." A separate polling process will then fire up, grab the job from the queues that the server has been configured with, and execute the job accordingly. I suspect that this is where you are currently having a problem.
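
A quick illustration of that split (a sketch; it assumes storage has already been configured via GlobalConfiguration):

    // Enqueue only persists the job with a "run now" timestamp and returns
    // immediately; nothing executes in this process unless a server is running.
    var jobId = BackgroundJob.Enqueue(() => Console.WriteLine("Executed by a server, later"));

    // Somewhere, a BackgroundJobServer polls the queue and performs the job.
    using (var server = new BackgroundJobServer())
    {
        Console.ReadLine(); // keep the server alive while it processes jobs
    }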

If you really want to operate with this architecture (which I personally advise against), then you're going to need to take great care to ensure that you have isolated your queues to the app domains that will be scheduling and executing these jobs. This effectively rules out the use of the "default" queue universally, as every single BackgroundJobServer will try to pull from it by design (this behavior cannot be overridden).

From there, you will need to load the job's application domain with the BackgroundJobServer that will process the queue the job has been assigned to. The nastiest bit of all is that you will then need to come up with a synchronization mechanism that will allow the job's app domain to signal back to the main app domain indicating that the job has completed execution. The purpose of this is to allow you to unload the job app domain once it is no longer needed.

Additionally, the job's app domain needs to be loaded with all of the Hangfire assemblies (which does not appear to be the case in the example you provided). As it stands now, I do believe that you will take a performance hit, as the Hangfire assemblies will be loaded as domain-neutral by default (I think). This means that every call to a static member (and there are many) will need to be routed to an intermediary that knows how to run that static code against your job-thread's context.

To tie a bow on this dissertation: you will not be able to do any of this if you intend to use BackgroundJob.Schedule or any type of RecurringJob. The reason is that all jobs are stuck into the "default" queue during scheduling when either of these paths is used. There is literally no way for you to achieve the level of queue isolation that would be required for a solid implementation along the lines you were thinking. However, if you're dead set on moving down this path, then you can pull in my pull request, as I have work in progress that will take care of the problems you will face when working with scheduled jobs.

KeithBarrows commented 7 years ago

What would be the suggested way to handle multiple jobs (using all enqueueing types) then? Especially when a new job could be written each month?

The main pain we are trying to "get around" is the 2-3 month life-cycle of getting anything into production! We figured, once the server is in production, as long as we don't have to reinstall it every time we have a new job to add, then all we need to concentrate on is the new job. Otherwise, we need to retest EVERYTHING whenever a new job is slated for production.

EProd-Rhansen commented 7 years ago

It sounds like you are trying to lay your own distributed processing model down on top of the one that already ships with Hangfire, which is going to give you all sorts of fits. I am not sure how far down this path you are already, but if at all possible I would suggest changing your architecture to simply use what Hangfire provides to solve this problem.

Presently, Hangfire gives you the ability to schedule (or Enqueue) jobs to any queue that you desire. For the method you are using, just adding the "Queue" attribute to your job method or class will do the trick. You can then configure BackgroundJobServers on different servers (or services on the same server, depending on how distributed you want to get) to only handle jobs that are sent to that specific queue.
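
In code, that suggestion looks roughly like this (a sketch; the queue name and job class are made up for illustration):

    public class InvoiceJobs
    {
        // The Queue attribute routes executions of this method to the "invoicing" queue.
        [Queue("invoicing")]
        public void RecalculateBalances() { /* job logic */ }
    }

    // In the service that owns this queue, configure the server to pull
    // only from "invoicing" (note that "default" is deliberately omitted).
    var server = new BackgroundJobServer(new BackgroundJobServerOptions
    {
        Queues = new[] { "invoicing" }
    });

    // Any client that references the job type can schedule work onto it.
    BackgroundJob.Enqueue<InvoiceJobs>(j => j.RecalculateBalances());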

This gives you the isolation that you're looking for at any level that you decide. You can get incredibly low level and have a different server running to support each job, or you can group your jobs into logical units and stand up servers accordingly. The scheduling can happen from anywhere so long as a reference to the assembly containing the job type is taken.

If fine-grained isolation is what you want, then a new service to support a new queue could theoretically be spun up rather quickly: use a configurable code base that simply tells Hangfire to work against a queue name stored in a local config file, retrieved whenever the application fires up.

To be honest, and at the risk of coming across as snide, it really seems like you have some more fundamental problems to address in the way that your project is structured. Specifically, if your code base is so large and brittle that the thought of mucking with even the tiniest part of it scares you (or creates a lot of manual regression testing), then that tells me that things like unit and integration testing are not happening.

Ideally, you should always be able to make changes to your code base and only concern yourself with those changes that are required to implement your latest feature(s). By having test coverage of prior work, your CI process would handle the regression analysis for you and let you know if you've busted something unknowingly.

I can't really think of a reason to push the call to Hangfire's Enqueue off to a web service, unless the prevailing wisdom is that doing so is required to get the actual processing of the job off of the consumer of the web service itself. The only thing you need to do is call Hangfire locally (where you would call the web service from in your current architecture), while providing a queue that a server elsewhere in your infrastructure has been configured to watch.

KeithBarrows commented 7 years ago

You are correct in some assumptions. CI is in its infancy here, i.e., it is still in an alpha phase for one project only! There are no unit tests. There are no regression tests. We use a monolithic, mostly manual process to move code from dev to QA to prod. The policy here is that if one line of code changes, it has to be manually regressed in two environments before being signed off for prod.

The architecture, in a nutshell: a few dozen apps that were built to support specific engineering requests. They can be parts (bill of materials), invoicing (finance), or a number of other small apps built to extend huge COTS apps. Each of these support apps could have 0-N support jobs associated with them. The current architecture is to make a web service call with some info and then have the current Queue server (home-grown) try to run the job. About 30% crash and never complete.

We are trying to stand up a new Queue server without having to rewrite all the supporting apps. Some have not been touched in 12 years! For the older apps that call a SOAP endpoint, we will be writing a new SOAP handler that basically calls through to the new REST service. (Oh yes, let us not forget all the VLANs entrenched here!)

So, maybe my understanding of HOW Hangfire can be used is lacking. I've started from zero, followed the documents, and have had to extrapolate from there. My understanding is that Hangfire is meant to be run from within an app, not as a standalone app, which is what we need. Most of our jobs are not complex, though a few are heavily database-intensive and can take HOURS to run to completion. About 20% of the jobs are cron-based, about 79% are ad hoc, and a possible 1% are something else, though those are looking more and more like an ad hoc solution as well.

No, we are not too far down the line yet; two sample jobs so far. Our needs: be able to write a standalone "job" following a pattern (an IJob interface), then have a way to execute that job. I could make each job an EXE, but I don't see how to get Hangfire to run them. I could drop Hangfire and use a message queue (MSMQ, RabbitMQ) to handle the job queueing, but then I would need to build something in the middle for job monitoring, requeuing, cancelling, instrumentation, etc. Or I could use the Windows OS job scheduler for the cron jobs and handle the ad hoc jobs in a separate thread in the web app; however, if the end user shuts down the app, there goes the job.

EProd-Rhansen commented 7 years ago

In the spirit of trying to steer you away from spinning up your own app domains =)

It sounds like moving towards some sort of event messaging architecture fits the bill for what you need to do. Have you considered using your web service to send events to more isolated services whenever these client applications make a request to it?

If you do that, then you can build services that are tailored to each of the support applications you've mentioned. These services would maintain the references to the types that encapsulate the job logic. They would simply subscribe to your event queue and react to events that tell them which job to run.

Ex: You would build an InvoiceService that contained all of the jobs that support the invoicing domain. Whenever it starts up, it would subscribe to all of the events that correlate to an execution of one of its jobs.

Then, your invoicing support app could make a request to the web service to queue up an execution of the job that computes the interest charge on the unpaid balance of all of your outstanding invoices. The web service publishes an "ExecuteUnpaidBalanceInterestChargeJob" event as a result, which the InvoiceService reacts to by Enqueuing the job with Hangfire.
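
A rough sketch of that flow (the event-bus interface here is hypothetical; only the Hangfire call is real):

    public class InvoiceService
    {
        // IEventBus is a stand-in for whatever messaging layer you choose.
        public InvoiceService(IEventBus bus)
        {
            // React to the published event by handing execution to Hangfire.
            bus.Subscribe("ExecuteUnpaidBalanceInterestChargeJob",
                () => BackgroundJob.Enqueue<UnpaidBalanceInterestChargeJob>(
                    job => job.Execute()));
        }
    }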

However, there really isn't a need to keep Hangfire as you are only ever using "Enqueue" and are already implementing a pattern for all of your jobs via your use of an IJob interface. This would allow you to simply give the support services knowledge of how to execute an IJob.

The only reason I would personally stick with Hangfire is if you want to take advantage of the other features it supports, like recurring jobs or task continuation. If you do end up running jobs on a schedule, then stick with Hangfire and have the support services responsible for maintaining those recurring jobs and their execution schedules. The caveat is that I don't see a need for an IJob interface if you are using Hangfire, so I would remove that layer of complexity if going this route.

If you wanted to get really fancy (which I again advise against), then you can throw all of your job class files into a "Jobs" folder within the support service's executing directory. Rather than compiling these classes into your service assembly, you would use a configuration file that tells you what type(s) are needed to run the job associated with an event it just received. You could then load and instantiate the job type using reflection and execute it with Hangfire without ever having a direct reference to it, which would cut down on your regression testing when new jobs are rolled out. Please, PLEASE only do this if unit testing and CI are a pipe dream. The developer that follows you is not going to like supporting this, and neither will you in the long run...

KeithBarrows commented 7 years ago

I like the sound of the Service approach. If I am understanding it right, each service would host its own Hangfire server, include the jobs for that service, and then use a web payload (JSON) to start the work? If I took it a few steps further, it almost sounds like an SOA approach using micro-services?

Another thought on your final (ugly) solution: ILMerge might be a way to go. Though, as you said, it would be no fun to maintain. Fortunately, once a job is in production, the maintenance is extremely minimal. Which also has the drawback of the developer having to "relearn" the architecture should they have to maintain a job.

I would like to get a little more in-depth idea of your services thought!

-kb

EProd-Rhansen commented 7 years ago

Yes'sir, I tend to avoid buzzwords like "micro-services" because they have a nasty tendency to stir up silly semantic debates, but that is exactly the pattern I am describing. It sounds like you already know the landscape of which domains these jobs will land in, so it should not be too much of a problem to carve out narrow services whose only responsibility is to run those jobs when the events come through.

This pattern gives you a lot of options that cover your current needs as well as your potential needs in the future. Not only can these services react to events and handle ad-hoc job execution, but they can also have functionality that fires on start-up and schedules routine job execution using the RecurringJob.AddOrUpdate features.

Maybe you need a hybrid approach where a job should run on a schedule, but also needs to be executed ad-hoc when a certain event occurs? That's no problem with Hangfire. Simply schedule it like you normally would via RecurringJob, and then use RecurringJob.Trigger whenever the ad-hoc event comes through.
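
A minimal sketch of that hybrid setup (the job id, job class, and schedule are made up):

    // On service start-up, register (or update) the recurring schedule.
    RecurringJob.AddOrUpdate<InvoiceSyncJob>(
        "nightly-invoice-sync",   // stable recurring-job id
        job => job.Run(),
        Cron.Daily());

    // Later, when the ad-hoc event arrives, run the same job immediately.
    RecurringJob.Trigger("nightly-invoice-sync");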

Additionally, this no longer constrains your job modeling. You can write and chain jobs together in whatever way works for each implementation. No more trying to shoe-horn every job into a model that maybe just doesn't make sense in that isolated use-case, as Hangfire fully supports continuations and executes jobs using delegates. This even gives you the ability to pass data to the jobs whenever they run in a way that is less complicated than implementing a common interface.

If you really want all of the bells-and-whistles, then pull in one of the Hangfire IOC projects and activate these jobs using dependency injection. The rest of your systems may not run with an IOC container, but these services should be small enough that implementing one is not going to be hard. You'll be shocked at how little code you actually have to support once you're done.

EProd-Rhansen commented 7 years ago

To actually answer your question: yes, each service would make its own call to UseHangfireServer and provide a BackgroundJobServerOptions object that tells it which Hangfire queues it needs to pull from. Then, each of your jobs would need to be decorated with the Queue attribute that tells it which Hangfire queue it belongs to. From there, simply calling Enqueue whenever the event is published will result in the job being executed within that service. Done and done. Additionally, you can create a completely redundant solution by throwing these services into a farm and configuring a fancy service bus that can handle message retries if it cannot reach an instance of a service.
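
In an OWIN-hosted Windows Service, that wiring might look like this (a sketch; the storage connection and queue name are assumptions):

    public class Startup
    {
        public void Configuration(IAppBuilder app)
        {
            // Point Hangfire at the shared job storage.
            GlobalConfiguration.Configuration.UseSqlServerStorage("HangfireConnection");

            // This service only processes jobs sent to its own queue.
            app.UseHangfireServer(new BackgroundJobServerOptions
            {
                Queues = new[] { "invoicing" }
            });
        }
    }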

Full disclosure: if you end up using recurring jobs, then you are going to notice a lot of "JobLoadException" errors being raised by servers trying to load jobs that belong to other services. This is due to somewhat of a bug in Hangfire where the queues are honored for job execution, but not for job scheduling. The result is that servers will pull jobs from the Hangfire database and try to resolve the type before scheduling the job, which results in a failure that the server recovers from; it then continues scheduling the other jobs for which it does have a reference to the type.

This bug in no way prevents your jobs from being scheduled and running on time. It just creates a lot of unnecessary noise that is going to drive your support personnel crazy if they have a log table they are responsible for monitoring.

Now for the self-serving portion of my assistance today: pull request #755 is one that I wrote this past week to address this particular problem, as well as a few others that I noticed along the way. If it is accepted, it will completely rectify these bad error notifications, give you the ability to data-drive the queues that jobs are sent to, and address performance problems related to how the schedulers implement their distributed locking scheme. Just something to keep in mind moving forward.

KeithBarrows commented 7 years ago

I am looking forward to your fix getting accepted!

Making a lot more sense now. We have a single server where the jobs are run. It makes sense to install various versions (services) for each major vertical we are supporting. That would also help with the DB connections.

On a side note, some of our jobs depend on outside web services. This is not bad in itself, but a good portion of these jobs must run consecutively only; i.e., if Job A has started, another Job A cannot be started until the first Job A is totally finished, and some of these web calls can take anywhere from a few minutes to a few hours to run. I've jury-rigged this scenario by resubmitting the current job as a delayed job, giving it a time span twice what is normally needed. In small tests this has worked, but as we move to production I am guessing it may not work as expected. There does not seem to be a mechanism to label jobs as first in queue, second in queue, etc. While decorating with [DisableConcurrentExecution(30)] is the only way to handle this at the moment, it does not always seem to catch the case where the 2nd job is submitted right before the 1st job is resubmitted as a delayed job.
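
For reference, the attribute mentioned above is applied like this (a sketch; the argument is the number of seconds to wait for the distributed lock before the attempt fails):

    public class JobA
    {
        // Hangfire takes a distributed lock per method, so two executions
        // of Execute() cannot overlap, even across servers.
        [DisableConcurrentExecution(30)]
        public void Execute()
        {
            // ... long-running work that must not run concurrently ...
        }
    }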

-kb

EProd-Rhansen commented 7 years ago

The easiest solution would be to use a centralized locking mechanism. One way would be to use a static object to lock from within Job A. Whenever the first instance begins execution, the first thing it does is lock that object. When the second instance kicks off, it too would try to obtain a lock on that same object and wait. The first instance of Job A would then release the lock as the last step in the workflow and allow the second instance to fire immediately.
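
A bare-bones sketch of that static-lock approach (only valid when both executions run inside the same process):

    public class JobA
    {
        private static readonly object SyncRoot = new object();

        public void Execute()
        {
            lock (SyncRoot) // a second instance blocks here until the first releases
            {
                // ... the job's workflow ...
            }
        }
    }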

The only reason this would not work is if the second instance of Job A is fired within a different instance of the service. At that point you would want to wait on an operating-system-level semaphore (if all instances run on the same machine), or use the locking mechanisms that ship with your database (Oracle and MS SQL Server both have implementations you can research, depending on your needs) if those instances are running from different machines in your infrastructure. Keep an eye on #746 though, as that feature is exactly what you appear to be looking for.

Now, if you have a need to fire Job B immediately after Job A executes, then you'll want to take a look at the ContinueWith feature, as that will solve that problem for you.
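
Something along these lines (a sketch; JobA and JobB are placeholders):

    // Job B is only enqueued for execution once Job A reaches a completed state.
    var jobAId = BackgroundJob.Enqueue(() => new JobA().Execute());
    BackgroundJob.ContinueWith(jobAId, () => new JobB().Execute());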

KeithBarrows commented 7 years ago

Thanks. Did not mean to take this thread off track.

Actually, in the service, before enqueueing I could check for the job's status in the DB. If the same job is already running, then just set it as a continuation. Excellent.
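
A hedged sketch of that check using Hangfire's monitoring API rather than raw table queries (the job type is a placeholder):

    // Look for an in-flight execution of the same job type.
    var monitoring = JobStorage.Current.GetMonitoringApi();
    var running = monitoring.ProcessingJobs(0, int.MaxValue)
        .FirstOrDefault(kvp => kvp.Value.Job?.Type == typeof(JobA));

    var id = running.Key != null
        // Chain behind the running instance...
        ? BackgroundJob.ContinueWith(running.Key, () => new JobA().Execute())
        // ...or start immediately if nothing is in flight.
        : BackgroundJob.Enqueue(() => new JobA().Execute());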

-kb

EProd-Rhansen commented 7 years ago

Not a problem, I'm glad that I could help. Just make sure you use read-uncommitted isolation when you query the job tables, as they can be locked during the execution of Hangfire jobs.
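
With the SQL Server storage, that might look like this (a sketch; the table and schema names assume Hangfire's SQL Server defaults, and connectionString is your own):

    using (var conn = new SqlConnection(connectionString))
    using (var cmd = new SqlCommand(
        // NOLOCK reads uncommitted data, so the query won't block on rows
        // locked by jobs that are currently executing.
        "SELECT Id, StateName FROM [HangFire].[Job] WITH (NOLOCK) WHERE StateName = 'Processing'",
        conn))
    {
        conn.Open();
        using (var reader = cmd.ExecuteReader())
        {
            while (reader.Read())
                Console.WriteLine($"{reader.GetInt64(0)}: {reader.GetString(1)}");
        }
    }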

terryaney commented 7 years ago

@KeithBarrows Did you ever end up doing the separate AppDomain processing? I have a similar requirement to yours. We are hoping to replace an older system with Hangfire. The old system supported all of my pain points listed below via AppDomains for each 'job'. Unfortunately, I inherited the older system's code and am still trying to wrap my head around it.

a) Our driving website is a 'generic site' that dynamically supports multi-tenancy. So the 'generic' code that is Enqueuing a job doesn't even have a direct reference to the 'tenant' assemblies. They are passed assembly/type names. So it would enqueue a generic 'Invoker' type job that would then spin up the actual tenant job.

b) Similar to you, we want easy deployment of 'tenant/job assemblies' instead of the entire Windows Service (actually, tenant assemblies are dynamically cached into the database by version, so a running job would load this assembly and its declared types; deployment is thus sort of automatic in regards to the Windows Service/Hangfire server).

c) Given that we have 100-200 tenants, we want these tenant assemblies loaded when the job runs, then unloaded when finished, so our Windows Service hosting the Hangfire server doesn't end up with hundreds of tenant assemblies remaining loaded in memory. As far as I know, only AppDomains support that.

KeithBarrows commented 7 years ago

The short answer: no. Unfortunately, my contract was not extended due to budget cuts. I would still like to figure out an answer to this, though!

-kb

terryaney commented 7 years ago

Well, I'm still reviewing my code and I am open to any suggestions (I'm by no means a TPL or AppDomain expert), but I have a POC working. As a reminder, we have a generic 'host site' that does not directly reference tenant/client site assemblies. The flow for the host site to launch a job is something like this:

  1. The client assembly is 'executing' (within the context of the host site), creates a job 'input package' (an XElement payload), and returns it to the host site. Since the client assembly has references to its own job types, it can correctly reference the desired job types and provide the required assembly/type information in the input package.

  2. The host site takes the inputPackage and schedules a generic JobInvoker type job, passing in the inputPackage, the PerformContext (for use with Hangfire.Console), and JobCancellationToken.Null (to enable the IJobCancellationToken features and allow job cancellation).

  3. From there, the Hangfire server picks up the job and processes it as normal.

The code to enqueue a job would look something like the following:

Tenant site creates input package

    // The inputPackage would be created by a tenant and passed back up to our generic 'host site'
    var invokeType = typeof(BTR.Evolution.Hangfire.Jobs.ConEd.LongRunningJob);
    var inputPackage =
        new XElement("InputPackage",
            new XElement("Invoker",
                new XAttribute("Assembly", invokeType.Assembly.FullName.Split(',')[0]),
                new XAttribute("Type", invokeType.FullName)
            )
        );

Host site takes the input package and Enqueues a JobInvoker job

    // Generic Host Site has a reference to the generic JobInvoker, and Enqueues the job
    BackgroundJob.Enqueue(
        () => new BTR.Evolution.Hangfire.JobInvoker().Invoke(
                inputPackage,
                null, // PerformContext
                JobCancellationToken.Null )
    );

JobInvoker Implementation

Another one of our requirements is that we don't want to sprinkle MarshalByRefObject on every single tenant job class we might have. My implementation only requires that JobInvoker derive from MarshalByRefObject. We do require jobs to implement an IHangfireJob interface, which has a single method:

    public interface IHangfireJob
    {
        Task Execute( XElement inputPackage, IHangfireJobContext jobContext );
    }

If a job supports cancellation (for when Hangfire attempts to cancel a job via manual delete or server shutdown), we have the IHangfireCancellableJob interface, which tenant job classes can opt into if needed/supported:

    public interface IHangfireCancellableJob
    {
        void Cancel();
    }

You'll notice the IHangfireJob.Execute method takes an IHangfireJobContext object. This interface will grow (mostly to support what we need from Hangfire.Console), but the main reason I introduced it was that I needed a way to call methods on the original job's Hangfire.PerformContext. The interface is required because Hangfire.PerformContext is not serializable and cannot be passed across AppDomains.

    public interface IHangfireJobContext
    {
        void WriteLine( string message ); // Allows jobs to call through to Hangfire.Console's PerformContext.WriteLine() extension.
    }

My workflow in JobInvoker requires that JobInvoker be both the driver and the child (the object invoked in the separate AppDomain). This is similar to how I interpreted your pseudocode. I'm designating each step as either 'Hangfire Job' (the original object created by Hangfire) or 'AppDomain Object' (the object running in the new AppDomain).

  1. Hangfire Job: Holds a reference to PerformContext so that the original JobInvoker object (created by Hangfire) can be passed across the AppDomain boundary to the child AppDomain object as an IHangfireJobContext, allowing pass-through methods to be called that act on PerformContext.

  2. Hangfire Job: Create AppDomain and create a new JobInvoker via instance = appDomain.CreateInstanceAndUnwrap( assembly.FullName, type.FullName ) as JobInvoker; This new JobInvoker is the 'AppDomain Object'.

  3. Hangfire Job: Since JobInvoker derives from MarshalByRefObject, I can pass it through as a parameter when I call a method on the instance object.

  4. Hangfire Job: Register the instance.Cancel() method to be called when Hangfire's IJobCancellationToken is cancelled (since I can't pass the token across AppDomain). This was created by following concepts from http://stackoverflow.com/questions/15149211/how-do-i-pass-cancellationtoken-across-appdomain-boundary.

  5. AppDomain Object: Now that I'm running in a new AppDomain, create a new tenant job object via reflection based on information in the inputPackage element.

  6. AppDomain Object: I wanted to support await/async in all my jobs; I'm asking whether what I've done is best practice at http://codereview.stackexchange.com/questions/155115/starting-async-await-from-any-arbitrary-synchronous-method, but essentially I just start the job via Task.Run so that my actual tenant job can support async/await.

  7. Tenant Job Object: Process the job (using IHangfireJobContext and IHangfireCancellableJob as desired). Will show that code later.

  8. Hangfire Job: Unload the AppDomain.

    
    public class JobInvoker : MarshalByRefObject, IHangfireJobContext
    {
        JobInvoker instance;
        PerformContext performContext;

        void IHangfireJobContext.WriteLine( string message )
        {
            if ( performContext == null ) return;
            performContext.WriteLine( message );
        }

        public override object InitializeLifetimeService() { return null; }

        public void Invoke( XElement inputPackage, PerformContext performContext, IJobCancellationToken cancellationToken )
        {
            this.performContext = performContext;

            var path = new FileInfo( Assembly.GetExecutingAssembly().Location ).DirectoryName;

            var info = new AppDomainSetup()
            {
                PrivateBinPath = path,
                ApplicationBase = path,
                ApplicationName = "Job Name",
                ShadowCopyFiles = "true",
                ShadowCopyDirectories = path
            };

            var appDomain = AppDomain.CreateDomain( info.ApplicationName, AppDomain.CurrentDomain.Evidence, info );

            try
            {
                appDomain.InitializeLifetimeService();

                // Will come up with a better way of passing an assembly (caching or deploy via database), but for now
                // just look in the /bin/ClientAssemblies folder, so pass the 'main' AppDomain's path in because the
                // instance AppDomain's assemblies will be in a temp directory.
                var assembly = (string)inputPackage.Element( "Invoker" ).Attribute( "Assembly" );
                var jobAssembly = Path.Combine( path, "ClientAssemblies", assembly + ".dll" );
                inputPackage.Element( "Invoker" ).Add( new XAttribute( "ClientAssembly", jobAssembly ) );

                // Create another JobInvoker object (since that is the only thing with MarshalByRef) that'll create
                // the job designated in the inputPackage's <Invoker/> element.
                var jobInvokerType = GetType();
                instance = appDomain.CreateInstanceAndUnwrap( jobInvokerType.Assembly.FullName, jobInvokerType.FullName ) as JobInvoker;

                // After instance is created, register the JobInvoker.Cancel() method with the cancellationToken so that it
                // is called whenever the Hangfire cancellation token is cancelled. (The using causes the callback to be
                // unregistered on Dispose()).
                using ( var cancellationTokenRegistration = cancellationToken.ShutdownToken.Register( () => instance.Cancel() ) )
                {
                    instance.Process( this, inputPackage.ToString() );
                }
            }
            finally
            {
                AppDomain.Unload( appDomain );
            }
        }

        IHangfireJob hangfireJob;

        public void Process( IHangfireJobContext jobContext, string inputPackageXml )
        {
            var inputPackage = XElement.Parse( inputPackageXml );

            // Normally this would come from a database/cache; if I load the assembly by assembly name, the tenant assembly remains loaded in the Windows Service AppDomain.
            var jobAssembly = (string)inputPackage.Element( "Invoker" ).Attribute( "ClientAssembly" );
            byte[] buffer = File.ReadAllBytes( jobAssembly );
            var jobType = (string)inputPackage.Element( "Invoker" )?.Attribute( "Type" );

            hangfireJob = buffer == null
                ? CreateJob( (string)inputPackage.Element( "Invoker" ).Attribute( "Assembly" ), jobType )
                : CreateJob( Assembly.Load( buffer ), jobType );

            // Start the job asynchronously so I can use async/await
            Task.Run( async () =>
            {
                await hangfireJob.Execute( inputPackage, jobContext );
            } ).GetAwaiter().GetResult();
        }

        public void Cancel()
        {
            // If the job that is running supports cancellation, tell it Hangfire is trying to cancel
            var hangfireCancellableJob = hangfireJob as IHangfireCancellableJob;
            if ( hangfireCancellableJob != null )
            {
                hangfireCancellableJob.Cancel();
            }
        }

        public IHangfireJob CreateJob( string assemblyName, string typeName, params object[] constructorArgs )
        {
            var assembly = Assembly.Load( assemblyName /*, AppDomain.CurrentDomain.Evidence */ );
            return CreateJob( assembly, typeName, constructorArgs );
        }

        public IHangfireJob CreateJob( Assembly assembly, string typeName, params object[] constructorArgs )
        {
            var type = assembly.GetType( typeName );
            var constructor = type.GetConstructor( constructorArgs.Select( a => a.GetType() ).ToArray() );

            if ( constructor == null )
            {
                constructor = type.GetConstructor( Type.EmptyTypes );
                constructorArgs = Type.EmptyTypes;
            }

            var instance = constructor.Invoke( constructorArgs );

            return instance as IHangfireJob;
        }
    }

Child Job Implementation

So the child job can come from any assembly, and the Windows Service I've implemented hosting Hangfire does **not** need to directly reference it. Right now, I'm just manually dropping the tenant assemblies into a /ClientAssemblies subdirectory so they are available when I do `Assembly.Load( File.ReadAllBytes( ... ) )`. When I finalize everything and the assembly is coming from a database and loaded via `Assembly.Load( byte[] )`, the assembly *does not* remain loaded in the Windows Service's AppDomain. However, even with my current prototype of just placing assemblies in /ClientAssemblies, it seems to behave this way as well. So I should have no problem deploying client DLLs, isolated from the actual Windows Service implementation. In addition, I think the interfaces above are simple enough that they should never change. I can pretty much pass anything I need in an XElement payload (or simply put more details in a database for the job to pull out if it becomes too complex for an XElement).

If the job is going to support cancellation, it needs to handle the `Cancel()` method call in the `IHangfireCancellableJob` interface.  If you need to write to the Hangfire.Console, you can call `jobContext.WriteLine()`.

    public class LongRunningJob : IHangfireJob, IHangfireCancellableJob
    {
        CancellationTokenSource cancellationToken;

        public void Cancel()
        {
            cancellationToken.Cancel();
        }

        public async Task Execute( XElement inputPackage, IHangfireJobContext jobContext )
        {
            cancellationToken = new CancellationTokenSource();

            for ( var i = 0; i < 10; i++ )
            {
                if ( cancellationToken.Token.IsCancellationRequested )
                {
                    throw new EvolutionJobForciblyCancelledException();
                }

                await Task.Delay( TimeSpan.FromSeconds( 5 ) ).ConfigureAwait( false );

                jobContext.WriteLine( $"Processing batch {i}" );
            }
        }
    }


Hope this helps someone.  If I improve on this or find any problems, I'll update it here.  Any suggestions are most certainly welcome.