akshaysura / Sitecron

SiteCron is a scheduling module based on Quartz Job Scheduler.
MIT License

Latest SiteCron does not fire IJob's on XP0 on-Premise installs of Sitecore 9.3 - 10.2 #31

Open asontu opened 2 years ago

asontu commented 2 years ago

The latest SiteCron package never actually fires the scheduled jobs on the environments that I work with. All of these are XP0 or XP1 on-Premise installs. Nothing in the logs hints at anything being amiss; the logs report the jobs being scheduled as expected.

To rule out any issues with our implementation of IJob jobs, I've reproduced this behavior on clean Sitecore 9.3, 10.0 and 10.2 installs that I have access to at my work (I don't have a clean 10.1 install at the moment) and tried to run the Cleanup job that comes with SiteCron.

So to reproduce:

  1. Install Sitecore 9.3+ XP0 on-Premise
  2. Install PowerShell Extensions 6.3 (latest)
  3. Install latest SiteCron from here
  4. Enable the job "Cleanup - Delete SiteCron Execution Reports Older than 200 days - Execute PowerShell Script"
  5. Run the job either through "Execute Now!", by setting a Cron expression, or by setting an Exact Date Time.
  6. Observe an item being added to the Auto folder that never deletes itself.
    • The logs do contain:
      "INFO SiteCron - Job Loaded - Job Source: DATABASE - Execute Now Cleanup Delete SiteCron Execution Reports Older than 200 days Execute PowerShell Scrip - Type: Sitecron.Jobs.PowerShell.ExecuteScript, Sitecron USING Cron Expression: 27 30 15 1/1 * ? * Parameters: - Job ItemId:{38D1BC06-C9C7-43C0-B139-BFA93711B0D0}"
    • The logs don't contain the expected entry from the ExecuteScript log-line:
      Log.Info(string.Format("SiteCron: Powershell.ExecuteScript Instance {0} of ExecuteScript Job [...]"));

In the interest of submitting a more complete bug report, I've cloned the most recent master branch, built and deployed the DLLs, and attached Visual Studio to the IIS process to try to see what goes wrong. I've repeated this for the clean Sitecore 9.3, 10.0 and 10.2 installs mentioned.

I put a breakpoint in the OnItemSaved method, which does get hit and completes without error, adding the job to _inProcess and calling _scheduleManager.ScheduleAllJobs(); without issues.

I also put a breakpoint in the Execute method of the ExecuteScript class. This breakpoint never gets hit. I've tried with the "Execute Now!" context-menu option, with a Cron expression, and with an Exact Date Time value.

I'm aware that stepping through the OnItemSaved method in the debugger can take long enough that the job is only added after the minute named in the Cron expression or Exact Date Time has already passed. I've accounted for that in subsequent runs.

I've also been made aware of a known issue with SiteCron 3.4 and that I should try SiteCron 3.6. I can't find any branch or commit where the AssemblyVersion is anything higher than 3.4.0.0, though. I'm assuming this is a small oversight; either way, I can reproduce this issue with the latest master.

asontu commented 2 years ago

(as an aside, I put a bounty on the relevant Sitecore StackExchange question for whoever tackles this)

markgibbons25 commented 2 years ago

The change that I thought fixed this issue is #26. I built this module from source and am using it in 10.1.

asontu commented 2 years ago

Are there particular build steps I should keep in mind, or should "Build Solution" in Visual Studio suffice? I'll try that commit to rule out a regression since then.

markgibbons25 commented 2 years ago

I think build solution suffices :)

asontu commented 2 years ago

I've managed to frankenstein a version that works, by branching off of Joao's work right before the upgrade to Quartz.NET 3.x, where Quartz.NET becomes async etc.

I had a suspicion that this might be the issue. I already noticed that the Quartz.NET docs mention a different way of getting and disposing of the scheduler than SiteCron seems to use.

We might opt to keep using this older Quartz.NET dependency for now, certainly for the clients directly affected right now.

It would still be nice if the latest version "just works"; the focus should evidently be on how the async Quartz.NET 3.x library is used.
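For comparison, here is a minimal sketch of the async pattern the Quartz.NET 3.x docs describe for getting, starting and shutting down a scheduler. It assumes the Quartz 3.x NuGet package is referenced. `HelloJob`, the identity names and the cron expression are hypothetical and are not taken from SiteCron's code.

```csharp
// Minimal sketch of Quartz.NET 3.x usage; assumes the Quartz NuGet package (3.x).
// HelloJob, the identities and the cron expression are hypothetical, not SiteCron code.
using System;
using System.Threading.Tasks;
using Quartz;
using Quartz.Impl;

public class HelloJob : IJob
{
    // In Quartz.NET 3.x, Execute returns a Task (it returned void in 2.x).
    public Task Execute(IJobExecutionContext context)
    {
        Console.WriteLine($"HelloJob fired at {DateTimeOffset.Now:O}");
        return Task.CompletedTask;
    }
}

public static class Program
{
    public static async Task Main()
    {
        // GetScheduler, Start, ScheduleJob and Shutdown are all awaitable in 3.x.
        ISchedulerFactory factory = new StdSchedulerFactory();
        IScheduler scheduler = await factory.GetScheduler();
        await scheduler.Start();

        IJobDetail job = JobBuilder.Create<HelloJob>()
            .WithIdentity("helloJob", "demo")
            .Build();

        // Quartz cron expressions have 6 or 7 fields, starting with seconds.
        ITrigger trigger = TriggerBuilder.Create()
            .WithIdentity("helloTrigger", "demo")
            .WithCronSchedule("0/10 * * * * ?")
            .Build();

        await scheduler.ScheduleJob(job, trigger);

        // Keep the process alive long enough for the trigger to fire a few times.
        await Task.Delay(TimeSpan.FromSeconds(35));

        // true = wait for running jobs to complete before shutting down.
        await scheduler.Shutdown(true);
    }
}
```

If a job like this logs nothing in a plain console app either, the problem is likely in the Quartz setup itself rather than in the hosting environment. If it only fails inside IIS, the host configuration becomes the more likely suspect.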

akshaysura commented 2 years ago

The change that I thought fixed this issue is #26. I built this module from source and am using it in 10.1.

@markgibbons25 if you got it working on 10.x, could you create a PR for that version Mark? It would be nice if we could push a version out specifically for 10. I know @fluxdigital has done something similar for 9.x

markgibbons25 commented 2 years ago

I checked the timeline. What I did was take this repo at this point https://github.com/akshaysura/Sitecron/commit/a122e8085f19e641571dea479f7afcfca187884c and then upgrade the packages for 9.2 and fix the new (at the time) abstraction for Jobs (about 2 lines changed for that). When I later did the 10.1 upgrade, I don't think I changed anything else.

I think the changes from @netojoa (https://github.com/netojoa) are good, though perhaps you're right that there's a bug there.

asontu commented 2 years ago

@fluxdigital's 9.x branch also still uses Quartz.net 2.x, from before the new abstraction. If you, Mark (and João?), have the new Quartz.net working, I'm wondering whether anything in the Sitecore or even IIS config needs tweaking. Maybe something around keeping the scheduler threads alive even though they're not doing much while they wait for the right time to fire a job?

Would be good to add that to release notes in that case.

asontu commented 2 years ago

I've forked the project and downgraded Quartz. I'm hesitant to make this a pull request since I'm simply undoing João's commits. Also, the fork contains other changes that have nothing to do with this Quartz issue.

praveensinghpanwar commented 2 years ago

@asontu - We had the exact same issue you describe, on Sitecore 10.2 on-premise. For us, upgrading System.Diagnostics.DiagnosticSource.dll from version 4.700.19.46124 to version 4.700.20.21406 fixed the issue. We are using Sitecron version 3.4.0.0 and Quartz version 3.2.4.0.

brandon-hurler commented 1 year ago

@praveensinghpanwar Thank you for your comment. We had the same issue with Sitecron 3.6.1 and Quartz 3.6.0. We were using System.Diagnostics.DiagnosticSource version 7.0.0 and missed updating the binding redirect.

For anyone else having this issue, make sure you have the correct binding redirect in place for that DLL if it doesn't match what ships with Sitecore.
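As a rough illustration, a binding redirect for that DLL in web.config could look like the fragment below. The version numbers are assumptions for the 7.0.0 package mentioned above. Note that `bindingRedirect` works on assembly versions, not on file versions such as 4.700.20.21406, so check the assembly version of the DLL that is actually in your bin folder and adjust accordingly.

```xml
<!-- Sketch only: goes inside the existing <configuration> element of web.config. -->
<!-- The version values are assumptions; match newVersion to the assembly version -->
<!-- of the System.Diagnostics.DiagnosticSource.dll deployed in your bin folder.  -->
<runtime>
  <assemblyBinding xmlns="urn:schemas-microsoft-com:asm.v1">
    <dependentAssembly>
      <assemblyIdentity name="System.Diagnostics.DiagnosticSource"
                        publicKeyToken="cc7b13ffcd2ddd51"
                        culture="neutral" />
      <bindingRedirect oldVersion="0.0.0.0-7.0.0.0" newVersion="7.0.0.0" />
    </dependentAssembly>
  </assemblyBinding>
</runtime>
```

If web.config already has a `dependentAssembly` entry for this assembly, update that entry rather than adding a second one.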