SFDO-Community / declarative-lookup-rollup-summaries

Declarative Lookup Rollup Summaries (DLRS) is a community-built and community-maintained Salesforce application that allows you to create cross-object roll-ups declaratively - no code! For install instructions and documentation, visit our website https://sfdo-community-sprints.github.io/DLRS-Documentation/
BSD 3-Clause "New" or "Revised" License

Rollup Schedule Job Error: Already Executing #1314

Closed ChetanKammar closed 5 months ago

ChetanKammar commented 1 year ago

Describe the bug Hi, I am receiving the error below for almost all of my rollup jobs. I am currently on 2.17, the latest version, and, as the error suggests, I deleted all the old scheduled jobs and rescheduled them to run between 2 and 5 AM every day.

Error: A calculate job for rollup '[Rollup Name]' is already executing. If you suspect it is not already running try clearing the applicable record from the Lookup Rollup Calculate Jobs tab and try again. Review the error, rollup definition and/or delete the Apex Scheduled job under Setup. Check if the rollup still exists via the Manage Rollup Summaries and/or Lookup Rollup Summaries tabs.

To Reproduce Steps to reproduce the behavior:

  1. Go to Lookup Rollup Summaries
  2. Create Lookup Rollup Summary in Scheduled calculate mode
  3. Schedule Calculate for early morning (between 3-5am)

Expected behavior The jobs should run without producing an error that they're already running and update the fields.

Additional context I have already deleted the scheduled jobs and rescheduled them, but it did not help. @davidmreed please take a look and help me.

aheber commented 1 year ago

@ChetanKammar sorry you're having these problems. I understand that can be frustrating. In the future please refrain from mentioning project members directly. This project is community supported and people will jump in and help when they are able.

For the best support we ask that people use the Trailblazer Community Group for DLRS to request support. All of the most experienced people engage there.

As for your specific issue, I'll do my best to troubleshoot.

Did something change in your environment or configuration? Were these all running as scheduled full calculate?

Have you reviewed the Apex Jobs portion of Setup to see if any errors/failures were thrown from the calculation jobs?

When Lookup Rollup Calculate Jobs records are left over, it usually indicates a severe failure of the rollup calculation jobs: they were unable to report the failed actions and also failed to clear their Rollup Job records. So we're expecting to see a failed Apex transaction somewhere, more likely in your scheduled work. The Apex Jobs page should hold those details.

Once you've reviewed it, please let me know if you see anything. Most often we see things like too many records queried, configuration loops, or long chains that exhaust platform transaction limits and cause the transaction to be killed. This can also be caused by configurations that reference invalid fields or objects, if you've recently made changes or deleted a critical element. If it is happening on all your jobs, though, I'd find that less likely.
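If the Apex Jobs page in Setup is hard to scan, the same check can be done with a query. The sketch below is only an illustration, not part of DLRS: it builds a SOQL query against the standard AsyncApexJob object for recently failed or aborted jobs. The helper name `build_failed_jobs_query` and the seven-day default window are assumptions made up for this example; paste the resulting string into whatever SOQL tool you normally use.

```python
def build_failed_jobs_query(days=7, statuses=("Failed", "Aborted")):
    """Build a SOQL query listing recent Apex jobs in the given statuses.

    Hypothetical helper for illustration; AsyncApexJob and the selected
    fields are standard Salesforce names, everything else is assumed.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    # Quote each status for the SOQL IN (...) clause.
    quoted = ", ".join("'{}'".format(s) for s in statuses)
    return (
        "SELECT Id, ApexClass.Name, JobType, Status, ExtendedStatus, "
        "NumberOfErrors, CreatedDate "
        "FROM AsyncApexJob "
        "WHERE Status IN ({}) AND CreatedDate = LAST_N_DAYS:{} "
        "ORDER BY CreatedDate DESC"
    ).format(quoted, days)


if __name__ == "__main__":
    # Print the query so it can be copied into a SOQL editor.
    print(build_failed_jobs_query())
```

The ExtendedStatus column is usually the most useful one here, since it tends to carry the first error message for a failed job.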

Hope that puts you on the right path.

ChetanKammar commented 1 year ago

Hi Aheber,

Thanks, will not mention members in the posts directly hereafter.

About the troubleshooting points:

Since DLRS was implemented, yes, there has been one change: the data sync between Salesforce and our ERP software (Pronto). When DLRS was set up (December 2016), data was synced through Jitterbit; since the two systems were integrated directly (August 2022), data comes straight from the ERP (Pronto). Other than this, there have been no changes to the environment, and I am not sure whether this integration is affecting the rollup jobs. Yes, the error occurs on the scheduled full calculate jobs.

About Apex Jobs: most of the scheduled jobs are in Queued status and some are in Aborted status, and I have not received any errors from these jobs.

I have reviewed the referenced fields and did not see any problem there. I am thinking of uninstalling DLRS and recreating all the rollup jobs, since deleting the Apex jobs and rescheduling did not work. Do you think the new integration between Salesforce and the ERP could be causing this issue? Please advise.

Thanks Chetan

aheber commented 1 year ago

Thanks for getting back to me.

Do you have any data on the historical runs? I don't think an uninstall and full reinstall would make a difference, because it all comes down to the configurations.

I think we're looking for job runs with a status of Completed or Failed rather than Queued or Aborted (unless the Aborted ones include error details, though a job that actually tried to run should show a different status). Specifically, we're expecting Failed runs with error conditions.

It might be that the integration brought in lots of data that are driving the DLRS jobs to fail now.

The other place you can look is /lightning/o/dlrs__LookupRollupSummaryLog__c/list, which contains the logs that DLRS produces. There is a chance the logs were not generated because the Apex transaction failed, but it is worth taking a look.

ChetanKammar commented 1 year ago

Hi Aheber,

Sorry for the late response, and thanks for the suggestions above. The error was caused by a lookup filter on one of the parent objects for which we had DLRS jobs scheduled, and by old jobs that were still present in Lookup Rollup Calculate Jobs. I have fixed both, and the scheduled jobs are now working fine.

But this morning I received error emails, probably from Salesforce, saying:

[Apex job 7079600000PSvZxAAL failed to update rollups in LIO (XXX)

Error: Record Currently Unavailable: The record you are attempting to edit, or one of its related records, is currently being modified by another user. Please try again.. Parent record Ids ]

Could you please guide me on how to resolve this? I checked all the validation rules on our Account object and all of them are required. I'm not sure what is causing DLRS to fail the update for a few parent accounts.

aheber commented 1 year ago

@ChetanKammar great to hear you were able to get your other stuff resolved.

As for the error, this is record lock contention in the system. It is usually caused by something else in the system holding onto the record. A lot of the time this is bulk data loading, or possibly even multiple rollup jobs trying to work on the same parent records.

Unfortunately, the cause is likely to be tightly coupled to your system. DLRS tries to lock records during scheduled full calculate runs; otherwise it just attempts the rollup, and if the record is locked by something else, it might error out.
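If some of the contention turns out to come from an external integration writing to the same parent records, one common mitigation on the caller's side is to retry with a growing delay instead of failing on the first lock error. The Python sketch below illustrates that general pattern only. It is not how DLRS handles locks. `RecordLockedError` and `retry_on_lock` are hypothetical names for this example, standing in for whatever lock-contention error (such as UNABLE_TO_LOCK_ROW) your integration client surfaces.

```python
import random
import time


class RecordLockedError(Exception):
    """Hypothetical stand-in for a lock-contention failure from the client."""


def retry_on_lock(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(), retrying with exponential backoff on lock errors.

    The delay doubles after each failed attempt, with a little random
    jitter so that competing writers do not all retry at the same moment.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RecordLockedError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the original error
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
```

Here `operation` would be a zero-argument callable that wraps the actual update call. Staggering the integration's load window so it does not overlap the scheduled rollup window addresses the same problem without retries.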

ChetanKammar commented 1 year ago

Hi Aheber,

Thanks for your input. I will review my system, try to fix this, and update you if I find a solution.

aheber commented 5 months ago

Considering this resolved. Hopefully you've been able to take care of everything. If you do need additional help I hope to see you in the Trailblazer Community Group.