OrchardCMS / OrchardCore

Orchard Core is an open-source modular and multi-tenant application framework built with ASP.NET Core, and a content management system (CMS) built on top of that framework.
https://orchardcore.net
BSD 3-Clause "New" or "Revised" License

Workflow instance did not serialise to Database, stuck in loop emailing notifications #5993

Open Andy-McAuley opened 4 years ago

Andy-McAuley commented 4 years ago

We support an Orchard Core site (.NET Core 3.1, SQL Server database). It has a "Place Order" workflow that notifies the user by email when they place an order, and then also sends a notification to a system mailbox.

The client who owns the website asked us to stop sending the notification email to the system mailbox, so we edited the workflow to remove that step and saved the changes.

A couple of hours later, the client got in touch to say that one of their customers was being mail bombed with the same notification email over and over again.

We could see thousands of emails being sent to the customer in the email server logs but there was no "Place Order" workflow instance serialised to the orchard database that related to the placing of the order for the customer who was being mail-bombed.

We stopped the orchardcore service. This completed the workflow and stopped the emails being sent, and the related workflow was then written to the database with the following Status and FaultMessage:

**"Status":6,"FaultMessage":"The operation was canceled."**

We have never had a problem editing a workflow on a live site before, but looking at the time of the workflow instance that got stuck in a loop and the time the change was made to the workflow it looks like they were very close together (definitely in the same minute).

Could editing the workflow definition while the workflow was in use cause this problem, and if so why?

We have now added debug logging to the "Place Order" workflow to see if we can get some insight into why it failed.

sebastienros commented 4 years ago

Is it possible that the previous state of a workflow instance, resumed with the new workflow definition, would have caused this loop?

sebastienros commented 4 years ago

I would also suggest we add some short-circuit logic to block this activity from working if we reach some threshold: a configurable rate (like 10 per minute, even if only for the same body to the same recipient). This could easily be done with a hash of the recipient and body stored in a sliding cache entry; if the cache entry already exists, log an error instead of sending the email.

Could be done in the email service directly. If an email should be sent frequently, then the body should contain a variable that would make it unique.
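
For illustration only, the duplicate check suggested above could be sketched roughly as follows. This is not Orchard Core code and not an existing API: the `EmailThrottle` class, its `should_send` method, and the 60-second window are all hypothetical names and values chosen for the example, and it is written in Python purely to show the logic. It hashes the recipient and body into a key, and treats the key as a sliding cache entry whose expiry is renewed on every attempt.

```python
import hashlib
import time


class EmailThrottle:
    """Hypothetical guard that blocks repeated identical emails.

    An email is identified by a hash of (recipient, body). While an entry
    for that hash is alive, further sends are refused. The entry uses a
    sliding expiration: every attempt renews it, so a runaway loop stays
    blocked for as long as it keeps retrying.
    """

    def __init__(self, window_seconds=60.0, clock=time.monotonic):
        self.window_seconds = window_seconds  # assumed default, make configurable
        self.clock = clock                    # injectable so tests can fake time
        self._expires_at = {}                 # key hash -> expiry timestamp

    @staticmethod
    def _key(recipient, body):
        # Normalise the recipient so "A@x.com" and "a@x.com " share one entry.
        data = recipient.strip().lower() + "\n" + body
        return hashlib.sha256(data.encode("utf-8")).hexdigest()

    def should_send(self, recipient, body):
        now = self.clock()
        # Drop entries whose sliding window has lapsed.
        self._expires_at = {
            key: expiry for key, expiry in self._expires_at.items() if expiry > now
        }
        key = self._key(recipient, body)
        duplicate = key in self._expires_at
        # Sliding expiration: renew the entry on every attempt, blocked or not.
        self._expires_at[key] = now + self.window_seconds
        return not duplicate


if __name__ == "__main__":
    throttle = EmailThrottle(window_seconds=60.0)
    for attempt in range(3):
        if throttle.should_send("customer@example.com", "Your order was placed."):
            print("attempt", attempt, "sent")
        else:
            # In a real service this would be logged as an error.
            print("attempt", attempt, "blocked as duplicate")
```

A rate threshold such as "10 per minute" would need a counter stored with each entry rather than a simple presence check. As noted above, an email that legitimately repeats would need something unique in its body (an order number, for example) so that it hashes to a different key.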