OrchardCMS / Orchard

Orchard is a free, open source, community-focused Content Management System built on the ASP.NET MVC platform.
https://orchardproject.net
BSD 3-Clause "New" or "Revised" License
2.37k stars 1.12k forks source link

Execute ProcessingEngine tasks for background tasks too #4666

Open orchardbot opened 10 years ago

orchardbot commented 10 years ago

@Piedone created: https://orchard.codeplex.com/workitem/20837

Because otherwise you can't e.g. run recipes from bg tasks.

orchardbot commented 9 years ago

@sebastienros commented:

As long as we process the ones for the background thread.

Piedone commented 8 years ago

Related is the issue that PE tasks won't be executed when you register then from an IOrchardShellEvents.Activating() implementation (like a migration step) due the tasks getting vanished: HttpContext.Current is null (not sure why) thus they are stored in CallContext, which gets wiped too. I think it's because the parallel start of all the shells causes thread switches.

jtkech commented 8 years ago

@Piedone this interests me because it's related to the CallContext and I can share some test results, even they have been done in other contexts (I think you're talking about what is done in CreateAndActivateShells()).

I've seen that HttpContext.Current is carried among thread switching, but you lose it if you start explicitly yourself a new thread (e.g by starting more than one parallel Tasks). Anyway, here, before the Parallel.ForEach(), I've seen that HttpContext.Current is already null, because it's in the host Initialize().

About the CallContext Get/SetData, seems ok to be used in each Task as long as there is no thread switching in a particular Task. The only way I've found to repro your issue is by using in a Task some async / await code. Then, as you have taught me, the continuation code can switch to another thread...

So, I am interested to know whether this applies to you. Maybe your Activating() handler hit some async code through some libraries, or e.g if you use some azure media services (I've seen some async code here)...

If so, the CallContext Logical Data might be a good candidate to be used in the ContextState class. I can work on it if you think it's worth...

Note: In my experimental PR #5966 the CallContext Logical Data seems to work fine for me. Someone else have used it to fix another issue, see #5983, but we need more time to be sure...

Best

Piedone commented 8 years ago

Thanks for your input @jtkech . Yes you're most possibly right about why the HttpContext is null in this case.

I'm not sure why CallContext looses its state; I guessed it was because the Parallel.ForEach() when activating shells but it indeed might be that there is some async code too (however in the affected code base I don't know about any that is executed during shell start).

And yes, maybe LogicalData can help here too, if you have time please try!

jtkech commented 8 years ago

@Piedone,

1st approach is to process the PE tasks in the parent thread after the ForEach(), but we need to wait for the tasks completion. We also need to first write the LogicalData slot in the parent thread, otherwise the child tasks updates will not be seen by the parent. For the same reason we need a referenced object (not a value type or immutable object) but, because of the parallel tasks, it has to be thread safe.

2nd approach is to only write the LogicalData slot in the child tasks, as it is in our case. This way, each parallel task will have its own object (not the same reference) carried among async threads switching... Then, we can process the PE tasks at the end of each child task. Maybe a better solution.

Note: Similarly with web requests, maybe the HttpContext.Current static property is thread safe but, I think, not its properties as the Items collection. It is carried among async threads switching, but it is not intended to be used concurrently... We can serve multiple web requests because each one has its own httpcontext.

So, here the idea with the 2nd approach is to have an intermediate between the CallContext Data and LogicalData, we want to carry data among threads switching due to async / await code, but don't share the same objects in concurrent tasks.

So, with the 2nd approach, the parent thread must not write the data slot, otherwise child tasks will share the same referenced object. For a more general solution, each new starting parallel task would need to write a new object (can be null) in the data slot. OR, as tried, use a different prefixed data slot name...

NOT SURE, BUT FOR A FIRST TEST, because currently the PE tasks list is only updated in child tasks (not in the parent), try to only replace in ContextState.cs all GetData() / SetData() with LogicalGetData() / LogicalSetData(), and see what happens. Let me know.

Something which I did not expect: If in a tenant ActivateShell() data are stored in the Logical CallContext, they are still in the CallContext when running the related tenant backgound tasks! I think because the background timer is started by the SweepGenerator that is activated after the ActivateShell()...

Best

Piedone commented 8 years ago

@jtkech I'm not sure I'm totally following :-).

When you mention ForEach() you mean the shell activation foreach?

BW I tried simply changing ContextState to use Logical*Data methods and still the same issue is there. If these methods would work I think they should here: in Global.asax you can see the following in HostInitialization:

            host.Initialize(); // Shell starts run here.

            // initialize shells to speed up the first dynamic query
            host.BeginRequest();
            host.EndRequest(); // PE tasks try to be executed here, but there won't be any.

Don't know why this happens.

jtkech commented 8 years ago

@Piedone

When you mention ForEach() you mean the shell activation foreach?

Yes, and the PE tasks are now processed here in each individual parallel Task...

But it depends on your version, see this commit https://github.com/OrchardCMS/Orchard/commit/8a8b0f91c63c07f40f2bc364e0c0a9bf64abf554

If you don't have this commit, I understand, maybe you don't have any async code in your activating handler, but, because your PE tasks are added in child Tasks, in the first EndRequest() at the end of the HostInitialization() that belongs to the parent thread, you have lost the CallContext Data.

Here, if you use the CallContext Logical Data it's the same because the PE tasks list is only written in the child Tasks of the parallel ForEach loop. Logical Data are carried among threads but only from the first writing where the data slot is created. So, here, each parallel child Task will have its own object, and the parent thread will see nothing...

Best

jtkech commented 8 years ago

@Piedone

Did you have applied 8a8b0f9 where PE tasks are processed in the shell activation foreach loop?

Here a graph to describe CallContext LogicalData behavior for regular referenced objects. For value types or immutable objects, replace below Write with Write a new object.

// parent thread
Write data slot 1
Write data slot 2
Data slot 3 not yet created

    Parallel.ForEach(... => {
        // child tasks
        Retrieve data slot 1 // shared by all childs (must be thread safe)
        Write data slot 1

        Write a new object (can be null) in data slot 2 // each child will have its own object
        Write data slot 2

        Write data slot 3 // each child will have its own object

        await DoAsync()

        Retrieve all data slots as updated above
    });
// here, for testing, wait for tasks completion

Retrieve data slot 1 as updated by childs (except if value types or immutable objects)
Retrieve data slot 2 without child updates
Nothing in data slot 3

Here a graph to describe what could happen in your case if you have not applied 8a8b0f9.

// in HostInitialization()
...
host.Initialize();
    > CreateAndActivateShells()
        ...
        // data slot not yet created in parent thread
        Parallel.ForEach(... => {
            ...
            ActivateShell(context);
                > Activate() // data slot are written only in child Tasks
                    > _sweepGenerator.Activate() // background tasks can retrieve LogicalData!

            // here, you always retrieve the current Task data if you use LogicalData
            // regular CallContext Data can be lost after some async / await code

            // I ASSUMED THAT YOU WAS PROCESSING PE TASKS HERE
            while (_processingEngine.AreTasksPending()) {
                ...
                _processingEngine.ExecuteNextTask();
            }
        });
...
// end of HostInitialization()
host.BeginRequest(); // note: here HttpContext.Current is null
host.EndRequest(); // here, nothing in the data slot even if you use LogicalData

Best

Piedone commented 8 years ago

No, I haven't tried https://github.com/OrchardCMS/Orchard/commit/8a8b0f91c63c07f40f2bc364e0c0a9bf64abf554, thank you for pointing that out @jtkech ! Actually that might solve this issue alone.

I still don't know why CallContext looses its state between the calls of HostInitialization() but probably that with this isn't interesting any more. I'll check this out.

Piedone commented 8 years ago

Yes, with that patch applied and using LocicalData the PE tasks are now executed. Also tried without LogicalData: that works as well (I'd guess because PE tasks are executed for each shell separately, in each task/thread).

jtkech commented 8 years ago

@Piedone

It's because the CallContext data slot is first created in your activating handler that runs in the ForEach loop, in a child Task. Then, when you return back to the parent thread the data slot is lost. This happens with the regular CallContext Data that belong to only one thread.

Here, this also happens with LogicalData when you return back to the parent thread because the data slot has been first created in a child thread. So , the data slot doesn't exist in the parent thread!

To describe this, see my first graph above.

Anyway, good to know it can solve your issue, best

jtkech commented 8 years ago

@Piedone, just seen your last comment

For sure, when executing PE tasks in the ForEach loop it works with Data OR LogicalData. But, as said before, if you have in a individual parallel Task a thread switch due to some async / await code, then it could fail with regular Data, but not with LogicalData.

So, maybe it's still worth to implement something to fit this scenario?

Best