Ideas for improving the Batch interface

vorner commented 4 years ago

Hello

I've been mulling a bit over the batch execution API. To be honest, I don't find it very comfortable to use. I see several problems:

It's verbose. To use it, you have to implement two traits (System and BatchController), and for the System you have to implement even the methods that you don't really care about most of the time (eg. accessor and setup).
Unsafe is sticking out of it even though there's nothing the user needs to uphold towards the library or anything unsafe the user wants to do.
As the controller is passed as a type parameter, not as a value, there's no (reasonable) way to pass additional info into it.

If my guess is correct, the System is used mostly to make the implementation easier, because the Dispatcher just works with Systems all the time.

I have a proposal how to make the interface better while not introducing much more complexity.

Let's have a trait, something like this (modulo naming, lifetimes...):

trait BatchController {
    type SystemData: SystemData;
    type Tmp;
    fn run_pre(&mut self, data: Self::SystemData) -> Self::Tmp;
    fn run(&mut self, tmp: Self::Tmp, dispatcher: DispatcherProxy);
}

Then, internally, there would be some wrapper that would implement the System, hold the dispatcher, and the instance of this new BatchController.

The relevant part of the example would look something like this:

// No data needed here, the accessor and dispatcher are part of the wrapper below the surface.
struct CustomBatchControllerSystem;

impl BatchController for CustomBatchControllerSystem }
    type SystemData = Read<TomatoStore>;
    type Tmp = TomatoStore;
    fn run_pre(&mut self, data: Self::SystemData) -> Self::Tmp {
        *data
    }
    fn run(&mut self, _ts: TomatoStore, dispatcher: DispatcherProxy) {
      for _i in 0..3 {
         // The BatchUncheckedWorld is part of the proxy
         dispatcher.dispatch();
      }
}

...
DispatcherBuilder::new()
    .with(SayHelloSystem, "say_hello", &[])
    .with_batch(
         CustomBatchControllerSystem, // Passing value, not a type
         DispatcherBuilder::new().with(..)
         "BatchSystemTest",
         &[],
     )
     .build();

If you like the idea, I can try making a PR with the changes.

Alternatively, it would also be possible to create just the wrapper that would implement the current traits. That would preserve the current API, but the cost is that it would still not be possible to pass values in there, only types.

vorner commented 4 years ago

Ping? I can do the work, I only want to know if I should invest the time in the attempt. Is anyone here who can share their opinion?

@azriel91 You seem to be the most active person at the head of the repo. Are you the right person to ask?

azriel91 commented 4 years ago

Heya, sorry for the late reply. I have yet to understand the Batch feature myself, but shall try to do so tonight -- shall reply within 5 hours.

azriel91 commented 4 years ago

Heya, just went through the code, and like the idea you're proposing.

One thing that looks like needing to change is the Tmp type and pre_run in CustomBatchControllerSystem -- if BatchController::run already has Read<TomatoStore>, then it doesn't look possible for DispatcherProxy to Write<TomatoStore> to inner systems, since Read<TomatoStore> reference is always in scope in run.

In my mind it makes sense to simply pass in &World to BatchController::run alongside the Dispatcher, and implementations can decide whether they want to fetch TomatoStore / anything else -- API shouldn't impose "please specify some system data that you may or may not want" when later on the batch controller also executes systems, which needs access to the World (whether or not it's hidden behind the DispatcherProxy).

Perhaps something like this:

// User implements this:
struct CustomBatchController;
impl BatchController for CustomBatchController {
    fn run(&mut self, world: &World, dispatcher: &Dispatcher) {
        {
            let _ts = TomatoStore::fetch(world);
            // do something
        }

        for _i in 0..3 {
            dispatcher.dispatch(world);
        }
    }
}

All of this can be hidden from the user inside shred:

pub struct BatchControllerSystem<'a, 'b, BC> {
    batch_controller: BC,
    accessor: BatchAccessor,
    dispatcher: Dispatcher<'a, 'b>,
}

impl<'a, BC> System<'a> for BatchControllerSystem<'_, '_, BC>
where
    BC: BatchController,
{
    type SystemData = BatchUncheckedWorld<'a>;

    fn run(&mut self, data: Self::SystemData) {
        self.batch_controller.run(data.0, &self.dispatcher);
    }

    fn running_time(&self) -> RunningTime {
        RunningTime::VeryLong
    }

    fn accessor<'c>(&'c self) -> AccessorCow<'a, 'c, Self> {
        AccessorCow::Ref(&self.accessor)
    }

    fn setup(&mut self, world: &mut World) {
        self.dispatcher.setup(world);
    }
}

vorner commented 4 years ago

I'd say the manual fetch/manually interacting with the world still feels a bit awkward to me. Most of the time I don't have to, the idea is that the Systems get their data prepared for them which is very nice and comfortable. So I was thinking ‒ looking at the example, that I could split the original run method into the two parts and the first part would pass any needed information through the Tmp.

I'd also like to point out that the Tmp in my example is not Read<TomatoStore>, but TomatoStore and I'm passing a copy of it ‒ for the same reason the original example has the block that drops it. The idea is you wouldn't be keeping any accessor proxies (is it a reasonable name for Read and Write handles into the world?) inside Tmp, that you'd prepare the plan what to do and pass that one on.

It would still be significantly better than the current unsafe so if you find my interface too limiting, I'm probably OK with what you propose.

I still think it would be possible to somehow combine both, if:

We would still go with this Tmp and pre_run
The pre_run would be provided as an empty default and Tmp would default to ()
In addition to the dispatcher (or the proxy), one would also get the world in addition to Tmp to allow playing with it directly.

That adds some more parameters that might go unused from time to time, though. What do you think about it?

azriel91 commented 4 years ago

Hmm, these two are the main points I'd like to address (and the rest may fall into place):

I'd say the manual fetch/manually interacting with the world still feels a bit awkward to me. Most of the time I don't have to
Tmp in my example is not Read<TomatoStore>, but TomatoStore and I'm passing a copy of it

(In these eyes) the purpose of a batch is to run multiple systems (or not). This may be:

"just run the dispatcher"
run the dispatcher multiple times
decide whether to run it based on some logic (which may or may not access data from the World)

In the simple case, "just run the dispatcher" means BatchController::run can have a default implementation that goes:

impl BatchController for DefaultBatchController {
    fn run(&mut self, world: &World, dispatcher: &Dispatcher) {
        dispatcher.dispatch(world);
    }
}

Next level, if it should run it multiple times, then the user does have to implement that trait function:

impl BatchController for CustomBatchController {
    fn run(&mut self, world: &World, dispatcher: &Dispatcher) {
        for _ in 0..3 {
            dispatcher.dispatch(world);
        }
    }
}

Next level is where it becomes tricky (and where the first point weighs in) to decide whether we should have the Tmp associate type + additional param to run. The use of this may be:

Additional logic that uses Tmp, and the batch controller is behaving as a System.
Additional logic that uses Tmp to decide how to run the dispatcher (if at all).

BatchController doesn't need to be designed for the first case -- users can simply specify it as another system.

For the second case, (am guessing) the pre_run is a way to decide how to run the dispatcher, so perhaps Tmp would be an enum with variants with the calculated result. Maybe TomatoStore in the example happens to be that calculated result stored in the world (but doesn't have to be).

If we do wire through Tmp from pre_run to run, that introduces an additional parameter, which means for that particular usage of a BatchController, the API is cleaner -- we split the decision logic from the dispatcher run logic. However, that imposes an additional unused parameter in run for the first two cases (as well as the 3rd case, if the user doesn't want to wire through a Copy type). It's this additional parameter for the unused cases that I'd like to avoid -- users with custom decision logic can still have the decision logic split in their implementation, it's just not enforced by the API.

Oh, that usage pattern is an assumption I'm making and may be inaccurate, since I'm not (yet) a user of the API (but may soon be -- think batches are a good way of implementing paused games).

vorner commented 4 years ago

Hello

You raise interesting points. Some more thinking from my side.

It has been some time when I first thought of the interface and I think the way you do fetch in your example wouldn't work out of the box. The scheduler needs to be told which resources you'll be interested in so it can schedule other things in parallel with it. So, in addition to the fetch, one would have to also declare it somewhere. The SystemData in my example is not just convenience to avoid fetch, it's also the information for the scheduler. That's why „usual“ System has the associated type too.

Now, for the use cases. You're right that a big use case for the batch interface is pausing the game ‒ at least it's the use case I had and I was thinking that the current interface is incredibly complicated for that.

Anyway, If I do a split of use cases as I see them (and they might somehow map to your splitting).

I can decide up-front how many times I want to run the dispatcher, be it 0, 1 or many times. Then I can run it that many times without further „thinking“ about it. I might need to look at some resource to determine that upfront, but after I've started dispatching, I can just let the world be. The paused game nicely maps here (first decide if the resource is Paused(true) or Paused(false) and then run 0 or 1 times).
I want to do some more complex logic, like keep running in a loop until some condition is satisfied. I need to access data of world in between runs of the dispatcher (while world.fetch::<Enemies>().len() < 1000 { spawnMoreEnemiesDispatcher.dispatch() }).

Looking at it this way, I come to think that I could actually write two different interfaces for that ‒ one where I'd basically have only a function SystemData → usize (pre_run with Tmp fixed to usize), saying how many times to run it and the for cycle would be provided (user writes no run function). The other would be as you propose, without theTmp`, and with manual fetch (still somehow declaring what it would fetch).

I think I'll simply start experimenting with some code and try some examples how the API would look like. There can always be some more fine-tuning on the pull request.

azriel91 commented 4 years ago

The scheduler needs to be told which resources you'll be interested in so it can schedule other things in parallel with it. So, in addition to the fetch, one would have to also declare it somewhere. The SystemData in my example is not just convenience to avoid fetch, it's also the information for the scheduler. That's why „usual“ System has the associated type too.

oh yes, it's a good point to keep it in :v:. There isn't a clean way to include that (which may use a Write resource) and also pass world to the inner dispatcher, but if we declare that to be an API usage contract (the system data used in Tmp cannot have an overlapping Write with the inner systems' system data), then I would favour fetching Tmp within shred and passing it to the batch controller's run

vorner commented 4 years ago

As I've mentioned, I've put something together. Would you like to have a look at the PR (#198) ‒ it's very much in progress, but the API usage should be visible. I'd polish all the docs & stuff later on.

amethyst / shred

Ideas for improving the Batch interface #197