asynchronics / asynchronix

High-performance asynchronous computation framework for system simulation
Apache License 2.0

Providing an example with network integration and multithreading #13

Open robamu opened 5 months ago

robamu commented 5 months ago

Hi!

I am not sure where to best ask this question, I hope this is the right place :) Thanks for providing this excellent framework. I am working on a mini-simulator for an example satellite on-board software which can be run on a host system. The goal is to provide an environment which more closely resembles a real satellite system directly on a host computer.

My goal is still to have the OBSW application and the simulator as two distinct applications which communicate through a UDP interface. So far, I have developed a basic model containing a few example devices. The simulator is driven in a dedicated thread which calls simu.step continuously, while the UDP server handling is done in a separate thread. One problem I have now: how do I model deferred reply handling in a system like this? There will probably be some devices where I need to drive the simulator, wait for a certain time, and then send some output back via the UDP server.

I already figured out that I probably have to separate the request/reply handling from the UDP server completely by using messaging, and that would probably be a good idea from an architectural point of view. Considering that the reply handling still has to be as fast as possible to simulate devices as faithfully as possible, I was thinking of the following solution to leverage the asynchronix features:

  1. Providing a reply handler which explicitly handles all requests sent from the UDP server and which is scheduled separately so that incoming UDP requests are handled as fast as possible.
  2. This reply handler then drives the simulation model (or simply asks for a reply, depending on the request) by sending the respective events.
  3. The models themselves send replies directly to a dedicated UDP TM Handler. Some models might send the reply ASAP, others can schedule the reply in the future, depending on the requirements.

What do you think about the general approach? I think an example application showcasing some sort of network integration and multi-threading might be useful in general. If this approach works well and you think this is a good idea, I could also try to provide an example application.

Repository where this is developed: https://egit.irs.uni-stuttgart.de/rust/sat-rs/src/branch/lets-get-this-minsim-started/satrs-minisim/src/main.rs

robamu commented 5 months ago

I just saw that I can probably also use Request and Reply ports for this purpose. I'll have a closer look at that.

robamu commented 5 months ago

After a bit more digging, I figured out that I actually need real-time simulation here, which I have now managed to do with the SystemClock as the relevant simulation timer. However, I am still struggling with how to get those requests to drive the simulation.

What I basically have now is a SimController object which receives all requests and should convert them into input function calls on the Simulation object. However, this means the controller has to own the simulation object itself and also run the step functions, which might block. I am not sure how to solve this yet. Requests might arrive at an arbitrary time and need to be handled as fast as possible. I also still have the regular simulation tasks, like self-scheduling sensors which update their internal values every X milliseconds, which need to run even when no requests are arriving. Is that maybe something where the planned RPC feature might help?

sbarral commented 5 months ago

[just clarifying for readers not versed in the space industry jargon: OBSW = on-board computer SW]

Yes, this is definitely the right place to discuss this topic :)

It is also the right moment since we are actively working on v0.3, which will, among other things, provide an RPC mechanism to drive the simulator remotely (e.g. from a Python script). But the co-simulation use case you outline is something we want to support ASAP, so it will be prioritized for v0.3 too.

Many thanks for your detailed report and thoughts on the topic, this is very helpful. And admittedly, the main reason an example is missing is simply that doing co-simulation through sockets or similar is a bit awkward at the moment. We will take a bit more time to reflect on your report, but here is a raw brain dump of my personal thoughts so you can let me know if I correctly understand the problem space or if I am off the mark:

1) the approach you propose looks like the right one to me, at least in the current state of affairs; having your SimController own the Simulation is probably the only way at the moment, but you can still run them in separate threads and have them communicate using e.g. channels or other inter-thread communication primitives;

2) where it becomes tricky is when communicating from a model to the UDP handler: blocking calls should be strictly avoided as these would stall the executor thread altogether; it should still "work" if the simulator operates with a thread pool larger than the number of simultaneously blocked models, but this is not optimal (note that by default, the number of simulator threads is the number of logical cores);

3) so ideally, the right way to handle blocking calls from models would be to use async channels or similar async primitives in the model, with for instance a blocking receiver on the UDP thread (using e.g. something like block_on from the futures_executor or pollster crates); the UDP thread could then possibly send a reply back directly to the model via e.g. an async oneshot channel whose Sender was sent to the UDP thread together with the request, and the model could forward the UDP reply to other models, either as the value returned from a replier method or by broadcasting the event with an Output::send() call (see the sketch after this list);

4) ...but unfortunately this does not work at the moment: if the UDP reply arrives only after all models are paused because they can't make more progress, the executor will detect that all threads are idle and the Simulation::step() call will return before the UDP reply is handled by the model.
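
To make point 3 a bit more concrete, here is a rough sketch of that pattern. All type and function names below are invented for illustration, and the futures and pollster crates are assumed as dependencies; as point 4 explains, this still breaks down today if the UDP reply only arrives after all models have become idle.

    // Hypothetical types and names, for illustration only.
    use futures::channel::{mpsc, oneshot};
    use futures::StreamExt;

    struct UdpRequest {
        payload: Vec<u8>,
        // The UDP thread sends its reply back through this one-shot channel.
        reply_to: oneshot::Sender<Vec<u8>>,
    }

    struct MyDevice {
        // `unbounded_send` never blocks, so the model never stalls an executor thread.
        to_udp: mpsc::UnboundedSender<UdpRequest>,
    }

    impl MyDevice {
        // In asynchronix, this would be an async input or replier method on a model.
        async fn query(&mut self, payload: Vec<u8>) -> Option<Vec<u8>> {
            let (reply_to, reply) = oneshot::channel();
            self.to_udp.unbounded_send(UdpRequest { payload, reply_to }).ok()?;
            // Awaiting suspends the model without blocking an executor thread.
            reply.await.ok()
        }
    }

    // The dedicated UDP thread may block freely, e.g. with pollster::block_on.
    fn udp_thread(mut from_models: mpsc::UnboundedReceiver<UdpRequest>) {
        while let Some(request) = pollster::block_on(from_models.next()) {
            // ...exchange `request.payload` over the socket, then forward the reply...
            let _ = request.reply_to.send(request.payload); // placeholder echo
        }
    }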

The 2 workarounds I can see at the moment are:

None of these are great, so we need to explore more elegant solutions. One of them would be to have some kind of "reactor", like in general-purpose async executors, but hopefully we can avoid baking the transport protocol into the API and instead have a general-purpose hook for blocking calls.

Regarding real-time execution and the self-scheduling sensors, I do not understand at the moment how these interfere with the above issues. I would expect self-scheduling models to work irrespective of real-time and/or co-simulation, but I am probably missing something.

Please let me know if the above does not cover your use case, and in any case, while we brainstorm on this, please don't hesitate to offer comments/suggestions using this issue (or via DM).

robamu commented 5 months ago

Thanks for the quick reply :+1: I think I can find a way around the blocking API for the models specifically. Maybe this helps a bit (I omitted internal simulation details like the interconnection of models here, and only included the data flow from a client perspective plus the execution model):

[image: minisim-arch]

My plan was to use non-blocking sender components (e.g. mpsc::Sender<SimReply>) inside the individual models to send the replies without blocking the simulator. The UDP TM handling wouldn't have to feed anything back into the simulation in that case.

The TC handler and TM sender live in dedicated threads and use blocking APIs to either poll UDP frames or send back SimReplies. The third thread is where I need to be careful with blocking APIs. I have a SimRequest receiver (probably mpsc::Receiver<SimRequest>) in the sim controller, which I can poll in a non-blocking manner. However, each time I call step, wouldn't I block for a certain time during which a new request might arrive?
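
A minimal sketch of what I have in mind for the reply path (the SimReply fields and the model/thread names are placeholders): send on an unbounded std channel never blocks, so it should be safe to call from a model's input method without stalling the simulator.

    use std::sync::mpsc;

    // Hypothetical reply type; the actual fields depend on the simulated devices.
    pub struct SimReply {
        pub device: &'static str,
        pub data: Vec<u8>,
    }

    pub struct MagnetometerModel {
        // Sending on an unbounded std channel never blocks the simulator executor.
        reply_tx: mpsc::Sender<SimReply>,
    }

    impl MagnetometerModel {
        // In asynchronix, this would be an input method on the model.
        pub fn handle_request(&mut self, _request: Vec<u8>) {
            let _ = self.reply_tx.send(SimReply {
                device: "magnetometer",
                data: vec![0x12, 0x34],
            });
        }
    }

    // Dedicated TM sender thread: blocking APIs are fine here.
    pub fn tm_sender_thread(reply_rx: mpsc::Receiver<SimReply>) {
        while let Ok(reply) = reply_rx.recv() {
            // ...serialize `reply` and send it over the UDP socket...
            let _ = (reply.device, &reply.data);
        }
    }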

sbarral commented 5 months ago

Oh yes, so this is exactly what I had in mind in the second suggested workaround above.

Indeed the call to Simulation::step will block, but since the channel for SimRequest is buffered, I wouldn't think this is a problem. However, Simulation::step may increment simulation time, which is not desirable if the next SimRequest awaiting in the channel is still for the previous time stamp. In that case, I would think that a working strategy would be:

Would that work for you?

robamu commented 5 months ago

That sounds like a good approach. The one remaining thing I am still interested in solving is minimizing the delay for request handling, if that is possible. Maybe I am missing a detail here. For example, I have a simulation with no requests but which is still running because of some self-scheduling events, for example every 20 or 50 milliseconds (assuming the system clock here). Now, if a request arrives just 1 ms after a step call, wouldn't that introduce a delay for the rest of the time until the next event is handled? I would need something like an interrupt, where an additional event is scheduled in the middle of a step. Or I could do something like checking the request queue at least every millisecond by using step_by, which would of course go in the direction of polling (possibly very expensive?). Is there maybe some other way?

I am probably never going to get the real timing performance of something like an SPI interface, which works synchronously and full-duplex, but if I want to poll some simulated sensor every 30-50 ms, the delay I mentioned above might be problematic. Getting as close as possible would probably suffice though :)

sbarral commented 5 months ago

Sorry, you are right, for real-time execution this strategy does not work.

The method I am familiar with for real-time co-simulation is pretty much what you proposed: the simulator runs with a dummy clock (the default NoClock) but is polled at regular intervals Δt. In industrial hardware-in-the-loop spacecraft testing, I have seen various values of Δt ranging from ~10 ms to ~30 ms. AFAIK this was a trade-off between the time accuracy required to resolve AOC and the hardware and network latency. But for a purely software bench, you can certainly go below that: polling the simulator is very cheap unless there actually is something in the scheduler queue, so even >1000 Hz polling is no problem.

That being said, you may need to account for network latency: you would probably want to wait a couple of milliseconds after the theoretical wall clock t0 before you call Simulation::step_until(t0), failing which you may miss some UDP packets with timestamps t ≤ t0.

For synchronization, you could use a SystemClock in the user event loop while keeping the default NoClock when configuring the Simulation. The user event loop could for instance look like this:

  1. call Simulation::step_until(t),
  2. increment time, t ← t+Δt,
  3. call SystemClock::synchronize(t + max_network_latency) to give enough time to receive all UDP events (SimRequests) with timestamps up to t,
  4. schedule all received SimRequests according to their respective timestamps or, for the sake of simplicity and computational efficiency, schedule all events that predate t at the same step, for instance at t, t - Δt or t - Δt/2,
  5. go to 1.

robamu commented 5 months ago

That sounds like an excellent approach. I like the idea of using the default NoClock to keep the simulation as fast as possible and moving the delay handling to the user code completely. Some remaining questions:

  1. A little bit of clarification about the network latency: This approach would "hide" the latency from the models, right? So, the SystemClock waits 60-70 ms to incorporate the network latency, even though the simulation time only advances 50 ms (10 - 20 ms max network latency).
  2. With the timestamps for SimRequest, do you mean the arrival time, or some other time associated with the request (or maybe both)? I'd probably start with the first case, because the primary purpose the request will be used for is communication with the simulated system (so the timestamp is basically the arrival time). I can think of some cases where an associated delay might be included in the future, for example something like fault injection.
  3. Just a side thought: If my simulation model becomes dependent on absolute time (e.g. magnetic field models for earth, sun/dusk times etc.), wouldn't the approach with two different clocks, which synchronize differently, make them go out of sync? I suppose the NoClock is then there for determining time steps/differences only, and I pass &SystemClock or an explicit current absolute timestamp to that respective model for calculations?

I think the discussion might be very useful for other users, which is why I am asking all my questions here. I think a dedicated docs segment for the use-case I have here (generic co-simulation of a [satellite] system, and all problems/requirements associated with it) might be an excellent idea for the next release :)

sbarral commented 5 months ago

Actually, in that approach the simulation clock (returned by Simulation::time()) should stay in sync with the wall clock with an error below Δt, so there should be no drift between the 2 clocks. Here is how it should work; the principle is more or less the same for co-simulation and hardware-in-the-loop:

[image: rt_cosim]

Therefore:

  1. The SystemClock would wait typically only Δt minus whatever time it takes to complete a call to Simulation::step_until(t).

  2. The timestamps would be assigned by the co-simulator (in your case, the on-board computer) or a hardware device (for HiL). In HiL this would be the wall-clock time at which the event was generated on the hardware, while in co-simulation this would be the simulation time of the co-simulator when the event was generated (I assume here that the clocks of the 2 simulators are synchronized to a common time source). As you mentioned, the co-simulator could of course also send events with time stamps set in the future to schedule a future event.

  3. In theory, the simulator clock should stay in sync with the wall clock time, so I would just use Simulation::time() unless the small offset between the 2 matters (which is rather unlikely).

I realized as well that you do not need to worry about UDP packet ordering here: the events will be inserted in the priority queue according to their time stamps anyway.

sbarral commented 5 months ago

I realized there are still issues with this strategy, but I think I am getting close to a solution. Please bear with me, I will try to find some time today to write a follow-up.

robamu commented 5 months ago

Thanks for the detailed explanation in any case. I still need to wrap my head around how time works with these systems, and that definitely helps a lot. I am actually not even sure whether the network delay is that large on the same host. I will probably do some measurements to check that.

sbarral commented 5 months ago

Note that if you ignore latency, then you would probably want to stamp the UDP events with a counter. Otherwise, if two events A and B are sent in this order but only B arrives before the deadline while A doesn't, then B will be processed before A, which may break some causal relation in the simulation.
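
For illustration (the type and field names are placeholders), the receiving side could restore the send order by keying a min-heap on such a counter:

    use std::cmp::Reverse;
    use std::collections::BinaryHeap;

    // Hypothetical request type: the sender stamps each request with a
    // monotonically increasing counter before putting it on the wire.
    #[derive(PartialEq, Eq, PartialOrd, Ord)]
    struct SimRequest {
        counter: u64,
        payload: Vec<u8>,
    }

    // Min-heap keyed on the counter: popping always yields the oldest request,
    // even if UDP delivered the datagrams out of order.
    struct RequestQueue {
        heap: BinaryHeap<Reverse<SimRequest>>,
    }

    impl RequestQueue {
        fn new() -> Self {
            Self { heap: BinaryHeap::new() }
        }

        fn push(&mut self, request: SimRequest) {
            self.heap.push(Reverse(request));
        }

        fn pop(&mut self) -> Option<SimRequest> {
            self.heap.pop().map(|Reverse(request)| request)
        }
    }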

In any case, here are some issues which I identified in the algorithm of my previous comments:

I would therefore modify the event loop as follows:

  1. call Simulation::step_until(t+Δt) (rather than Simulation::step_until(t)),
  2. increment time, t ← t+Δt,
  3. call SystemClock::synchronize(t + max_network_latency) to give enough time to receive at least all UDP events with timestamps up to t,
  4. process sequentially all received events with a timestamp less than t using Simulation::send_event(), respecting the order of the timestamps to preserve causality,
  5. go to 1.

Here is an updated schematic, which illustrates as well a case of event re-ordering due to UDP (events t2 and t3) and an event (t5) that arrives before the deadline but with a timestamp corresponding to the next time slice.

[image: rt_cosim2]

I have other ideas about how to implement this algorithm in a more elegant and more computationally efficient manner, but I think there is already enough to unpack in this comment :)

robamu commented 5 months ago

Thanks. For this algorithm this probably means:

  1. I have to add a timestamp to my SimRequest so it can be determined whether the command needs to be assigned to the next time slice
  2. Add a counter to SimRequest to preserve the order of sim request processing

The algorithm currently looks like this then:

    pub fn run(&mut self, start_time: MonotonicTime, udp_polling_interval_ms: u64) {
        let mut t = start_time + Duration::from_millis(udp_polling_interval_ms);
        self.sys_clock.synchronize(t);
        loop {
            // Check for UDP requests every millisecond. Shift the simulator ahead here to prevent
            // replies lying in the past.
            t += Duration::from_millis(udp_polling_interval_ms);
            self.simulation
                .step_until(t)
                .expect("simulation step failed");
            self.handle_sim_requests();

            self.sys_clock.synchronize(t);
        }
    }

Not sure whether the bit before the loop is still necessary. One more question about clock synchronization: you mentioned that both the co-simulator and the simulator should have the same clock source, which will probably be the wall clock for me. The easiest solution would probably be to just use SystemClock inside the co-simulator as well?

sbarral commented 5 months ago

Sorry! I completely missed your reply.

I am assuming that handle_sim_requests() is a method that unpacks the SimRequests from the queue and processes them with Simulation::send_event. If so, then it should rather be placed after the call to self.sys_clock.synchronize().

Here is more or less how I would implement this:

/// For simplicity, it is assumed that requests are already ordered by the
/// channel sender side (Sender<SimRequest>) using e.g. a request counter.
/// Otherwise, requests retrieved from `receiver` must first be channeled
/// through a priority queue using the counter for ordering.
pub fn run(
    &mut self,
    start_time: MonotonicTime,
    receiver: &mut Receiver<SimRequest>, // some channel receiver
    udp_polling_interval_ms: u64,
) {
    let mut t = start_time;

    // Whenever a request for the next time slice is received, store it
    // temporarily in this buffer.
    let mut outstanding_request = None;

    loop {
        let t_old = t;
        t += Duration::from_millis(udp_polling_interval_ms);

        // Wait for a little longer than `t` to ensure that at least all
        // requests that were sent before `t` are received.
        self.sys_clock.synchronize(t + MAX_LATENCY);

        while let Some(sim_request) =
            outstanding_request.take().or_else(|| receiver.try_recv().ok())
        {
            if sim_request.timestamp > t {
                // Keep this request for the next time slice.
                outstanding_request = Some(sim_request);
                break;
            }
            assert!(sim_request.timestamp > t_old, "stale data received");

            //
            // ...call `self.simulation.send_event(...)` using the data from `sim_request`...
            //
        }

        self.simulation
            .step_until(t)
            .expect("simulation step failed");
    }
}

Using SystemClock in the co-simulator sounds like a good idea. Maybe you could just copy the code to avoid pulling such a large dependency.
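
In case it helps, a sketch of the small piece worth replicating on the co-simulator side could look like this (this is not the actual SystemClock code, just a minimal wall-clock synchronization helper with made-up names):

    use std::thread;
    use std::time::{Duration, Instant};

    /// Minimal stand-in for a system clock: sleeps until the requested offset
    /// from an arbitrary wall-clock origin has elapsed.
    struct WallClock {
        origin: Instant,
    }

    impl WallClock {
        fn new() -> Self {
            Self { origin: Instant::now() }
        }

        /// Blocks until `deadline` (measured from the clock origin) is reached,
        /// or returns immediately if the deadline already lies in the past.
        fn synchronize(&self, deadline: Duration) {
            let elapsed = self.origin.elapsed();
            if deadline > elapsed {
                thread::sleep(deadline - elapsed);
            }
        }
    }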

robamu commented 5 months ago

Sounds good. About the large dependency: maybe the module system of Rust can be used smartly here? One way would be to extract the time components into a separate asynchronix-time crate, which is then re-exported by default inside the main crate. That way, I could depend only on the time crate in my example application instead of the full dependency. Another approach can be seen in the tokio crate: using a time feature: https://github.com/tokio-rs/tokio/blob/master/tokio/Cargo.toml .

sbarral commented 5 months ago

I am not sure I understand the tokio idea: wouldn't this only make sense if there were also a core feature to "optionally" import everything other than time? It would be a bit weird to have the core of Asynchronix as an optional part of the crate.

Most of the stuff in the time module only really makes sense as part of Asynchronix, but maybe MonotonicTime could be factored out into an independent crate (there are already a couple of such crates, like st3 and diatomic-waker). At the moment it is coupled to the private TearableAtomic trait, but this could be worked around, I guess. Feel free to open a separate issue if you feel it's worth it.

I am reluctant to move the Clocks to their own crate as these are strongly idiosyncratic, but the implementation of SystemClock is fairly small so if MonotonicTime was available as a crate, there wouldn't be that much to copy.

robamu commented 5 months ago

You're right, the tokio approach might not be best here. That is probably what would be required (core could be a default feature, but it is still a bit weird).

I think this is a separate issue though, so I might open a new one. I still think crate modularization can make sense if some components could be useful for applications interfacing with an asynchronix application, provided those components don't depend heavily on other asynchronix components.