eclipse-iceoryx / iceoryx

Eclipse iceoryx™ - true zero-copy inter-process-communication
https://iceoryx.io
Apache License 2.0
1.6k stars 373 forks source link

Deprecate posix timer #337

Open marthtz opened 3 years ago

marthtz commented 3 years ago

Brief feature description

Deprecate posix timer from iceoryx code base.

Detailed information

Posix timer has several issues (concurrency, truncation) and is pretty heavy weight and a bit slow. As iceorxy needs to be very responsive the posix timer should be removed and replaced with a lightweight stopwatch.

This will also close #190 as it'll be obsolete.

Tasks

elfenpiff commented 3 years ago

@marthtz I think a stopwatch is not exactly what we need. We have to rewrite the timer so that the timer does not work with callbacks anymore but instead with signals. If you look at the manpage https://man7.org/linux/man-pages/man2/timer_create.2.html we have to possible approaches:

I think we should explore the second option. Maybe it has similar drawbacks than the callback approach but if not it should be our way to go.

@elBoberido what do you think?

elBoberido commented 3 years ago

@elfenpiff @marthtz please don't call it stopwatch. A stopwatch is measuring time, this functionality is running towards a deadline or countdown. See also the equivalent int boost and Qt.

To the other points. The posix timer was started as a deadline timer and then enhanced with the posix calls. Currently it has both functionalities.

  1. cut out the deadline functionality and use that class where appropriate
  2. deprecate the current posix timer and replace the remaining occurrences with std::thread and sleep_for
  3. decide if we really need something like the posix timer. If we do, implement the second option from @elfenpiff which is also described in more detail in #170. But do this in an own class. Don't repeat the mistake and stuff it into the deadline functionality.
shankar-in commented 3 years ago

@elfenpiff I am currently working on this story. Based on the comments from @dkroenke I am working on to replace the timer.start() function with a simple thread sleep. Currently the posixtimer.start() is being called within the posh_runtime, service_discovery_notifier.

elBoberido commented 3 years ago

@shankar-bosch a simple sleep_for is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use the sleep_for.

shankar-in commented 3 years ago

@elBoberido okay I will do so

elfenpiff commented 3 years ago

@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.

shankar-in commented 3 years ago

@elfenpiff sure. I am not in the list of contributors. Could you pls add me? I can assign this issue in my name and bring this to the board.

shankar-in commented 3 years ago

@elBoberido I have clarifications. As you know the timers can be one shot or periodic based on the it_interval . If it is a one shot timer there can be a thread::sleepfor(it_value). if the timer is a periodic then there has to be a thread::sleepfor(it_value) followed by a thread::sleepfor(it_interval) until the application exit? I am not sure what to do here.

I also dont see the start() function within the timer.cpp invoking a thread after a timer is disarmed. Is it done in a different place? could you pls also help me here?

mossmaurice commented 3 years ago

@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.

@elfenpiff This can only be done by an Eclipse committer. Eclipse committers are voted on in elections.

shankar-in commented 3 years ago

@mossmaurice Thank you

orecham commented 3 years ago

@shankar-bosch a simple sleep_for is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use the sleep_for.

@elBoberido How about just removing the timer from iceoryx ? I would need to double check but the current use-cases we have within iceoryx can be resolved using a simple sleep_for (e.g. sending heartbeat).

elBoberido commented 3 years ago

@ithier that's the idea of this issue. There is an additional functionality in the timer (checking for a deadline) and that should to be preserved

elBoberido commented 3 years ago

@shankar-bosch the thread is started implicitly. Have a look at https://linux.die.net/man/2/timer_create, we use SIGEV_THREAD

If you use just sleep_for, the semantic of the code changes. In this case the thread which previously called start is now blocked, while it just continues running with the current timer implementation. You have to create a thread and use sleep_for in a loop. I would suggest to start with cutting out the functionality of the DeadlineTimer to cxx/deadline_timer.hpp/cpp. This should be way easier. You can do it by copying the current timer to cxx and removing everything from it which is related to the m_osTimer member

orecham commented 3 years ago

Here is where the timer is used in iceoryx:

From above, it looks like we only need to do add the heartbeat thread above and write some kind of helper method to wrap the usage of timer_gettime to satsify the second usecase. Then we can just mark the old timer as deprecated and we are done with this issue, right ?

elBoberido commented 3 years ago

it just occurred to me that sleep_for might be really bad for the test time, since it's not interruptible. How about creating a small wrapper around a semaphore with timed wait? e.g.

class Timer {
public:
    Timer() noexcept = default;
    ~Timer() noexcept; // calls stop
    cxx::expected<TimerError> wait(cxx::Duration) noexcept; // blocking wait on the semaphore
    void stop() noexcept; // sem_post to interrupt waiting
    static cxx::expected<units::Duration, TimerError> Timer::now() noexcept; // this is from the old Timer and also used in the codebase
}

Later on the semaphore could be replaced by POSIX timer_create(..) etc, if necessary. What do you think?

marthtz commented 3 years ago

Current PR #333 also makes use of the timer. Basically, it's checking for a timeout / interval expiry.

@elBoberido wrapper sounds good to me and could be used for this and @ithier first bullet point.

shankar-in commented 3 years ago

@elBoberido Thanks for the wrapper solution.

budrus commented 3 years ago

We should definitely fix this for 1.0, On ARM 64 I have failing tests because of #190 but I fear that's not the only problem

elBoberido commented 3 years ago

@budrus to fix #190 we need to refactor the units::Duration away from long double to chrono or timespec/timeval

budrus commented 3 years ago

@elBoberido That's at least half of the game. The failing DestructorIsBlocking could maybe be another thing. This also fails with larger numbers...

elBoberido commented 3 years ago

@budrus it could still be a rounding issue. AFAIK, the timer doesn't sleep for e.g. 100ms but for currentTime + 100ms. This might result in rounding errors with the long double, especially since that's only an alias to double on ARM and 80 bit wide on x86 ... except for MSVC where it is also an alias to double

elfenpiff commented 3 years ago

@budrus to fix #190 we need to refactor the units::Duration away from long double to chrono or timespec/timeval

@elBoberido @budrus I suggest here to use neither of them. I would create a struct which looks like timespec

struct duration_t {
     uint64_t seconds;
     uint32_t nanoseconds;
};

The reason is that a duration cannot be negative, you cannot wait -5 seconds but timespec/timeval can be negative. Additionally, timespec/timeval do not have a fixed size they are using the types long and time_t which can differ on different architectures.

And when we refactor units::Duration we should get rid of all the long double operators which are absolutely unnecessary - and I know I was the one who introduced them in the first place :/ .

shankar-in commented 3 years ago

I will take the solution for the truncation issue as suggested by @elfenpiff. Thank you. @budrus Is there a seperate ticket for failing DestructorIsBlocking issue?

budrus commented 3 years ago

@shankar-bosch No, as @elBoberido said, it could also have the same root cause as #190. We would have to analyze this but now things are changing anyway

shankar-in commented 3 years ago

@budrus Okay. I believe the issue #190 fix should be available for 1.0. So increasing the scope of #337 to fix #190 and analysing the failing DestructorIsBlocking should be fine?

elBoberido commented 3 years ago

since we are approaching the 0.90 release, is there any update on this?

shankar-in commented 3 years ago

Yes the PR is getting ready. Should be raised shortly.

elBoberido commented 3 years ago

@shankar-bosch for Deadline or keep-alive?

shankar-in commented 3 years ago

Deadline is ready. Keep alive is under testing. The PR should cover both.

elBoberido commented 3 years ago

@shankar-bosch do you think you can get this in until monday? If not, I could do a fast fix for the keep alive.

Just as reminder to keep the PRs small. The PR you are preparing could be split into two. The dealine timer and the keep alive functionality.

shankar-in commented 3 years ago

@elBoberido I have my changes ready. But I havent got the committer approval to open the PR yet. Yes you are right I can split the PR into two. shall I raise the PR for the deadline today by forking other repo? I shall also check this with my PO and update you.

elBoberido commented 3 years ago

@shankar-bosch yes, please. Would you mind if I create a small workaround for the keep-alive?

elBoberido commented 3 years ago

@shankar-bosch I've created a PeriodicTask class. Currently it wraps a semaphore. This semaphore could be replaced by a refactored posix::Timer. Ideally the timer doesn't spawn a thread, but does just block on wait. This would make the timer more flexible since it could also be used when we want to block on the current thread.

shankar-in commented 3 years ago

@elBoberido Thank you for the changes. I shall adopt the change.

elBoberido commented 3 years ago

@shankar-bosch with you DeadlineTimer PR slowly merging, there is only one remaining part to refactor. The old posix::Timer had three features:

I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a wait() method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like this

class Timer {
    Timer(); // default c'tor
    Timer(units::Duration interval); // immediately starts the timer
    static cxx::expected<units::Duration, TimerError> now() noexcept;
    cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval
    cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval
    cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created
    cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP`
};

I think the current RunMode can be omitted, since you can do it by calling wait() only once or repeatedly. I'm not sure about the CatchUpPolicy but I think it should also be a parameter of the constructors or maybe even better of the wait() method. @mossmaurice @MatthiasKillat what do you think?

If you have time and like the task, feel free to implement it. Ideally, one PR for the timer refactoring and one for the integration into PeriodicTask.

shankar-in commented 3 years ago

@shankar-bosch with you DeadlineTimer PR slowly merging, there is only one remaining part to refactor. The old posix::Timer had three features:

* DeadlineTimer -> refactored out

* spawning a thread and execute a task -> moved to PeriodicTask

* periodically trigger an event -> currently also in PeriodicTask with the Semaphore

I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a wait() method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like this

class Timer {
    Timer(); // default c'tor
    Timer(units::Duration interval); // immediately starts the timer
    static cxx::expected<units::Duration, TimerError> now() noexcept;
    cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval
    cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval
    cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created
    cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP`
};

I think the current RunMode can be omitted, since you can do it by calling wait() only once or repeatedly. I'm not sure about the CatchUpPolicy but I think it should also be a parameter of the constructors or maybe even better of the wait() method. @mossmaurice @MatthiasKillat what do you think?

If you have time and like the task, feel free to implement it. Ideally, one PR for the timer refactoring and one for the integration into PeriodicTask.

@elBoberido Sure. I am interested to take up. I have some points for clarifications. Shall i create a new story for this refactoring?

elBoberido commented 3 years ago

@shankar-in from my point of view, this issue could just be renamed to Refactor posix timer. In the end, that's what we did all the time

mossmaurice commented 3 years ago

Thanks @shankar-in and @elBoberido I think we already made good progress.

Throughout this post I'm referring not to master but the current branch of #490

* spawning a thread and execute a task -> moved to PeriodicTask

IMHO that's a great building block. I would keep the semaphore out of there. Could be renamed to PeriodicThread as it is just syntactic sugar as you already noted. The name "task" is probably the most über-undefined term ever in automotive software development. And probably elsewhere, too. Avoiding it could be wise. Use-case: Do something ASAP periodically, no need for the timing to be precise e.g. MQ/UDS handling of keep-alive messages.

* periodically trigger an event -> currently also in PeriodicTask with the Semaphore

For me that's a separate entity. Let's call it PeriodicEvent.

I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a wait() method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like this

class Timer {
    Timer(); // default c'tor
    Timer(units::Duration interval); // immediately starts the timer
    static cxx::expected<units::Duration, TimerError> now() noexcept;
    cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval
    cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval
    cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created
    cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP`
};

Let's recap a bit...

Lessons learned:

I like your proposal, it keeps things simple. However, I'm unsure on whether a plain wait() will be user friendly enough. The idea of the legacy posix::Timer was to make it user-friendly by storing the callable inside the posix::Timer. #473 might come in handy here.

Around mid-2020 as a possible alternative to get rid of the too-many-thread-spawning-OS issue was discussed:

I fear we might need at kernel-space solution at some point and a user-space solution won't be feasible e.g. to fulfill certain timing constraints on QNX. IIRC QNX advised to use the timer_create API. I also have the feeling that all CatchupPolicys should be supported by the new constructs.

@mossmaurice @MatthiasKillat what do you think?

Answering with a question: What would be the use-case for the PeriodicEvent?

elBoberido commented 3 years ago

@mossmaurice

Thanks @shankar-in and @elBoberido I think we already made good progress.

Throughout this post I'm referring not to master but the current branch of #490

* spawning a thread and execute a task -> moved to PeriodicTask

IMHO that's a great building block. I would keep the semaphore out of there. Could be renamed to PeriodicThread as it is just syntactic sugar as you already noted. The name "task" is probably the most über-undefined term ever in automotive software development. And probably elsewhere, too. Avoiding it could be wise. Use-case: Do something ASAP periodically, no need for the timing to be precise e.g. MQ/UDS handling of keep-alive messages.

I used task because it's also used in the STL https://en.cppreference.com/w/cpp/thread/packaged_task. I'm also not quite fond of PeriodicThread since that's not what it does. It's more like a PeriodicExecutor.

* periodically trigger an event -> currently also in PeriodicTask with the Semaphore

For me that's a separate entity. Let's call it PeriodicEvent.

I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a wait() method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like this

class Timer {
    Timer(); // default c'tor
    Timer(units::Duration interval); // immediately starts the timer
    static cxx::expected<units::Duration, TimerError> now() noexcept;
    cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval
    cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval
    cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created
    cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP`
};

Let's recap a bit...

Lessons learned:

* `posix::timer` was waaaay to complex and contained different kind of constructs in one

  * Letting the OS start a thread was a bad idea, as in some occasions thousands of threads were spawned

* `restart()` is unnecessary

I like your proposal, it keeps things simple. However, I'm unsure on whether a plain wait() will be user friendly enough. The idea of the legacy posix::Timer was to make it user-friendly by storing the callable inside the posix::Timer. #473 might come in handy here.

Well, it's just a building block. If you need the additional functionality use the PeriodicTask, which uses this building block. Fun fact, the PeriodicTask can store objects overloading the call operator ;)

Around mid-2020 as a possible alternative to get rid of the too-many-thread-spawning-OS issue was discussed:

* Stick with [the API](https://man7.org/linux/man-pages/man2/timer_create.2.html)

  * `struct sigevent` has a member `sigev_notify` and could be set to `SIGEV_SIGNAL`
  * Don't use `SIGEV_THREAD`
  * OS then creates the "Event" an signals the process

* Benefit: Timing is controlled by OS

I fear we might need at kernel-space solution at some point and a user-space solution won't be feasible e.g. to fulfill certain timing constraints on QNX. IIRC QNX advised to use the timer_create API. I also have the feeling that all CatchupPolicys should be supported by the new constructs.

The timer would currently use the semaphore, but that's just an implementation detail and can be changed in the future with POSIX timer_create functionality. I'm also for the CatchupPolicy and would pass it in the wait method. This has the benefit to define it when it's necessary, while waiting for the next timer tick.

@mossmaurice @MatthiasKillat what do you think?

Answering with a question: What would be the use-case for the PeriodicEvent?

For me an event is something active, and is fired no matter what. The proposed timer is completely passive since it just blocks on wait. Something like a PeriodicEvent could use the timer as building block though. That would also be the use case for the refactored timer, a building block for PeriodicTask. One could ask if that justifies a new class. I would say yes, since it nicely encapsulates the functionality and makes it available for cases where you already have a thread, need to block for a certain time and want to be able to interrupt the blocking call.

shankar-in commented 3 years ago

@elBoberido I am taking up refactoring of periodically trigger an event in this sprint. In case i have clarifications I will contact you in Gitter.

shankar-in commented 3 years ago

@elBoberido the abstimeout calculation for the semaphore is present within the semaphore.cpp. this makes the choice of clock unavailable for the the periodic timer. currently the semaphore always makes use of the system clock. For periodic timer I believe monotonic would be a right choice? In that case we have to move the calculation of the abstimeout outside the semaphore ? pls feedback

elfenpiff commented 3 years ago

@shankar-in The implementation is absolutely correct here! We do not want to do this outside since it is error prone especially when the developer does not read the documentation of the posix semaphore. Here the part from the sem_timedwait manpage:

sem_timedwait() is the same as sem_wait(), except that abs_timeout specifies a limit on the amount of time that the call should block if the decrement cannot be immediately per‐ formed. The abs_timeout argument points to a structure that specifies an absolute timeout in seconds and nanoseconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). This struc‐ ture is defined as follows:

       struct timespec {
           time_t tv_sec;      /* Seconds */
           long   tv_nsec;     /* Nanoseconds [0 .. 999999999] */
       };

As you see we require here the unix time which is monotonic since it counts the seconds from 1970-01-01 (Timezone: UTC).

shankar-in commented 3 years ago

@elfenpiff Thanks.

CLOCK_REALTIME returns the Epoch based on the system time clock. So I think this can be modified which can lead to change in the time since epoch.

CLOCK_MONOTONIC doesnt depend on the system clock and cannot be set. So this value cannot be modified.

So isnt the Monotonic optimal for our periodic timer use case? pls feedback.

elBoberido commented 3 years ago

@shankar-in that depends. If you use the semaphore for the blocking wait then you have no choice and must use CLOCK_REALTIME. We cannot always decide which clock to use, sometimes the API we use takes this decision for us. So every time you use a POSIX API which takes a timespec or timeval you have to check which clock the API expects like elfenpiff already mentioned with https://linux.die.net/man/3/sem_timedwait

shankar-in commented 3 years ago

@elBoberido Okay. I am making use of the api sem_timedwait and let the API decide which would be the best clock for the purpose. :)

shankar-in commented 3 years ago

@elBoberido I have a clarification.

Pls find the impl of wait of the periodic / keep alive timer below,

cxx::expected<iox::cxx::TimerEvent, posix::SemaphoreError> PeriodicTimer::wait() noexcept
{
    if (*(m_stop.getValue()) == static_cast<int>(posix::SemaphoreWaitState::TIMEOUT)) 
    {
        auto waitResult = m_stop.timedWait(m_interval, true); 
        if (waitResult.has_error())
        {
            return cxx::error<posix::SemaphoreError>(waitResult.get_error());
        }
        else
        {
            return cxx::success<iox::cxx::TimerEvent>(iox::cxx::TimerEvent::TICK);
        }
    }
    else
    {
        return cxx::success<iox::cxx::TimerEvent>(iox::cxx::TimerEvent::STOP);
    }
}

The clarification I have is. The periodic timer will be manipulated by two different threads 1. Periodic Task 2. call back function thread. The periodic timer start() / stop() functions will be called by the periodic task. The callback function thread will call the wait() on the periodic timer. Now Lets assume that Callback function thread is in wait() and blocked for the duration. Meanwhile if the periodic task thread stops() it releases the semaphore then the wait() method acquires the semaphore immediately and keeps the timer Ticking.

So to avoid this concurrency issue should we introduce a mutex between the threads to update or read the semaphore state?

Pls feedback

shankar-in commented 3 years ago

@elBoberido Trying a work around. If the sem_ timedwait() didnt complete the wait duration then we can conveniently assume that semaphore is obtained [sem_post() happened in backgorund]. In that case we can release the semaphore and return back STOP. Hope this works.

elBoberido commented 3 years ago

@shankar-in it's hard to tell without all the code. Please create a draft PR for further discussion. Basically you have to store the stop state in an atomic and always return with STOP if the flag is set.

Also, could you please name the class just Timer? This makes it easier to extend it with methods like singleShot which wouldn't be periodic but needs manual restarting.

shankar-in commented 3 years ago

@elBoberido Okay I will raise the draft PR

elBoberido commented 3 years ago

@shankar-in just for information, since we are targeting Thursday for our 1.0RC1 release. It might take some time until non release critical PRs are reviewed.