Open marthtz opened 3 years ago
@marthtz I think a stopwatch is not exactly what we need. We have to rewrite the timer so that the timer does not work with callbacks anymore but instead with signals. If you look at the manpage https://man7.org/linux/man-pages/man2/timer_create.2.html we have to possible approaches:
I think we should explore the second option. Maybe it has similar drawbacks than the callback approach but if not it should be our way to go.
@elBoberido what do you think?
@elfenpiff @marthtz please don't call it stopwatch. A stopwatch is measuring time, this functionality is running towards a deadline or countdown. See also the equivalent int boost and Qt.
To the other points. The posix timer was started as a deadline timer and then enhanced with the posix calls. Currently it has both functionalities.
std::thread
and sleep_for
@elfenpiff I am currently working on this story. Based on the comments from @dkroenke I am working on to replace the timer.start() function with a simple thread sleep. Currently the posixtimer.start() is being called within the posh_runtime, service_discovery_notifier.
@shankar-bosch a simple sleep_for
is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use the sleep_for
.
@elBoberido okay I will do so
@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.
@elfenpiff sure. I am not in the list of contributors. Could you pls add me? I can assign this issue in my name and bring this to the board.
@elBoberido I have clarifications. As you know the timers can be one shot or periodic based on the it_interval .
If it is a one shot timer there can be a thread::sleepfor(it_value)
.
if the timer is a periodic then there has to be a thread::sleepfor(it_value)
followed by a thread::sleepfor(it_interval)
until the application exit? I am not sure what to do here.
I also dont see the start()
function within the timer.cpp invoking a thread after a timer is disarmed. Is it done in a different place? could you pls also help me here?
@shankar-bosch if you are working on an issue could you please assign it to you and add it into the iceoryx 1.0 board: https://github.com/eclipse/iceoryx/projects/2 so that everyone is aware of it.
@elfenpiff This can only be done by an Eclipse committer. Eclipse committers are voted on in elections.
@mossmaurice Thank you
@shankar-bosch a simple
sleep_for
is not enough. The current timer invokes a thread on each timeout. You also have a start a thread where you use thesleep_for
.
@elBoberido How about just removing the timer from iceoryx ? I would need to double check but the current use-cases we have within iceoryx can be resolved using a simple sleep_for (e.g. sending heartbeat).
@ithier that's the idea of this issue. There is an additional functionality in the timer (checking for a deadline) and that should to be preserved
@shankar-bosch the thread is started implicitly. Have a look at https://linux.die.net/man/2/timer_create, we use SIGEV_THREAD
If you use just sleep_for
, the semantic of the code changes. In this case the thread which previously called start
is now blocked, while it just continues running with the current timer implementation. You have to create a thread and use sleep_for
in a loop. I would suggest to start with cutting out the functionality of the DeadlineTimer
to cxx/deadline_timer.hpp/cpp
. This should be way easier. You can do it by copying the current timer to cxx
and removing everything from it which is related to the m_osTimer
member
Here is where the timer is used in iceoryx:
From above, it looks like we only need to do add the heartbeat thread above and write some kind of helper method to wrap the usage of timer_gettime
to satsify the second usecase.
Then we can just mark the old timer as deprecated and we are done with this issue, right ?
it just occurred to me that sleep_for might be really bad for the test time, since it's not interruptible. How about creating a small wrapper around a semaphore with timed wait? e.g.
class Timer {
public:
Timer() noexcept = default;
~Timer() noexcept; // calls stop
cxx::expected<TimerError> wait(cxx::Duration) noexcept; // blocking wait on the semaphore
void stop() noexcept; // sem_post to interrupt waiting
static cxx::expected<units::Duration, TimerError> Timer::now() noexcept; // this is from the old Timer and also used in the codebase
}
Later on the semaphore could be replaced by POSIX timer_create(..)
etc, if necessary.
What do you think?
Current PR #333 also makes use of the timer. Basically, it's checking for a timeout / interval expiry.
@elBoberido wrapper sounds good to me and could be used for this and @ithier first bullet point.
@elBoberido Thanks for the wrapper solution.
We should definitely fix this for 1.0, On ARM 64 I have failing tests because of #190 but I fear that's not the only problem
@budrus to fix #190 we need to refactor the units::Duration away from long double
to chrono
or timespec/timeval
@elBoberido That's at least half of the game. The failing DestructorIsBlocking
could maybe be another thing. This also fails with larger numbers...
@budrus it could still be a rounding issue. AFAIK, the timer doesn't sleep for e.g. 100ms but for currentTime + 100ms
. This might result in rounding errors with the long double
, especially since that's only an alias to double
on ARM and 80 bit wide on x86 ... except for MSVC where it is also an alias to double
@budrus to fix #190 we need to refactor the units::Duration away from
long double
tochrono
ortimespec/timeval
@elBoberido @budrus I suggest here to use neither of them. I would create a struct which looks like timespec
struct duration_t {
uint64_t seconds;
uint32_t nanoseconds;
};
The reason is that a duration cannot be negative, you cannot wait -5 seconds but timespec/timeval
can be negative. Additionally, timespec/timeval
do not have a fixed size they are using the types long
and time_t
which can differ on different architectures.
And when we refactor units::Duration
we should get rid of all the long double
operators which are absolutely unnecessary - and I know I was the one who introduced them in the first place :/ .
I will take the solution for the truncation issue as suggested by @elfenpiff. Thank you. @budrus Is there a seperate ticket for failing DestructorIsBlocking issue?
@shankar-bosch No, as @elBoberido said, it could also have the same root cause as #190. We would have to analyze this but now things are changing anyway
@budrus Okay. I believe the issue #190 fix should be available for 1.0. So increasing the scope of #337 to fix #190 and analysing the failing DestructorIsBlocking should be fine?
since we are approaching the 0.90 release, is there any update on this?
Yes the PR is getting ready. Should be raised shortly.
@shankar-bosch for Deadline or keep-alive?
Deadline is ready. Keep alive is under testing. The PR should cover both.
@shankar-bosch do you think you can get this in until monday? If not, I could do a fast fix for the keep alive.
Just as reminder to keep the PRs small. The PR you are preparing could be split into two. The dealine timer and the keep alive functionality.
@elBoberido I have my changes ready. But I havent got the committer approval to open the PR yet. Yes you are right I can split the PR into two. shall I raise the PR for the deadline today by forking other repo? I shall also check this with my PO and update you.
@shankar-bosch yes, please. Would you mind if I create a small workaround for the keep-alive?
@shankar-bosch I've created a PeriodicTask
class. Currently it wraps a semaphore. This semaphore could be replaced by a refactored posix::Timer. Ideally the timer doesn't spawn a thread, but does just block on wait. This would make the timer more flexible since it could also be used when we want to block on the current thread.
@elBoberido Thank you for the changes. I shall adopt the change.
@shankar-bosch with you DeadlineTimer
PR slowly merging, there is only one remaining part to refactor.
The old posix::Timer had three features:
I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a wait()
method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like this
class Timer {
Timer(); // default c'tor
Timer(units::Duration interval); // immediately starts the timer
static cxx::expected<units::Duration, TimerError> now() noexcept;
cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval
cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval
cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created
cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP`
};
I think the current RunMode
can be omitted, since you can do it by calling wait()
only once or repeatedly. I'm not sure about the CatchUpPolicy
but I think it should also be a parameter of the constructors or maybe even better of the wait()
method.
@mossmaurice @MatthiasKillat what do you think?
If you have time and like the task, feel free to implement it. Ideally, one PR for the timer refactoring and one for the integration into PeriodicTask
.
@shankar-bosch with you
DeadlineTimer
PR slowly merging, there is only one remaining part to refactor. The old posix::Timer had three features:* DeadlineTimer -> refactored out * spawning a thread and execute a task -> moved to PeriodicTask * periodically trigger an event -> currently also in PeriodicTask with the Semaphore
I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a
wait()
method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like thisclass Timer { Timer(); // default c'tor Timer(units::Duration interval); // immediately starts the timer static cxx::expected<units::Duration, TimerError> now() noexcept; cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP` };
I think the current
RunMode
can be omitted, since you can do it by callingwait()
only once or repeatedly. I'm not sure about theCatchUpPolicy
but I think it should also be a parameter of the constructors or maybe even better of thewait()
method. @mossmaurice @MatthiasKillat what do you think?If you have time and like the task, feel free to implement it. Ideally, one PR for the timer refactoring and one for the integration into
PeriodicTask
.
@elBoberido Sure. I am interested to take up. I have some points for clarifications. Shall i create a new story for this refactoring?
@shankar-in from my point of view, this issue could just be renamed to Refactor posix timer
. In the end, that's what we did all the time
Thanks @shankar-in and @elBoberido I think we already made good progress.
Throughout this post I'm referring not to master but the current branch of #490
* spawning a thread and execute a task -> moved to PeriodicTask
IMHO that's a great building block. I would keep the semaphore out of there. Could be renamed to PeriodicThread
as it is just syntactic sugar as you already noted. The name "task" is probably the most über-undefined term ever in automotive software development. And probably elsewhere, too. Avoiding it could be wise.
Use-case: Do something ASAP periodically, no need for the timing to be precise e.g. MQ/UDS handling of keep-alive messages.
* periodically trigger an event -> currently also in PeriodicTask with the Semaphore
For me that's a separate entity. Let's call it PeriodicEvent
.
I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a
wait()
method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like thisclass Timer { Timer(); // default c'tor Timer(units::Duration interval); // immediately starts the timer static cxx::expected<units::Duration, TimerError> now() noexcept; cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP` };
Let's recap a bit...
Lessons learned:
posix::timer
was waaaay to complex and contained different kind of constructs in one
restart()
is unnecessaryI like your proposal, it keeps things simple. However, I'm unsure on whether a plain wait()
will be user friendly enough. The idea of the legacy posix::Timer
was to make it user-friendly by storing the callable inside the posix::Timer
. #473 might come in handy here.
Around mid-2020 as a possible alternative to get rid of the too-many-thread-spawning-OS issue was discussed:
struct sigevent
has a member sigev_notify
and could be set to SIGEV_SIGNAL
SIGEV_THREAD
I fear we might need at kernel-space solution at some point and a user-space solution won't be feasible e.g. to fulfill certain timing constraints on QNX. IIRC QNX advised to use the timer_create
API. I also have the feeling that all CatchupPolicy
s should be supported by the new constructs.
@mossmaurice @MatthiasKillat what do you think?
Answering with a question: What would be the use-case for the PeriodicEvent
?
@mossmaurice
Thanks @shankar-in and @elBoberido I think we already made good progress.
Throughout this post I'm referring not to master but the current branch of #490
* spawning a thread and execute a task -> moved to PeriodicTask
IMHO that's a great building block. I would keep the semaphore out of there. Could be renamed to
PeriodicThread
as it is just syntactic sugar as you already noted. The name "task" is probably the most über-undefined term ever in automotive software development. And probably elsewhere, too. Avoiding it could be wise. Use-case: Do something ASAP periodically, no need for the timing to be precise e.g. MQ/UDS handling of keep-alive messages.
I used task
because it's also used in the STL https://en.cppreference.com/w/cpp/thread/packaged_task. I'm also not quite fond of PeriodicThread
since that's not what it does. It's more like a PeriodicExecutor
.
* periodically trigger an event -> currently also in PeriodicTask with the Semaphore
For me that's a separate entity. Let's call it
PeriodicEvent
.I'd like to have the last one in a new posix::Timer, but based on a Semaphore, similar to the one in PeriodicTask but without spawning a new thread, just a blocking wait on e.g. a
wait()
method. This gives you the flexibility to use the timer in your current thread if you are allowed to block there, instead of spawning a new thread with PeriodicTask. I think of an API like thisclass Timer { Timer(); // default c'tor Timer(units::Duration interval); // immediately starts the timer static cxx::expected<units::Duration, TimerError> now() noexcept; cxx::expected<TimerError> start(); // starts or restarts the timer with the current interval cxx::expected<TimerError> start(units::Duration interval); // starts or restarts the timer with the new interval cxx::expected<TimerError> stop(); // this is the only method that should be called from a different thread than the object was created cxx::expected<TimerEvent, TimerError> wait() noexcept; // blocking wait on the semaphore; with TimerEvent enum with the values `TICK` and `STOP` };
Let's recap a bit...
Lessons learned:
* `posix::timer` was waaaay to complex and contained different kind of constructs in one * Letting the OS start a thread was a bad idea, as in some occasions thousands of threads were spawned * `restart()` is unnecessary
I like your proposal, it keeps things simple. However, I'm unsure on whether a plain
wait()
will be user friendly enough. The idea of the legacyposix::Timer
was to make it user-friendly by storing the callable inside theposix::Timer
. #473 might come in handy here.
Well, it's just a building block. If you need the additional functionality use the PeriodicTask
, which uses this building block. Fun fact, the PeriodicTask
can store objects overloading the call operator ;)
Around mid-2020 as a possible alternative to get rid of the too-many-thread-spawning-OS issue was discussed:
* Stick with [the API](https://man7.org/linux/man-pages/man2/timer_create.2.html) * `struct sigevent` has a member `sigev_notify` and could be set to `SIGEV_SIGNAL` * Don't use `SIGEV_THREAD` * OS then creates the "Event" an signals the process * Benefit: Timing is controlled by OS
I fear we might need at kernel-space solution at some point and a user-space solution won't be feasible e.g. to fulfill certain timing constraints on QNX. IIRC QNX advised to use the
timer_create
API. I also have the feeling that allCatchupPolicy
s should be supported by the new constructs.
The timer
would currently use the semaphore, but that's just an implementation detail and can be changed in the future with POSIX timer_create
functionality. I'm also for the CatchupPolicy
and would pass it in the wait
method. This has the benefit to define it when it's necessary, while waiting for the next timer tick.
@mossmaurice @MatthiasKillat what do you think?
Answering with a question: What would be the use-case for the
PeriodicEvent
?
For me an event is something active, and is fired no matter what. The proposed timer is completely passive since it just blocks on wait
. Something like a PeriodicEvent
could use the timer as building block though. That would also be the use case for the refactored timer, a building block for PeriodicTask
. One could ask if that justifies a new class. I would say yes, since it nicely encapsulates the functionality and makes it available for cases where you already have a thread, need to block for a certain time and want to be able to interrupt the blocking call.
@elBoberido I am taking up refactoring of periodically trigger an event in this sprint. In case i have clarifications I will contact you in Gitter.
@elBoberido the abstimeout
calculation for the semaphore is present within the semaphore.cpp
. this makes the choice of clock unavailable for the the periodic timer. currently the semaphore always makes use of the system clock. For periodic timer I believe monotonic would be a right choice? In that case we have to move the calculation of the abstimeout
outside the semaphore ? pls feedback
@shankar-in The implementation is absolutely correct here! We do not want to do this outside since it is error prone especially when the developer does not read the documentation of the posix semaphore. Here the part from the sem_timedwait
manpage:
sem_timedwait() is the same as sem_wait(), except that abs_timeout specifies a limit on the amount of time that the call should block if the decrement cannot be immediately per‐ formed. The abs_timeout argument points to a structure that specifies an absolute timeout in seconds and nanoseconds since the Epoch, 1970-01-01 00:00:00 +0000 (UTC). This struc‐ ture is defined as follows:
struct timespec {
time_t tv_sec; /* Seconds */
long tv_nsec; /* Nanoseconds [0 .. 999999999] */
};
As you see we require here the unix time which is monotonic since it counts the seconds from 1970-01-01 (Timezone: UTC).
@elfenpiff Thanks.
CLOCK_REALTIME returns the Epoch based on the system time clock. So I think this can be modified which can lead to change in the time since epoch.
CLOCK_MONOTONIC doesnt depend on the system clock and cannot be set. So this value cannot be modified.
So isnt the Monotonic optimal for our periodic timer use case? pls feedback.
@shankar-in that depends. If you use the semaphore for the blocking wait then you have no choice and must use CLOCK_REALTIME
. We cannot always decide which clock to use, sometimes the API we use takes this decision for us. So every time you use a POSIX API which takes a timespec
or timeval
you have to check which clock the API expects like elfenpiff already mentioned with https://linux.die.net/man/3/sem_timedwait
@elBoberido Okay. I am making use of the api sem_timedwait
and let the API decide which would be the best clock for the purpose. :)
@elBoberido I have a clarification.
Pls find the impl of wait of the periodic / keep alive timer below,
cxx::expected<iox::cxx::TimerEvent, posix::SemaphoreError> PeriodicTimer::wait() noexcept
{
if (*(m_stop.getValue()) == static_cast<int>(posix::SemaphoreWaitState::TIMEOUT))
{
auto waitResult = m_stop.timedWait(m_interval, true);
if (waitResult.has_error())
{
return cxx::error<posix::SemaphoreError>(waitResult.get_error());
}
else
{
return cxx::success<iox::cxx::TimerEvent>(iox::cxx::TimerEvent::TICK);
}
}
else
{
return cxx::success<iox::cxx::TimerEvent>(iox::cxx::TimerEvent::STOP);
}
}
The clarification I have is. The periodic timer will be manipulated by two different threads 1. Periodic Task 2. call back function thread. The periodic timer start()
/ stop()
functions will be called by the periodic task. The callback function thread will call the wait()
on the periodic timer. Now Lets assume that Callback function thread is in wait()
and blocked for the duration. Meanwhile if the periodic task thread stops()
it releases the semaphore then the wait()
method acquires the semaphore immediately and keeps the timer Ticking.
So to avoid this concurrency issue should we introduce a mutex between the threads to update or read the semaphore state?
Pls feedback
@elBoberido Trying a work around. If the sem_ timedwait()
didnt complete the wait duration then we can conveniently assume that semaphore is obtained [sem_post()
happened in backgorund]. In that case we can release the semaphore and return back STOP. Hope this works.
@shankar-in it's hard to tell without all the code. Please create a draft PR for further discussion. Basically you have to store the stop state in an atomic and always return with STOP
if the flag is set.
Also, could you please name the class just Timer
? This makes it easier to extend it with methods like singleShot
which wouldn't be periodic but needs manual restarting.
@elBoberido Okay I will raise the draft PR
@shankar-in just for information, since we are targeting Thursday for our 1.0RC1 release. It might take some time until non release critical PRs are reviewed.
Brief feature description
Deprecate posix timer from iceoryx code base.
Detailed information
Posix timer has several issues (concurrency, truncation) and is pretty heavy weight and a bit slow. As iceorxy needs to be very responsive the posix timer should be removed and replaced with a lightweight stopwatch.
This will also close #190 as it'll be obsolete.
Tasks
DeadlineTimer
or similar and replace this part of thePosixTimer
where applicableposix::semaphore
which can be used forconcurrent::PeriodicTask
std::chrono
timestamp with aunits::Duration
from atimespec
initialized withclock_gettime
withCLOCK_REALTIME
-> maybe better to use a different issue for this