Open AndreyNevolin opened 5 years ago
Mentioning @vincentkfu for comment as he was the inventor/progenitor of the current fio timing work...
@AndreyNevolin it's interesting to know that (Dell) EMC use fio internally. I'm always curious as to which tools different people are using at their work - if you're allowed can you name any other performance tools used?
(I know Andrey already knows about this but I'm going to link to the original discussion over in https://github.com/axboe/fio/pull/387 for the sake of others who may land on this discussion)
@AndreyNevolin, thank you for these suggestions.
The proposed change to get_cycles_per_msec() definitely seems like an improvement.
The fio nanosecond patches actually included a change from measuring ticks-per-usec to measuring ticks-per-msec for improved accuracy. So it makes sense that using ticks-per-sec would be even better.
If no one beats me to it, I will investigate implementing your suggestions in fio.
Vincent, please do, seems useful to me.
FWIW, I also agree that the division by 1000000000 is troublesome, it's a big cycle waster. I briefly looked into doing this smarter yesterday. For the seconds, I came up with this one:
seconds = (nsecs * 2199) >> 41;
but I'm still missing a good way to get the remainder. Ideas welcome. The compiler is smart enough to fold the MOD and DIV into one thing, so we need to kill both to actually speed up this part.
Something like this would most likely be faster:
- tp->tv_sec = nsecs / 1000000000ULL;
- tp->tv_nsec = nsecs % 1000000000ULL;
+ secs = (nsecs * 0x897ULL) >> 41;
+ tp->tv_sec = secs;
+ tp->tv_nsec = nsecs - (secs * 1000000000ULL);
+ if (tp->tv_nsec >= 1000000000ULL) {
+     tp->tv_sec++;
+     tp->tv_nsec -= 1000000000ULL;
+ }
Looking at the generated code, there's actually no division in there. It's all shifts and multiplications, since it's divide by a constant. That said, I ran the above, and it does appear to be a little faster. Would be nice if others can test too.
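For anyone who wants to test this outside fio, here is a minimal standalone sketch of the same multiply-and-shift idea. The helper name `nsecs_to_timespec` and the `MAX_NSECS` bound are assumptions made for illustration, not fio code. Since 2199 / 2^41 is slightly smaller than 1e-9, the estimate can only fall short of the true quotient, and a single correction step is enough only while the accumulated shortfall stays below one second (roughly 26 hours of nanoseconds), so the sketch restricts its input to 24 hours.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC 1000000000ULL
/* Assumed input bound for this sketch: 24 hours worth of nanoseconds. */
#define MAX_NSECS (86400ULL * NSEC_PER_SEC)

/* Hypothetical helper: split a nanosecond count into seconds and
 * nanoseconds without writing a division in the source. */
static inline void nsecs_to_timespec(uint64_t nsecs, struct timespec *tp)
{
    uint64_t secs, rem;

    /* Beyond this bound the estimate could be short by more than one. */
    assert(nsecs <= MAX_NSECS);

    /* 2199 / 2^41 is just under 1e-9, so secs is exact or one too low. */
    secs = (nsecs * 2199ULL) >> 41;
    rem = nsecs - secs * NSEC_PER_SEC;

    /* Single correction step for the "one too low" case. */
    if (rem >= NSEC_PER_SEC) {
        secs++;
        rem -= NSEC_PER_SEC;
    }

    tp->tv_sec = (time_t)secs;
    tp->tv_nsec = (long)rem;
}
```

Comparing the result against plain `/` and `%` over the expected input range is a cheap way to check that the correction step is sufficient before timing it.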
@sitsofe I'm not with DellEMC anymore. Sorry for the confusion. I need to fix my profile. Currently I work for a small company that has its own server platform and also produces storage systems based on that platform. fio is indeed used widely in this company for production testing. But almost all big companies use home-grown tools for production testing. There are quite a few reasons for that. One of the biggest is reproducibility. A lot of effort is put into that. Also, every big company has its own performance evaluation methodology, and the tools are closely tied to that methodology.
But R&D departments in big companies DO use open source tools widely and almost never use home-grown tools. fio is a popular choice for block workloads. IOR and Mdtest are popular choices for distributed file systems. Mongoose (https://github.com/emc-mongoose/mongoose) is a good choice for object stores and conventional NAS stores.
@AndreyNevolin Thanks for letting me know! I have hopes to one day write a tool survey so I'll stash your comments away (for example mongoose is new to me).
Hi,
I borrowed some ideas - but not the code - from fio for my own benchmarking tools. Namely, I was interested in measuring wall-clock time using time-stamp counters (or time-base registers, or whatever they're called on different architectures).
I'd like to give back some observations that - I believe - would make it possible to improve the precision of time measurements done by fio.
1) I believe fio currently measures time intervals of several seconds with a precision of several hundred nanoseconds (which is pretty much the same as, or in many cases much worse than, clock_gettime(); though the overhead is indeed much lower in fio).
2) The loss of precision in fio stems from the following facts: [...] more significant issue than the first one. This is a snippet from get_cycles_per_msec(): [...] MAX_CLOCK_SEC. Both values must be on the same scale, otherwise they will "drag" overall precision in different directions. For example, if the precision given by ticks-per-sec looks good, then MAX_CLOCK_SEC must also be at the scale of seconds (now it's 1 hour in fio).
Taking all of the above into account, it's possible to achieve a precision of dozens of nanoseconds (at the scale of second-long time periods). This is still not pure nanosecond precision, but it is much better than multiple hundreds of nanoseconds per second.
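To make the scale argument concrete, here is a small sketch of one source of error: rounding in the calibration value. It assumes, purely for illustration, that the calibration result is kept as an integer number of ticks per calibration unit; the helper `ticks_to_ns` and the counter frequency used below are hypothetical and are not taken from fio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert a tick count to nanoseconds using an
 * integer calibration value, where 'ticks_per_unit' ticks were counted
 * during 'ns_per_unit' nanoseconds. */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t ticks_per_unit,
                            uint64_t ns_per_unit)
{
    assert(ns_per_unit != 0);
    assert(ticks_per_unit != 0);
    /* The intermediate product must fit in 64 bits in this sketch. */
    assert(ticks <= UINT64_MAX / ns_per_unit);

    return ticks * ns_per_unit / ticks_per_unit;
}
```

Take an assumed counter running at 2,999,999,600 ticks per second. Calibrated per millisecond, the integer value is 2,999,999 (the fractional 0.6 tick is lost), and `ticks_to_ns(2999999600ULL, 2999999ULL, 1000000ULL)` returns 1,000,000,200, which is 200 ns off over one second. Calibrated per second, `ticks_to_ns(2999999600ULL, 2999999600ULL, 1000000000ULL)` returns exactly 1,000,000,000. Measurement noise in the calibration window is a separate error source that this sketch does not model.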
One additional observation: on-the-fly conversion of ticks to nanoseconds in fio was designed to avoid integer division (which is indeed really costly). But there is still one division operation used to store the calculated nanoseconds in timespec format: [...] Not sure whether anything can be done about that.
I'm sorry for not proposing a patch. Unfortunately, I'm not a fio developer. Not even a fio user. Also, I did all precision evaluations using my own code. For the purpose of fio development, evaluations based on fio itself would be more trustworthy.
Thank you for the great tool! Though I don't use it myself, performance engineers in my company do use it really intensively.
I hope at least some of my observations will be useful.