Open Quuxplusone opened 5 years ago
Bugzilla Link | PR40763 |
Status | NEW |
Importance | P enhancement |
Reported by | Devin Hussey (husseydevin@gmail.com) |
Reported on | 2019-02-18 09:06:43 -0800 |
Last modified on | 2019-02-19 12:19:34 -0800 |
Version | 7.0 |
Hardware | Macintosh MacOS X |
CC | dblaikie@gmail.com, llvm-bugs@lists.llvm.org, mclow.lists@gmail.com |
After profiling the code, I noticed that there are calls to snprintf_l via
std::num_put<char>::do_put, followed by fwrite.
For integers, gcc's libstdc++ doesn't call any of the printf functions; it uses its own function, __int_to_char, to turn integers into strings:
https://code.woboq.org/gcc/libstdc++-v3/include/bits/locale_facets.tcc.html#_ZSt13__int_to_charPT_T0_PKS_St13_Ios_Fmtflagsb
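The general idea behind this kind of printf-free conversion can be sketched as follows. This is a simplified illustration that assumes unsigned input. simple_uint_to_chars is a hypothetical name, and the code is not libstdc++'s actual __int_to_char.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a libstdc++ function): digits come out
// least-significant first, so they are written backwards from the end of
// the buffer. No format string is parsed and no reversal pass is needed.
// Returns a pointer to the first digit; the digits occupy [result, end).
char* simple_uint_to_chars(char* end, unsigned long long value) {
    char* p = end;
    do {
        *--p = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value != 0);
    return p;
}
```

The caller owns the buffer and passes its end pointer. The result can then be written directly as the range [first, end) without copying.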
If I do the conversion with strings myself, for example by filling an array with std::to_string() of each number beforehand and then printing the array items instead, performance is decent.
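A minimal sketch of that workaround follows. It assumes int inputs and one item per line; preformat and print_items are hypothetical names. Inserting a std::string does not go through std::num_put, so no snprintf_l call happens while printing. The conversion cost moves to the earlier pass rather than disappearing, because std::to_string is specified in terms of sprintf (libraries may implement it differently).

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: convert every number up front, outside the
// printing loop.
std::vector<std::string> preformat(const std::vector<int>& numbers) {
    std::vector<std::string> items;
    items.reserve(numbers.size());
    for (int n : numbers) {
        items.push_back(std::to_string(n));
    }
    return items;
}

// Hypothetical helper: stream the prepared strings. String and char
// insertion are character-sequence output, not std::num_put.
void print_items(std::ostream& os, const std::vector<std::string>& items) {
    for (const std::string& s : items) {
        os << s << '\n';
    }
}
```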
This, plus std::cout.write(str, &str[sizeof(str) - 1] - str), gives great performance: under 200 microseconds, and ~70-90 microseconds on gcc with sync_with_stdio(false).
#include <cstring>
#include <type_traits>

// Two-digit lookup table: characters 2*i and 2*i+1 are the digits of i (0..99).
static const char lut[] =
    "00010203040506070809"
    "10111213141516171819"
    "20212223242526272829"
    "30313233343536373839"
    "40414243444546474849"
    "50515253545556575859"
    "60616263646566676869"
    "70717273747576777879"
    "80818283848586878889"
    "90919293949596979899";

// Writes the decimal form of value backwards, ending just before buf
// (buf must point one past the end of a large enough buffer), and
// returns a pointer to the first character. No '\0' is written.
template <typename T>
char* int_to_string_base10(char* buf, T value) {
    // make_unsigned, not make_signed: the digit loops need an unsigned value.
    typedef typename std::make_unsigned<T>::type unsigned_type;
    bool negative = false;
    unsigned_type val = static_cast<unsigned_type>(value);
    // Remove the sign. Negating in the unsigned type also works for the
    // most negative value, where -value would overflow.
    if (std::is_signed<T>::value && value < 0) {
        negative = true;
        val = static_cast<unsigned_type>(unsigned_type(0) - val);
    }
    // Do this two bytes at a time, see Facebook's Three Optimization Tips for C++.
    // memcpy replaces the short* casts: it has no strict-aliasing or
    // alignment problems, and compilers typically emit a single 2-byte store.
    while (val >= 100) {
        buf -= 2;
        std::memcpy(buf, &lut[2 * (val % 100)], 2);
        val /= 100;
    }
    // Finish up the remaining one or two digits.
    do {
        *--buf = static_cast<char>('0' + val % 10);
        val /= 10;
    } while (val != 0);
    // Add the negative sign.
    if (negative) {
        *--buf = '-';
    }
    return buf;
}
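The write step can be sketched as follows, with the '\n' placed in the same buffer as the digits so each line costs exactly one write() call. This is a sketch under assumptions. C++17 std::to_chars stands in for the lookup-table function above. It is locale-independent and does not go through printf. write_line is a hypothetical name.

When printing to std::cout, std::ios_base::sync_with_stdio(false) would be called once at startup, before any output, and write_line(std::cout, n) would then be used in the loop.

```cpp
#include <cassert>
#include <charconv>
#include <ostream>
#include <sstream>
#include <system_error>

// Hypothetical helper: format value into a stack buffer, append '\n',
// and hand the bytes to the stream with a single write(). std::to_chars
// fills the buffer forwards, so the length is the end pointer minus buf.
void write_line(std::ostream& os, long long value) {
    char buf[24];  // a 64-bit value needs at most 20 chars, plus '\n'
    // Leave one byte free for the newline.
    std::to_chars_result r = std::to_chars(buf, buf + sizeof(buf) - 1, value);
    assert(r.ec == std::errc());
    *r.ptr++ = '\n';
    os.write(buf, r.ptr - buf);
}
```

A backwards-filling converter like int_to_string_base10 works the same way. Its length is the buffer end minus the returned pointer.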
However, even when the '\n' can't be added to the buffer, performance still isn't bad. Is the compiler allowed to optimize cout << '\n' the way it can turn printf into puts?