Closed cleeus closed 3 years ago
That's an interesting idea, but the implementation of _safe_size()
seems not optimal.
As _safe_size()
will probably be called a lot of times, attention must be paid to the extra cost in machine time.
Maybe the test:
(size_t)size_x*size_y*size_z*size_c==(cimg_uint64)size_x*size_y*size_z*size_c;
should be enough in _safe_size()
?
I started with that simple check, but I don't think it is enough, because multiplying two 32bit values results in a 64bit value. So multiplying more then two 32bit values could also overflow the 64bit type. So for each 32bit multiplication you need a check. And then there often follows a multiplication with *sizeof(T) so this has to be checked, too, I think.
Since this check is often followed by a malloc, I don't think performance is an issue here. The time it takes to do these few multiplications is dwarfed by the subsequent malloc (malloc is orders of magnitudes slower).
OK. it sounds good.
Maybe I'll try to make _safe_size()
a bit faster afterwards (e.g. the first multiplication don't have to be tested, if we start with size=size_x
).
Thanks for the contribution.
I experimented a bit with different forms to express this and looked at the codege/asm output of MSVC2019/x64. It seems to store the arguments on the stack in the dim[] array. If you wan't to avoid this stack write, you can unroll the loop. The best codegen I could get was with this form:
int overflows = 0;
size_t size = size_x;
overflows += (size_y > 1 && size * size_y <= size) ? 1 : 0;
size *= size_y;
overflows += (size_z > 1 && size * size_z <= size) ? 1 : 0;
size *= size_z;
overflows += (size_c > 1 && size * size_c <= size) ? 1 : 0;
size *= size_c;
But as said, the perf impact compared to the malloc should be minuscule. Feel free to use above snippet. Thanks for merging the PR.
I'm actually experimenting with an unrolled version..
I'm also moving CImg<T>::_safe_size()
to cimg::size()
, because cimg::
is the namespace used to store such convenience functions.
I've ended up with:
// Return 'dx*dy*dz*dc' as a 'size_t' and check no overflow occurs.
static size_t safe_size(const unsigned int dx, const unsigned int dy,
const unsigned int dz, const unsigned int dc) {
if (!(dx && dy && dz && dc)) return 0;
size_t siz = (size_t)dx, osiz = siz;
if ((dy==1 || (siz*=dy)>osiz) &&
((osiz = siz), dz==1 || (siz*=dz)>osiz) &&
((osiz = siz), dc==1 || (siz*=dc)>osiz) &&
((osiz = siz), sizeof(T)==1 || (siz*sizeof(T))>osiz)) return siz;
throw CImgArgumentException("CImg<%s>::safe_size(): Specified size (%u,%u,%u,%u) overflows 'size_t'.",
pixel_type(),dx,dy,dz,dc);
return 0;
}
and finally put it as a static
function in CImg<T>
. Does it look good to you ?
To be honest: I have a hard time parsing this C syntax and understanding it (sequence points, comma-operator, assignment and in-place operators inside if clause) and it generates nearly the same assembly as the simple unrolled form above. I suggest to lean on the side of readability. It seems to be correct though.
Here is the ASM output from your if(...) version:
00007FF646F38A57 test edx,edx
00007FF646F38A59 je cimg_library::CImg<char>::_safe_size+65h (07FF646F38AB5h)
00007FF646F38A5B test r8d,r8d
00007FF646F38A5E je cimg_library::CImg<char>::_safe_size+65h (07FF646F38AB5h)
00007FF646F38A60 test r9d,r9d
00007FF646F38A63 je cimg_library::CImg<char>::_safe_size+65h (07FF646F38AB5h)
00007FF646F38A65 mov r10d,dword ptr [size_c]
00007FF646F38A6D test r10d,r10d
00007FF646F38A70 je cimg_library::CImg<char>::_safe_size+65h (07FF646F38AB5h)
00007FF646F38A72 mov ecx,edx
00007FF646F38A74 mov r8d,edx
00007FF646F38A77 cmp r11d,1
00007FF646F38A7B je cimg_library::CImg<char>::_safe_size+36h (07FF646F38A86h)
00007FF646F38A7D imul rcx,r11
00007FF646F38A81 cmp rcx,r8
00007FF646F38A84 jbe cimg_library::CImg<char>::_safe_size+6Ch (07FF646F38ABCh)
00007FF646F38A86 mov r8,rcx
00007FF646F38A89 cmp r9d,1
00007FF646F38A8D je cimg_library::CImg<char>::_safe_size+4Bh (07FF646F38A9Bh)
00007FF646F38A8F mov eax,r9d
00007FF646F38A92 imul rcx,rax
00007FF646F38A96 cmp rcx,r8
00007FF646F38A99 jbe cimg_library::CImg<char>::_safe_size+6Ch (07FF646F38ABCh)
00007FF646F38A9B mov r8,rcx
00007FF646F38A9E cmp r10d,1
00007FF646F38AA2 je cimg_library::CImg<char>::_safe_size+5Dh (07FF646F38AADh)
00007FF646F38AA4 imul rcx,r10
00007FF646F38AA8 cmp rcx,r8
00007FF646F38AAB jbe cimg_library::CImg<char>::_safe_size+6Ch (07FF646F38ABCh)
00007FF646F38AAD mov rax,rcx
Here is the ASM output from the overflows+= sequence version from above:
00007FF678F08A9E mov r10d,1
00007FF678F08AA4 mov r11d,edx
00007FF678F08AA7 mov rbx,rcx
00007FF678F08AAA mov eax,r8d
00007FF678F08AAD imul rax,r11
00007FF678F08AB1 mov esi,r8d
00007FF678F08AB4 mov ebp,edx
00007FF678F08AB6 cmp r8d,r10d
00007FF678F08AB9 jbe cimg_library::CImg<char>::_safe_size+45h (07FF678F08AC5h)
00007FF678F08ABB cmp rax,r11
00007FF678F08ABE ja cimg_library::CImg<char>::_safe_size+45h (07FF678F08AC5h)
00007FF678F08AC0 mov r8d,r10d
00007FF678F08AC3 jmp cimg_library::CImg<char>::_safe_size+48h (07FF678F08AC8h)
00007FF678F08AC5 xor r8d,r8d
00007FF678F08AC8 mov edx,r9d
00007FF678F08ACB imul rdx,rax
00007FF678F08ACF cmp r9d,r10d
00007FF678F08AD2 jbe cimg_library::CImg<char>::_safe_size+5Eh (07FF678F08ADEh)
00007FF678F08AD4 cmp rdx,rax
00007FF678F08AD7 ja cimg_library::CImg<char>::_safe_size+5Eh (07FF678F08ADEh)
00007FF678F08AD9 mov eax,r10d
00007FF678F08ADC jmp cimg_library::CImg<char>::_safe_size+60h (07FF678F08AE0h)
00007FF678F08ADE xor eax,eax
00007FF678F08AE0 mov rcx,rdi
00007FF678F08AE3 add r8d,eax
00007FF678F08AE6 imul rcx,rdx
00007FF678F08AEA cmp edi,r10d
00007FF678F08AED jbe cimg_library::CImg<char>::_safe_size+74h (07FF678F08AF4h)
00007FF678F08AEF cmp rcx,rdx
00007FF678F08AF2 jbe cimg_library::CImg<char>::_safe_size+77h (07FF678F08AF7h)
As you can see, there is no big difference. Both generate a similar amount and type of instructions.
@dtschump this is now tracked by RedHat as CVE-2020-25693
The size calculation pattern (size_t)size_xsize_ysize_z*size_c can overflow the resulting size_t. Especially on 32bit size_t platforms this is trivial and can be achieved using a simple PNM image, e.g. the following ASCII PNM would allocate only 6 byte and result in a trivial arbitrary heap write: P3 2147483649 2 255 255 255 255 255 255 255 255 255 255 255 255 255 255 255