My two cents: given full configurability (looks great!) I would suggest not adding `IMPLOT_INSTANTIATE_ALL_NUMERIC_TYPES`. For those rare types it seems reasonable to simply let the user express their needs explicitly.
I do agree, and this PR removes them.
especially as it may be tempting for users, and we don't know what `long double` would carry in terms of math runtimes.
That is an interesting question! I made a quick study, and my findings are:
- the speed difference between `float` and `double` can be as high as 20% (and the results do depend on the arch and compiler)
- when `double` and `long double` have the same size (8 bytes), `long double` is 20% to 30% slower (except on the Apple M1 processor)
- when `sizeof(long double) == 16` vs `sizeof(double) == 8`, `long double` can be 10 to 30 times slower (this depends on the compiler and platform)
I haven't poked into the internals, but I wondered whether the small types (<32 or <64 bits) could somehow reuse the larger types' functions, with special adapters? Intuitively it seems like u8/u16 could use the code for u64, which implies that the generated code size and link size could be reduced. With custom type lists this is now perhaps a little harder to set up, so it might need a hardcoded proof of concept before getting it to work with custom type lists.
There is perhaps a solution, but I suspect it would very likely be "platform / compiler / arch / type" specific, and maybe difficult to maintain.
In the case of unsigned integers, a possible solution would be to:
- implement the full code only for the u64 version
- have a template function for u8, u16 and u32 that calls the u64 version by first reading elements as u64
- have a way to read u64 values from a u8, u16 or u32 array. A possible solution would be to set `u64 value = 0;`, then fill the right portion of `value`'s bytes with the bytes from the array
However, based on my observations, the memory layout of a 64-bit int (on a Mac M1) is a mix of:
- little-endian (for bits)
- big-endian (between bytes)
- big-endian (between groups of 4 bytes)
And I suspect that this is very much platform dependent.
For example, if I set
`int64_t v64 = 0x123456789ABCDE00;`
and examine the memory at `&v64` on my Mac, the bytes do not appear in the left-to-right order that the literal suggests.