Closed: QingleiCao closed this 4 months ago.
I still think that the right approach is to make `bsiz` a `size_t`. Also, the build failure is legit (stemming from the added assert):
```
/tmp/parsec/parsec/parsec/data_dist/matrix/matrix.c: In function 'parsec_tiled_matrix_init':
/tmp/parsec/parsec/parsec/data_dist/matrix/matrix.c:98:31: error: 'INT_MAX' undeclared (first use in this function)
   98 |     assert((size_t)mb * nb <= INT_MAX);
      |                               ^~~~~~~
/tmp/parsec/parsec/parsec/data_dist/matrix/matrix.c:19:1: note: 'INT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?
   18 | #include "parsec/data_dist/matrix/two_dim_tabular.h"
  +++ |+#include <limits.h>
```
I'm not sure whether there is potential side-effect as @abouteiller pointed out.
INT_MAX is not a size_t so even with the correct header the compiler will generate a warning and the comparison will be done in the lesser precisions. You really intend to create data that is over 2GB ?
> I'm not sure whether there is potential side-effect as @abouteiller pointed out.

Unless we pass a pointer to `bsiz` somewhere that expects an `int*`, there should be no side-effect. Compiling with `-Wall` should help find any place where that happens, or simply a grep.
> INT_MAX is not a size_t so even with the correct header the compiler will generate a warning and the comparison will be done in the lesser precisions.

The `int` of `INT_MAX` will be promoted to `size_t` as per the C integer conversion rules (a 64-bit integer has higher rank than a 32-bit integer). That assert is legit, albeit not very helpful unless applications run with a debug build of PaRSEC.
A quick grep showed no place where we pass a pointer to `bsiz`. There are places where we cast `bsiz` to `size_t` already:

```
parsec/data_dist/matrix/two_dim_rectangle_cyclic.c: pos *= (size_t)dc->super.bsiz;
parsec/data_dist/matrix/two_dim_rectangle_cyclic.c: pos *= (size_t)dc->super.bsiz;
```

and in `matrix.c` we store the result of `fread`/`fwrite` in an `int` and compare against `bsiz`. Those functions return `size_t` though, so that could be adjusted as well.
Rule 6.3.1.8 applies here:

> Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

The conversion will always work here because we are dealing with a constant with a positive value.
> You really intend to create data that is over 2GB?

Yeah, each timeslot includes L^2/2 ZGEMV; the size of each vector is L. Those L^2/2 vectors are stored in a file, so I'm loading them together. L can be 720 or 1440. If L = 720, then 720^2/2 * 720 * sizeof(complex double) = 2,985,984,000 bytes.
Yes, the other places directly related to `bsiz` are already protected by a cast to `size_t`.

Any suggestions? Should we change `bsiz` to `size_t`, or is this fine since mb * nb > INT_MAX is rare?
Rule 6.3.1.8 applies here:

> Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

The conversion will always work here because we are dealing with a constant with a positive value.
Let's run through this text: `INT_MAX` is a signed integer. `size_t` is an unsigned integer with rank higher than `int` (32 vs 64 bits on 64-bit archs).

> if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand

So yes, this rule applies: `size_t` is the unsigned integer type that has rank greater than `int`, which is the signed integer type with lower rank.
> then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

This requires that `INT_MAX` (an `int`) is promoted to the type of the operand with the unsigned type (`size_t`).
Now, if we were to compare `size_t` against `INT_MIN`, we would be in trouble: promoting a negative number to an unsigned type of higher rank produces a huge positive number, because of the 2's complement representation. But that's not happening here, and it would be a dubious thing to do in the first place.
> Any suggestions? Changing bsiz to size_t or this is fine as mb * nb > INT_MAX is rare?

Things that encode counts should be `size_t` to avoid future overflows. The check we're talking about then becomes irrelevant and we can all go home happy :) There is of course the issue of sending such a tile via MPI, which can be solved via the big-count API that hopefully we can use soon. My understanding though is that in @QingleiCao's case the data remains local.
@devreal I'm confused by your comment: it started as a stark contradiction to my statement, yet in the end it basically states exactly the same thing. 😕
Anyway, this PR looks good as it is.
@bosilca You said:
> INT_MAX is not a size_t so even with the correct header the compiler will generate a warning and the comparison will be done in the lesser precisions.
I don't think that is what the standard says and I hope I did not convey that in my comment :)
If the right-hand side were of a different signedness and not a constant (a constant lets the compiler verify that the conversion is safe), then the compiler would have warned that the conversion is unsafe.
> Things that encode counts should be `size_t` to avoid future overflows. [...] My understanding though is that in @QingleiCao's case the data remains local.
Yeah, in my case, it's local without communication involved.
Well, `INT_MAX` instead of `INT_MIN`, right? @devreal
> there is of course the issue of sending such a tile via MPI, which can be solved via the big-count API that hopefully we can use soon.

Indeed, that was my first reaction reading all this. The drawback is that we have to use the large-count variants for everything, not just the types.
> You really intend to create data that is over 2GB?
>
> Yeah, each timeslot includes L^2/2 ZGEMV; the size of each vector is L. Those L^2/2 vectors are stored in a file, so I'm loading them together. L can be 720 or 1440. If L = 720, then 720^2/2 * 720 * sizeof(complex double) = 2,985,984,000 bytes.

I'm not sure how the patch fixes your issue (at best it delays it to when you want to load a matrix that is 8x larger?): `bsiz` is still an `int` and will still be truncated for some reasonable sizes (apparently so, since they are less than one order of magnitude above the one you use now). What is our reasoning for not promoting `bsiz` to a `size_t` overall?
I will change `bsiz` to `size_t`. I think we also need to change the matrix size as well; I spent quite a while debugging an issue because the matrix (1*N) was larger than INT_MAX (N > INT_MAX).
The matrix size is already cast in the high-level distribution API, e.g., `parsec_matrix_sym_block_cyclic_init`, so this PR is only about `bsiz`.