Open timfelle opened 1 month ago
I think your masks in points zones without the 0 index is the way to go! Victor seems to have brought this up on Zulip on the 7th of May this year and I think there was more or less consensus on scrapping the size of the mask in the 0 index would be the way to go.
What I remember I was more sceptical towoard using c indexing for the mask_d.
What do you suggest for the device masks then?
Thinking about it I thought something like this I guess. No mask has the size in the beginning.
Optionally
I am mainly opposed to using the subscore _d for a variable x when it is not the same data in x and x_d. I dont think of it in terms of functionality but as values stored kinda.
How would you like the masks to look optimally?
Well I focus on the usage of the masks, so when i run through any masked loop i expect the mask to reflact that it is a mask.
for i in mask
a(i) = whatever,
regardless of language. And we have so far used the _d
suffix to indicate, "the same but on device", such that a user will just need to append the _d
to their variables and then the functionality of what they are doing is the same.
I think it will cause more confusion if masks are to be used differently than field_t
, vector_t
, matrix_t
, dofmap_t
, space_t
and essentially all other types in Neko.
but c_mask would fix this or?
I do not think of it as you I guess. I think more like
for i in size(mask):
mask(i) = mask_d(i)
I do not think it mimics functionality as much as it mirrors the data to the device. Is this different from the other types?
Yes it is different from the other data points as v%x(1)
is not the same as v%x_d(1)
since the array on cpu is stored as a 1-index and the device 0-index on the device. Meaning for all the other types v%x(1) == v%x_d(0)
.
Got me there ;)
By data I guess I really mean I expect the data to be the same, regardless if the programming language I use is 0 or 1 indexed, _d simply denotes whether it is on device or not.
So it is really a question of functionality or data I think.
Well it is all a question of naming. The direct copy of mask
would not be useful to have. So it is a question of wether to call the device mask c_
or _d
.
I would be all for c_mask_d and scrapping mask_d
Why do you want both?
because _d
denotes it is on device and c_
denotes it is c (0) indexed.
But the _d
would be redundant as c_mask
would not exist, and the _d
in your definition have been used to indicate a device copy of the host variable.
I am as strong proponent of not prepending the size in the 0 index and generally using non-standard indexing in Fortran, because things become a mess once you need to pass that array as a dummy argument, or access it via a pointer in another type. By default, it will be reindexed to start from 1 again, so one has to really take care to either avoid that or take it into account.
But the
_d
would be redundant asc_mask
would not exist, and the_d
in your definition have been used to indicate a device copy of the host variable.
I guess I in principle would want both a c_mask
and a mask_d
as well. I could see scenarios where one would use a programming model with 1 indexing on the gpu or 0 indexing on the cpu in the future. It did not include them now mostly because for the current state of the code it would just use a bit of memory.
But one can't truly have a 0 indexed array in fortran, right? It is essentially always a local "view", and once you pass the array elsewhere it is 1-based. Or am I wrong? If I am not, I would say one should create a local 0-based view if one really wants to but the default layout in the definition inside a type should be 1 based.
But one can't truly have a 0 indexed array in fortran, right? It is essentially always a local "view", and once you pass the array elsewhere it is 1-based. Or am I wrong? If I am not, I would say one should create a local 0-based view if one really wants to but the default layout in the definition inside a type should be 1 based.
His point is that we might write C, C++ or some other code, not running on a device but still call it from Neko. And that CUDA Fortran does exists, which is a 1-index device language.
I see. I guess having all sorts of masks can be useful, yeah.
It seems a concencus is made:
Masks should be defined as follows, in order to explicitly state their location and purpose.
mask_0
, host array representing a 0-indexed mask.mask_1
, host array representing a 1-indexed mask.mask_0_d
, device array representing a 0-indexed mask.mask_1_d
, device array representing a 1-indexed mask.
There are inconsistencies in how masks are dealt with in different parts of the code. We should agree on a single approach and align to that.
Some parts of the code uses an array of integers with length
m
to define a mask (an example is thepoint_zone_t
. Other parts of the code uses an array of integers with the size prepended, so it have lengthm+1
.Some masked math operations assume one, others assume the other.
@MartinKarp I know you were a strong proponent of prepending the size to the arrays, can you describe why that was useful?