Open omac777 opened 4 years ago
The main thing stopping string handling is supporting bytes/ints/chars, right? This is blocked right now until we figure out better type inference.
I'm not sure about regex'ing, but slicing will probably never be supported in the Rust subset because it is practically impossible on GPUs. GPUs have different sections of memory: "global", "local", and "private". "Global" is the most expensive to allocate and to move data into and out of. "Private" is the cheapest but consists only of registers. Registers are like slots that can only hold primitive types like int or float. So a slice would have to be placed in "global" memory, which would be really, really inefficient.
And I don't think slices are really necessary. Once you have a slice you want to do one of 2 things.
The first is already possible (if you index directly into the data you are trying to take a slice of) and once we support for loops inside of the "kernel"/for loop body, the second will be possible too.
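To make the point about not needing slices concrete, here is a minimal sketch in plain CPU-side Rust (not Emu syntax). The function names are hypothetical, and it assumes the two things you want from a slice are element access and iteration. Both can be done with a start offset into the original buffer, without ever forming a slice.

```rust
/// Read element `i` of the window that starts at `start`,
/// by indexing directly into the original buffer.
fn get_in_window(data: &[f32], start: usize, i: usize) -> f32 {
    data[start + i]
}

/// Sum a window of `len` elements starting at `start`.
/// No sub-slice is created; the loop only offsets the index.
fn sum_window(data: &[f32], start: usize, len: usize) -> f32 {
    let mut total = 0.0;
    for i in 0..len {
        total += data[start + i];
    }
    total
}
```

The second function needs a loop inside the body, which is why it depends on for loops being supported inside the kernel.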
Sorting could be implemented by hand as a parallel bubble sort once we have support for if statements, the modulo operator, and variables (to add support, we need to work on modifying this traversing code and ensure that type safety is not messed up). Sorting like this could also maybe be implemented at some point.
let mut x = vec![0.0; 1000];
// ...
// ...store random numbers in x...
// ...
gpu_do!(load(x));
gpu_do!(launch());
x.sort();
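For reference, the hand-written "parallel bubble sort" mentioned above is usually called odd-even transposition sort. Below is a rough sketch in plain CPU-side Rust (not Emu syntax, function names are hypothetical). It only uses if statements, the modulo operator, and variables, which are the features listed above as prerequisites.

```rust
/// One phase of odd-even transposition sort.
/// Pairs (i, i + 1) with i starting at `phase % 2` and stepping by 2 never
/// overlap, so on a GPU each pair could be handled by its own thread.
fn odd_even_phase(data: &mut [f32], phase: usize) {
    let mut i = phase % 2;
    while i + 1 < data.len() {
        if data[i] > data[i + 1] {
            data.swap(i, i + 1);
        }
        i += 2;
    }
}

/// Run one phase per element; n phases are enough to sort n elements.
fn odd_even_sort(data: &mut [f32]) {
    for phase in 0..data.len() {
        odd_even_phase(data, phase);
    }
}
```

Here the pairs inside a phase run sequentially; the point is only that they are independent of each other, so a phase could be a single kernel launch.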
You can read the linked comment above for more on figuring out type inference. But basically the challenge is this: OpenACC, which does what Emu does but for C/C++, has code like this.
int z = x + y;
And they know the type is int, so they can produce the OpenCL, int z = x + y, and maintain type safety.
But we have Rust code like this.
let z = x + y;
And somehow, we need to figure out that this z is an int.
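As a toy illustration of that inference step (this is not Emu's actual implementation; the `Ty`, `Expr`, `infer`, and `opencl_type_name` names are made up for the sketch), one could walk the expression with a table of already-known variable types and derive the type of `x + y` from its operands:

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Int,
    Float,
}

enum Expr {
    Var(String),
    Add(Box<Expr>, Box<Expr>),
}

/// Infer the type of `expr` from the types of the variables in `env`.
/// Addition is only accepted when both operands have the same type.
fn infer(expr: &Expr, env: &HashMap<String, Ty>) -> Result<Ty, String> {
    match expr {
        Expr::Var(name) => env
            .get(name)
            .copied()
            .ok_or_else(|| format!("unknown variable `{}`", name)),
        Expr::Add(lhs, rhs) => {
            let l = infer(lhs, env)?;
            let r = infer(rhs, env)?;
            if l == r {
                Ok(l)
            } else {
                Err(format!("cannot add {:?} and {:?}", l, r))
            }
        }
    }
}

/// The type name to write in the generated OpenCL declaration.
fn opencl_type_name(ty: Ty) -> &'static str {
    match ty {
        Ty::Int => "int",
        Ty::Float => "float",
    }
}
```

With `x` and `y` both known to be `Int`, `infer` returns `Int` for `x + y`, which is enough to emit `int z = x + y`. The hard part in practice is knowing the types of `x` and `y` in the first place.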
Wait, actually, sorting shouldn't be built in. It should be defined in some separate crate for GPU-accelerated sorting.
let mut x = vec![0.0; 1000];
// ...
// ...store random numbers in x...
// ...
gpu_do!(load(x));
x = sorting::sort(x);
Regex'ing and slicing also won't be built in. All of these should be implemented manually. However, for these to be implement-able, the above things do still need to be supported. (variables, if/else, type inference, etc.)
I did read a bit more into the "CUDA C PROGRAMMING GUIDE PG-02829-001_v10.1 | August 2019".
In theory, the emu vectors could contain any of these CUDA built-in vector types (the number after each type is its alignment in bytes, per the guide):
char1, uchar1 1
char2, uchar2 2
char3, uchar3 1
char4, uchar4 4
short1, ushort1 2
short2, ushort2 4
short3, ushort3 2
short4, ushort4 8
int1, uint1 4
int2, uint2 8
int3, uint3 4
int4, uint4 16
long1, ulong1 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long2, ulong2 8 if sizeof(long) is equal to sizeof(int), 16 otherwise
long3, ulong3 4 if sizeof(long) is equal to sizeof(int), 8 otherwise
long4, ulong4 16
longlong1, ulonglong1 8
longlong2, ulonglong2 16
longlong3, ulonglong3 8
longlong4, ulonglong4 16
float1 4
float2 8
float3 4
float4 16
double1 8
double2 16
double3 8
double4 16
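For what it's worth, a rough sketch of how a few of these could be mirrored on the Rust side (the struct names are hypothetical and not part of Emu). `#[repr(C)]` gives a C-like layout and `align(N)` reproduces the alignment from the table; the values noted in the comments assume `f32` is 4-byte aligned, as on mainstream platforms.

```rust
use std::mem::{align_of, size_of};

/// Mirrors float3: three floats, aligned like a single float (4 bytes).
#[repr(C)]
#[derive(Clone, Copy)]
struct Float3 {
    x: f32,
    y: f32,
    z: f32,
}

/// Mirrors float4: four floats, aligned to 16 bytes.
#[repr(C, align(16))]
#[derive(Clone, Copy)]
struct Float4 {
    x: f32,
    y: f32,
    z: f32,
    w: f32,
}

/// Mirrors uchar4: four bytes, aligned to 4 bytes.
#[repr(C, align(4))]
#[derive(Clone, Copy)]
struct UChar4 {
    x: u8,
    y: u8,
    z: u8,
    w: u8,
}

/// Returns (size, alignment) in bytes for any type.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}
```

Note that the 3-component types are aligned like their 1-component versions, which is why float3 is 4 and not 16.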
The "if" conditional is supported within CUDA kernels. It's also supported within OpenACC. https://www.openacc.org/sites/default/files/inline-files/API%20Guide%202.7.pdf
Although outside of the scope of your emu, it could be interesting to see support for GPUDirect RDMA within emu also: https://www.sc-asia.org/2018/wp-content/uploads/2018/03/1_1500_Ido_Shamay.pdf https://www.mellanox.com/related-docs/prod_software/RDMA_Aware_Programming_user_manual.pdf
Actually, my intent was not to mutate the input request vector itself. I would pass along a second, response vector with a different structure but a similar element type, something like an 8-bit unsigned integer ("u8"), also known as a byte, which is what you would find in a typical memory location or file. If all goes well, the response reference passed in maps directly to an intended response file, which could be local or remote.
Yes. While f32 is what GPUs are optimized for, support for other primitive types can be added easily. The reason I haven't just gone ahead and added them is that I'm trying to think carefully about types and type safety.
I also haven't added if statements because that would require adding bool to the type system. And I'm not entirely convinced that just adding these types can be done without breaking the type safety guarantee. I'm certain there is a way to do it; I just don't know if the "easy way" is the right way or if there is a harder way that will guarantee type safety with even more certainty.
You can create a separate vector and mutate that instead. Emu lets you do that. The only big complication is adding the u8 type. Again, it's probably safe to add, but I'm not yet convinced you can do it easily.
Is there any way to do value clamping without supporting if or bool?
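One possible approach, sketched in plain Rust rather than Emu syntax: clamping can be written with only min and max, so no `if` and no `bool` is needed. Whether Emu exposes min/max functions is not stated in this thread, so treat that part as an assumption. OpenCL C itself has built-in `fmin`, `fmax`, and `clamp`. The function name below is hypothetical.

```rust
/// Clamp `x` into the range [lo, hi] using only min/max.
/// Assumes lo <= hi.
fn clamp_min_max(x: f32, lo: f32, hi: f32) -> f32 {
    x.max(lo).min(hi)
}
```

For example, `clamp_min_max(5.0, 0.0, 1.0)` gives `1.0` and `clamp_min_max(-2.0, 0.0, 1.0)` gives `0.0`.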
Are there any plans to provide string or byte array handling capability from within emu kernels? I believe it would be feasible if there were more support for integer types within emu. I understand that both CUDA and OpenCL provide integer support within kernels.
Thank you for listening.