Closed calebwin closed 5 years ago
But I think the one main question we need to answer is this: once you have a function written in Emu and a bunch of data stored in vectors, and you want to run that function on the data, what would be the most sensible way to do that?
emu! {
multiply(data [f32], coeff f32) {
data[...] *= coeff;
}
}
build! {
fn multiply(data: &mut Vec<f32>, coeff: &f32);
}
fn main() {
let mut data = vec![9.8, 3.8, 2.9, 4.6, 4.8];
multiply(&mut data, &2.0);
println!("{:?}", data);
}
emu! {
multiply_matrices(rows [i32], cols [i32], m i32, n i32, k i32, a [f32], b [f32], c [f32]) {
let row: i32 = rows[...];
let col: i32 = cols[...];
let acc: f32 = 0.0;
for i in 0..k {
acc += a[i*m + row] * b[col*k + i];
}
c[col * m + row] = acc;
}
}
build! {
fn multiply_matrices(rows: &mut Vec<i32>, cols: &mut Vec<i32>, m: &i32, n: &i32, k: &i32, a: &mut Vec<f32>, b: &mut Vec<f32>, c: &mut Vec<f32>);
}
fn main() {
let m: i32 = 3;
let n: i32 = 3;
let k: i32 = 3;
let mut a = vec![3.7, 4.5, 9.0, 3.7, 4.5, 9.0, 3.7, 4.5, 9.0];
let mut b = vec![3.7, 4.5, 9.0, 3.7, 4.5, 9.0, 3.7, 4.5, 9.0];
let mut c = vec![0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0];
multiply_matrices(&mut vec![0, 1, 2], &mut vec![0, 1, 2], &m, &n, &k, &mut a, &mut b, &mut c).unwrap();
println!("{:?}", c);
}
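For checking results, here is a minimal CPU sketch of what the matrix kernel above appears to compute. It assumes that each (row, col) pair is one work-item and that all three matrices are stored column-major, which is what the index arithmetic in the kernel suggests. The name multiply_matrices_reference is hypothetical and is not part of Emu.

```rust
/// Hypothetical CPU reference for the kernel above (not part of Emu).
/// Assumes column-major storage: a is m x k, b is k x n, c is m x n.
fn multiply_matrices_reference(
    m: usize,
    n: usize,
    k: usize,
    a: &[f32],
    b: &[f32],
    c: &mut [f32],
) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    assert_eq!(c.len(), m * n);
    // Each (row, col) iteration stands in for one work-item.
    for col in 0..n {
        for row in 0..m {
            let mut acc = 0.0f32;
            for i in 0..k {
                acc += a[i * m + row] * b[col * k + i];
            }
            c[col * m + row] = acc;
        }
    }
}

fn main() {
    let (m, n, k) = (3usize, 3usize, 3usize);
    let a: Vec<f32> = vec![3.7, 4.5, 9.0, 3.7, 4.5, 9.0, 3.7, 4.5, 9.0];
    let b: Vec<f32> = a.clone();
    let mut c: Vec<f32> = vec![0.0; m * n];
    multiply_matrices_reference(m, n, k, &a, &b, &mut c);
    println!("{:?}", c);
}
```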
The add function below requires a vector of indices to be passed in. However, it should be possible for emu! to make an optimization by removing idx and replacing i by get_global_id(0). Information about this optimization can then be passed on to the build! macro, which can make sure it doesn't transfer idx to the GPU, and LLVM should optimize away the vector of indices that isn't used anywhere in the Rust program.
emu! {
add(a [f32], b [f32], c [f32], idx [i32]) {
let i: i32 = idx[...];
c[i] = a[i] + b[i];
}
}
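To see why dropping idx is safe, here is a rough CPU simulation in plain Rust, where each loop iteration stands in for one work-item and the loop counter plays the role of get_global_id(0). Both function names are hypothetical. The equivalence only holds under the assumption that idx is exactly 0, 1, ..., n-1.

```rust
/// Simulates the kernel as written: the index is read from a transferred vector.
fn add_with_idx(a: &[f32], b: &[f32], c: &mut [f32], idx: &[i32]) {
    for work_item in 0..idx.len() {
        let i = idx[work_item] as usize;
        c[i] = a[i] + b[i];
    }
}

/// Simulates the optimized kernel: the index is the work-item id itself,
/// so no index vector needs to be allocated or transferred.
fn add_with_global_id(a: &[f32], b: &[f32], c: &mut [f32]) {
    for global_id in 0..c.len() {
        let i = global_id;
        c[i] = a[i] + b[i];
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0, 30.0];
    let mut with_idx = [0.0f32; 3];
    let mut with_id = [0.0f32; 3];
    add_with_idx(&a, &b, &mut with_idx, &[0, 1, 2]);
    add_with_global_id(&a, &b, &mut with_id);
    println!("{:?} {:?}", with_idx, with_id);
}
```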
Hi! I was just looking at this project the other day and I think it looks very interesting! I just started wondering if something like "named holes", or holes in variables, would be an alternative to the index arrays. Basically extending the syntax to allow this:
emu! {
add(a: [f32], b: [f32], c: [f32]) {
let i: i32 = ...; // expands to i = get_global_id(0)
c[i] = a[i] + b[i];
}
}
or even something like this:
emu! {
add(a: [f32], b: [f32], c: [f32]) {
c[...i] = a[...i] + b[...i]; // implicitly defines i as get_global_id(0)
}
}
I'm not sure if the last one would be too magic, though. I think the benefit for keeping it as some kind of special syntax item is that you can make it compile to whatever is best, rather than having to follow the semantics of something else, like with an array.
These are good ideas.
I think holes are a neat idea but I want to make the holes system more structured and understandable for a user. Right now, the following...
function multiply(x [f32]) {
x[..] *= x[..];
}
...compiles to...
function multiply(x [f32]) {
x[get_global_id(0)] *= x[get_global_id(0)];
}
...which means that the program only has 1 hole. But it could also be compiled to...
function multiply(x [f32]) {
x[get_global_id(0)] *= x[get_global_id(1)];
}
...which means that the program has 2 holes. Which is correct? This is another issue we need to address, alongside the one you bring up, for which named holes or holes in variables are a potential solution. One caveat with holes in variables: wherever the variable is used, it must index arrays of the same length. In the example you give...
emu! {
add(a: [f32], b: [f32], c: [f32]) {
let i: i32 = ...; // expands to i = get_global_id(0)
c[i] = a[i] + b[i];
}
}
...Emu needs to infer the size of the 0th dimension of the grid of work-items and it could be the length of either a, b, or c. And since it's used for all of them, they must be of the same length.
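The one-hole versus two-hole question raised earlier can be made concrete by listing the work-items each reading would launch for x[..] *= x[..]. This is only a sketch in plain Rust with hypothetical function names. Each pair is (index written, index read).

```rust
/// One hole: both occurrences of [..] share get_global_id(0).
fn one_hole_work_items(len: usize) -> Vec<(usize, usize)> {
    (0..len).map(|id0| (id0, id0)).collect()
}

/// Two holes: the occurrences are independent, get_global_id(0) and
/// get_global_id(1), so every pair of indices becomes a work-item.
fn two_hole_work_items(len: usize) -> Vec<(usize, usize)> {
    (0..len)
        .flat_map(|id0| (0..len).map(move |id1| (id0, id1)))
        .collect()
}

fn main() {
    println!("{:?}", one_hole_work_items(3));
    println!("{:?}", two_hole_work_items(3));
}
```

Under the two-hole reading, several work-items write to the same element, so the result of the multiply would depend on execution order. That is one more argument for making the number of holes explicit.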
Good points. I interpreted the holes to be independent from each other, but maybe I just didn't read closely enough. Either way, the current syntax doesn't make it clear if they are independent or not, as you say. My second example makes it more clear and has the hole attached to the arrays. Not saying it's the best syntax, or trying to push it in any way. It has drawbacks such as i coming out of nowhere.
As for inferring the size, it could be defined to take the length of the shortest array, just like the .zip iterator does, or it could even be a runtime error to not make sure the arrays have equal length. But I don't know how that would work if add in the example is called from another emu function...
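The two policies mentioned here can be compared side by side in plain Rust. This is only an illustration of the semantics on the CPU, not of how Emu would implement either one. The names add_shortest and add_checked are hypothetical.

```rust
/// Policy 1: like Iterator::zip, stop at the shortest array.
/// Returns how many elements were actually processed.
fn add_shortest(a: &[f32], b: &[f32], c: &mut [f32]) -> usize {
    let n = a.len().min(b.len()).min(c.len());
    for ((ci, ai), bi) in c.iter_mut().zip(a).zip(b) {
        *ci = ai + bi;
    }
    n
}

/// Policy 2: treat mismatched lengths as a runtime error.
fn add_checked(a: &[f32], b: &[f32], c: &mut [f32]) -> Result<(), String> {
    if a.len() != b.len() || a.len() != c.len() {
        return Err(format!(
            "length mismatch: a={}, b={}, c={}",
            a.len(),
            b.len(),
            c.len()
        ));
    }
    for i in 0..c.len() {
        c[i] = a[i] + b[i];
    }
    Ok(())
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0];
    let mut c = [0.0f32; 3];
    println!("processed {}", add_shortest(&a, &b, &mut c));
    println!("{:?}", add_checked(&a, &b, &mut c));
}
```

The zip-style policy silently leaves the tail of c untouched, while the checked policy surfaces the mismatch to the caller.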
I guess another alternative would be to introduce some kind of length parameter in the array type. I'm borrowing the Rust syntax here, but what if this would be possible:
emu! {
add<const N: i32>(a: [f32; N], b: [f32; N], c: [f32; N]) {
let i: i32 = ...N; // Not sure about this syntax... Too close to ranges? maybe hole(N) instead?
c[i] = a[i] + b[i]; // Would be a compile error if i has the wrong length
}
}
It makes the compiler much more complex, I guess, and it's maybe not the direction you want to go. It would probably solve the enforcing/assuming equal length problem, but maybe not the whole inferring length problem. I don't know enough about how it works to say. I'm just a passerby who had an idea to share. :slightly_smiling_face:
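For comparison, this is roughly how the same idea looks in plain Rust with const generics, where the length is a usize and part of the array type. It is only an analogy for the proposed Emu syntax, not something Emu supports.

```rust
/// All three arrays must have the same compile-time length N.
fn add<const N: usize>(a: &[f32; N], b: &[f32; N], c: &mut [f32; N]) {
    for i in 0..N {
        c[i] = a[i] + b[i];
    }
}

fn main() {
    let a = [1.0f32, 2.0, 3.0];
    let b = [10.0f32, 20.0, 30.0];
    let mut c = [0.0f32; 3];
    add(&a, &b, &mut c);
    println!("{:?}", c);

    // Passing arrays of different lengths is rejected at compile time:
    // let short = [1.0f32, 2.0];
    // add(&a, &short, &mut c);
}
```

This enforces equal lengths statically, but only for lengths known at compile time. Data whose length is only known at run time would still need a runtime check or an inferred grid size.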
I appreciate all of your input. I think I have an idea for how holes should work.
All holes should be independent. So if you want to square all elements in an array or you want to add two arrays, you use a function for enumerating all indices of an array.
function multiply(data [f32], coeff f32) {
data[..] *= coeff;
}
function square(data [f32]) {
let i: i32 = enumerate(data)[..];
data[i] *= data[i];
}
function add(a [f32], b [f32]) {
let i: i32 = enumerate(a)[..];
a[i] += b[i];
}
The thing I like about this is that for each of these functions there is clearly only 1 hole. And there really should be only 1 hole. Thoughts?
Looks like a good middle ground, as far as I can tell. I would have to dive more into this to say how it works in practice, but it seems better than the index arrays (fewer mistakes and fewer function parameters, etc.) while still being essentially the same.
OK. Emu v0.3.0 completely gets rid of "holes" or "enumerate" and replaces it all with simple for loops and pure Rust.
Closing!
A function in Emu operates on a "work-item" (work-item is a term OpenCL uses; I loosely use it here but we can refer to it differently if we come up with a better name).
With the above function, a work-item corresponds to a particular index in global_buffer. So the work can be thought of as a 1d grid with dimensions equal to the length of global_buffer. Let's consider another function.
When this function is run, a work-item corresponds to a pair of indices - one in global_a and one in global_b. So the work in this case is a 2d grid whose size is the product of the lengths of global_a and global_b.
Now here's the thing - both of these functions can be ultimately run with a binding to OpenCL. But only the first function can be run with the build! macro. This is because functions you intend to run with the build! macro operate on 1d grids of work where the dimension is by default the length of the first parameter to the function.
This is an important thing to note and I think it can help us answer the following key questions.
get_global_id()
?
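The 1d default described above can be pictured with a small CPU stand-in: a launcher that runs a kernel once per index of its first (and here only) buffer. The launch_1d helper is hypothetical and only models the grid-size rule. It does not reflect how build! is actually implemented.

```rust
/// Hypothetical stand-in for a 1d launch: one work-item per element of
/// the first buffer, with the loop counter playing get_global_id(0).
fn launch_1d<F: FnMut(usize, &mut [f32])>(buffer: &mut [f32], mut kernel: F) {
    for global_id in 0..buffer.len() {
        kernel(global_id, &mut *buffer);
    }
}

fn main() {
    let coeff = 2.0f32;
    let mut data = vec![9.8f32, 3.8, 2.9, 4.6, 4.8];
    // Equivalent in spirit to the multiply kernel: data[id] *= coeff.
    launch_1d(&mut data, |id, buf| buf[id] *= coeff);
    println!("{:?}", data);
}
```

A function whose work-items are pairs of indices would need one work-item per pair, which a grid sized by a single buffer length cannot express without extra information.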