diku-dk / futhark

:boom::computer::boom: A data-parallel functional programming language
http://futhark-lang.org
ISC License
2.4k stars 167 forks source link

unsafe scatter #1076

Open FluxusMagna opened 4 years ago

FluxusMagna commented 4 years ago

Sometimes when using scatter the programmer knows that all indices are scattered to. Therefore no values from the initial array are present anymore. The compiler does not know this however, and the code will still generate whatever array is specified as initial value. This is somewhat wasteful, and performance could be improved by scattering to memory with unspecified values, as given by the scratch construct, present in the IR.

I think for most cases where the programmer knows that each index is included, there is exactly one instance of each index. A function implementing this could then have a type like

distribute [n] 't (is:[n]i32) (vs:[n]t) : [n]t

The danger with this is of course that if not all indices are written to, whatever was in memory previously will be left, making it nondeterministic. scatter is however already nondeterministic for duplicate indices, so I think this is a relatively small issue.

athas commented 4 years ago

I'm always reluctant to add more nondeterministic constructs. I guess this one could be OK, especially if we can show that the cost of replicateing an array is significant for some applications.

I prefer the name permute, because this operation really is a permutation of the values of the vs array.

FluxusMagna commented 4 years ago

I think the how much it affects performance will depend on hardware. FFTs typically need some form of permutation on each iteration, and since individual calculations are fairly cheap, writing to memory appears to be a somewhat limiting factor, at least on some hardware.

It's only nondeterministic for duplicate and out of bound indices, and scatter would also be in the former case (though in a different way), so a program that is deterministic using scatter should be deterministic using permute, unless the program has duplicate index-value pairs or out of bound indices.

athas commented 4 years ago

For FFTs specifically, this can also be addressed by manual reuse of arrays.

FluxusMagna commented 4 years ago

I hadn't considered that, it still seems like a useful function though.

FluxusMagna commented 4 years ago

Unless the compiler somehow eliminates it, manual reuse of arrays should result in one unnecessary copy per loop, since the output array is still written to in the beginning. An improvement over no reuse for sure, but permute would still improve this. Personally I also find iterate to be 'cleaner' than loop in most cases.

FluxusMagna commented 4 years ago

How about creating a safe permute?

 let permute 't  [n] (is:[n]i32) (vs:[n]t) : [n]t =
     let permutation = scatter (scratch n) is vs
     let full = scatter (replicate n false) is (replicate n true) |> reduce_comm (&&) true
     in assert full permutation

The performance would be worse than scatter for most cases, but it would be safer, since it's entirely deterministic, and with #[unsafe] the performance would improve.

athas commented 4 years ago

The replicate n false there is expensive enough to offset the advantage of the hypothetical scratch. That doesn't mean a safe permute is not useful, but there is no need to add scratch just to implement it. The scratch construct (or something similar) would only make sense for fundamentally unsafe operations, because the cost of safety checking is always going to equal the cost of initialisation.

FluxusMagna commented 4 years ago

I realize that it would be normally be slower than a normal scatter, but wouldn't it skip the assert, and thus the safety-check with #[unsafe]? That way, without #[unsafe], it would only permit unique indices, which would be useful for validation, and with #[unsafe] it would be faster than initializing an array and scattering to it. The #[unsafe] variant would be truly unsafe, and would benefit from a scratch construct. Perhaps I am misunderstanding how assert works with #[unsafe].

athas commented 4 years ago

Yeah, I guess with #[unsafe] dead code elimination would remove the inefficient replicates. That might not be a bad implementation.