Closed ninegua closed 8 years ago
BTW, this is what is causing nengo to produce wrong output.
A[C]
is semantically parallel I think. It could be translated to :mmap
and reduce
since it implies concatenation of the selected elements into a new array.
Later in ParallelIR
sum
can be fused so the reduction is optimized. What do you think?
How is bitarray selection parallel when A=[1,2,3,4,5]
and A[A .< 3]==[1,2]
?
I see what you mean. But unless we support dynamically growing j2c_arrays, mapping it to parallel operation is going to be rather inefficient.
The backend implementation is a different issue. We can make it sequential for now if the operation wasn't removed in optimizations after going through the ParallelIR
pipeline. Keeping DomainIR
and ParallelIR
general is useful I think.
A deeper issue is that translating A[C]
to mmap
and reduce
is not composable
. What I mean is by this definition, it is hard to compose the translation of sum
, and the translation of A[C]
. Suppose we follow this translation like you proposed:
sum(x) ===> reduce(0, (a,b)->a + b, x)
A[C] ===> reduce([], (a, b) -> cat(a, b), mmap((a, c) -> c ? [a] : [], A, C))
Then as an outcome we have:
sum(A[C]) ===> reduce(z0, f0, reduce(z1, f1, mmap(f2, A, C)))
where
z0 = 0
f0(a,b)=a+b
z1=[]
f1(a,b)=cat(a,b)
f2(a,c)=c? [a] : []
I don't know a general rule to transform reduce-reduce
for arbitrary z0, z1, f0, f1, f2
at DomainIR level. If we just go ahead and lower each reduce
to Parallel IR, I doubt we'll be able to fuse these two loops since their loop counts are different.
This is why I'd prefer to keep A[C]
as in select(A, C)
in DomainIR, and lower reduce(z, f, select(A, C))
as a special case of lowering reduce
in ParallelIR.
You are right. A separate select
seems to be better.
This is fixed in c408907152dfdd122e8f0fc3e75929c96afce7de. Test cases are added in e3796ada19a5b06e2f920b7b212b6a9e13441a96.
The following program shows what is wrong:
Basically something like
A[C]
whereC
is a bitarray is being translated tommap
in DomainIR, but this operation by itself _is not parallelizable_. So the subsequent fusion with the reduction in ParallelIR turned out to be wrong too.This is only parallel when combined with a reduction such as
sum(A[C])
, or with an assignment such asA[C] = B[C]
. We used to handle this just right, but the recent changes to support both bitarrays and ranges selection are too aggressive.I'd suggest we leave
:select
as a DomainIR operation forA[C]
instead of using:mmap
, and treat:select
properly in ParallelIR, including a sequential translation forA[C]
when it cannot be parallelized.