Ginko-X / Streaming_NESL

1 stars 0 forks source link

revised Xducer API #24

Closed afilinski closed 6 years ago

afilinski commented 7 years ago

Here's a suggestion that should be relatively simple to implement, but looks like it would make (both formal and informal) reasoning about the Svcode interprepreter somewhat simpler and more regular.

Basically, (almost) all the current transducers effectively consist of an outer loop, which iterates until an Eos, and possibly also some additional inner loops that typically read until a T on a flag stream represeting a unary number. Let us call each iteration of the outer loop a "block"; then then it will always be the case that the blocks handled by a given transducer can actually be processed in any order without changing the result. On the other hand, inside each block, the order of data values is of course significant. Moreover (with one exception, detailed below) the number of blocks processed by each transducer is exactly equal to the the length of the control stream governing that transducer.

I think it would be good to make this structure more evident in the code. This would -- as far as I can tell -- require the following changes:

For example, if we introduce the function

loop0 :: Xducer () -> Xducer ()
loop0 xd = rin 0 (\ (BVal False) -> xd >> loop0 xd)

then we can express MapTwo as follows:

mapTwoN op = loop0 p
  where p = do x <- rinx "mapTwo(x)" 1
               y <- rinx "mapTwo(y)" 2
               rout (op [x,y])

This makes it clear that each block consists of two reads (one from each input stream) and one write, and also makes the handling of the two inputs more symmetric (i.e., an Eos on either one indicates an error). (I added an N to the name to clearly mark that this is the new API with the proper input stream numbers starting from 1. Once everything is converted, the N's can be removed.)

More interestingly, we can express SegScanPlus as:

segScanPlusXducerN = loop0 (p 0)
  where p acc = do BVal b <- rinx "segScanPlusXducer(flag)" 1
                   if b then return ()
                   else do IVal a <- rinx "segScanPlusXducer(data)" 2
                           rout (IVal acc)
                           p (acc + a)

This makes it evident that each "block" for segScan ends when we get a T on the flag stream, at which point we return to the outer loop

As far as I can tell, the only transducer that doesn't follow this pattern is in the code for iota, which uses MapConst on a stream that is not the current control stream, but that can be handled by using WithCtrl + Const instead of MapConst.

This API changeover can be done incrementally, because you have separate setup code for each Svcode instruction, separate from the actual transducer definition. Once it's complete, you should probably factor out the "loop0" from the individual xducer definitions, so that it only occurs once. Note that, with the separation between control and data streams, only the control streams can legitimately signal an Eos (i.e., return Nothing) when one attempts to read from them. On the other hand, control streams do not actually contain any meaningful data, but are only read for their Eos flag. So it may be possible to treat them rather differently in the interpreter, but for now, I think you can just leave them as they are.

If something is unclear in the above, we can talk more about it tomorrow.

Ginko-X commented 7 years ago

This is clear and makes sense.

What I'm not sure is if there is only one exception (i.e., MapConst). Like in #23, I think the xducer uSegCountXducer for implementing part function is also independent of the current control stream.

----- compiler code for the built-in function `part`
part :: STree -> STree -> SneslTrans STree
part (SStr t1 _) (SStr (BStr t2) f2) = 
    do s1 <- emit (USegCount t2 f2) -- USegCount depends on (t2, f2), which are the 2nd parameter of `part` function
       return (SStr (SStr t1 t2) s1)
Ginko-X commented 7 years ago

(I'm working on rewriting all Xducers with this loop0 function.) After rewriting, the ctrl stream will be added as a supplier to all the other processes (and all these processes also become its clients), which will first increase linear (maybe) work, (which may not be an issue?), and second make the DAG/dataflow graph even more complicated. So I'm wondering if this would increase the risk of deadlock?

afilinski commented 7 years ago

Since the control stream does not actually carry any data, only an Eos status, I think it's fine to not count reading from it as part of the work an xducer does. So mapTwo would still do 3 units of work per element (two reads and one write), while constXducer would do 1 unit (just a write) per element.

I'd be very surprised if if the extra explicit dependency on the control stream would increase the risk of (hard) deadlock. Potentially it might increase the frequency of stealing, but again, stealing a partially filled control-stream buffer is not really the same as stealing an actual data buffer. Let's not worry about that for now. Later, you can instrument the Svcode interpreter to collect detailed statistics on what actually happens in practice.

Incidentally, to express the new transducers, it think it would also be a good idea to further abstract out the inner loops inside a single block, specifically the ones that are based on reading a segment-flag stream, like in pdistXducer. I'm imagining a transducer builder loopu, similar to loop0, which would take a channel number and two transducers as arguments (xdf and xdt). It would then repeatedly call xdf for each F flag it reads on the given channel, and finally call xdt once it reads a T (and then return). That way, you wouldn't need any explicit recursion in the individual xducer definitions like segConcat (or the helper transducers like uOutx, uOutsx, uInOutx,etc.), but only in the loop-builder functions, which would make it much more explicit that they always terminate.

Ginko-X commented 7 years ago

I think Pack and UPack are also special since the flag stream they use is just a stream of random Fs and Ts, not unaries, so we can not determine the length of a block or the total number of blocks by just the current ctrl stream.

afilinski commented 7 years ago

I think that's actually not an issue with the Pack and UPack instructions themselves, but rather with how they are used by the compiler when compiling a type-indexed pack operation. At base types (and tuples of base types), the stream of values and the stream of flags are precisely the length of the control stream, and so packing behaves almost as MapTwo (except that it may output fewer values). However, at sequence types, each element of the high-level sequence corresponds to several stream elements, so you first distribute each pack flag across all the stream elements it governs, and then recursively pack those.

This means that the inner pack should actually be executed with a new control stream derived from the segment flags of the sequence you are packing. I think it should just be a matter of adjusting the pack function in the case for SStr to evaluate the recursive call inside a WithCtrl, similarly to how you also had to adjust the definition of the iotas function. We can talk about this at the meeting.