cdonovick / peak

Peak : Processor Specification Language ala Newell and Bell's ISP
19 stars 3 forks source link

Dealing with Sum types #68

Closed rdaly525 closed 4 years ago

rdaly525 commented 5 years ago

I perceive Sum.match() to be a major annoyance for magma.circuit.sequential and for executing arbitrary ADT objects as SMT BitVectors in PEak.__sim__() code (to speed up rewrite rule finding)

I propose that we have a separate (and more rewrite-able) API for matching on Sum Types. Basically what I want is boolean method(s) to indicate what type the Sum object is.

I suspect something like the following would be much easier to deal with.

SumType = Sum[A,B,C]

class PeakClass(Peak):
  def __call__(sumobj : SumType):
    if sumobj.isa(A):
      # use the object sumobj.value as an 'A'
    elif sumobj.isa(B):
      # use the object sumobj.value as a 'B'
    elif sumobj.isa(C):
      # use the object sumobj.value as a 'C'
cdonovick commented 5 years ago

There is going to be problems with Sums. However, this is not much easier to deal with than match.

phanrahan commented 5 years ago

Just reading this now. This seems very similar to match. I also am not that keen on using isa since isa is commonly used in object oriented systems for class hierarchy. A sum type is not the same thing.

cdonovick commented 5 years ago

@phanrahan I am currently working on a way to make adt types "polymorphic" in a similar way to bitvector. The goal is for inst.ALU == InstType.ALU.Add to return a bitvector expression. Specifically inst[ALU_BITS] == Assemble(InstType.ALU.Add) This way to convert to rtl / smt we just need to transform the program into SSA and pass an AssembledAdt through to get the entire sim function as a bitvector expression. I am currently going back and forth on whether match or isa (or some other name for same syntax if you dont like isa) will be easier to deal with.

phanrahan commented 5 years ago

I am not keen on isa, but it is not that big of a deal.

Btw, I am really enjoying working with the ADTs, particularly Sum.

But I am now changing my mind on Sum. Haskell requires you to name the variants of a Sum type. I think the main reason for that is to be able to repeat types. For example, you can create a some time which is the sum of two constant values. For example, Bit = Left () | Right (). I also could use named fields in a Sum type for encoding into bitvectors, and for matching again ala Haskell.

phanrahan commented 5 years ago

On the if-then-elif form.

One nice thing to be able to do is guarantee that all values are matched. Chisel defined a case function with a dictionary to ensure that all possible values were matched.

rdaly525 commented 5 years ago

@phanrahan, currently the Sums are 'tagged' and they are named exactly their type.

for example:

class A(Product):
  a=Bit
class B(Enum):
  b = 0
  c = 1
C = Sum[A,B]
assert C.A == A
assert C.B == B
phanrahan commented 5 years ago

That works for me. I can always wrap types to create names.

On Aug 1, 2019, at 4:52 PM, Ross Daly notifications@github.com wrote:

@phanrahan, currently the Sums are 'tagged' and they are named exactly their type.

for example:

class A(Product): a=Bit class B(Enum): b = 0 c = 1 C = Sum[A,B] assert C.A == A assert C.B == B — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or mute the thread.

cdonovick commented 5 years ago

@phanrahan @rdaly525 @leonardt
I am almost done with AssembledADT however the current match interface is unsupportable. The reason for this is there now way to extract sum.value from an assembled instruction if the type is not a concrete value (e.g. is an smt or magma value). Now I was initially going to purpose the the following interface:

s = Sum[A, B](x)
type = s.match() # maybe rename to tag or type also probably a property instead of a method
if type == A:
    value = s.A
    # do stuff with A
elif type == B:
    value = s.B
    # do stuff with B

However, there is currently the following issue

T = Sum[BitVector[16], Bit]
s = T(x)
s.BitVector[16] # AttributeError s has no attribute BitVector
                # the [16] will be interpreted as a getitem on the attribute BitVector

To address this I purpose the following addition to the sum interface:

s = Sum[T, ...](x)
s.get(T) -> T

In simulation s.get(T) will raise an error if x is not T. In SMT / RTL s.get(T) will give you a garbage value.

Now to me if we are going to have a get(T) function it would make sense to me to pair it with a isa function. Although I am not in love with the name isa so maybe stay with match with a new interface. So the resulting code style should be:

s = Sum[A, B](x)
if s.match(A):
    value = s.get(A)
    # do stuff with A
elif s.match(B):
    value = s.get(B)
    # do stuff with B
rdaly525 commented 5 years ago

I am fine with this interface.

phanrahan commented 5 years ago

The match/get interface seems fine to me.

I think the fact that s.BitVector[16] is an error is another argument for named fields in a Sum.

phanrahan commented 5 years ago

More specifically, can we add the capability do define a Sum as

Sum[a=A, b=A, c=B]
cdonovick commented 5 years ago

That syntax is not valid python unfortunately.

We could define them like Products / Enums or the following:

Sum['a':A, 'b':B]

As 'x:'X will be interpreted as a slice.

Caleb

On Mon, Aug 5, 2019, 11:22 AM Pat Hanrahan notifications@github.com wrote:

More specifically, can we add the capability do define a Sum as

Sum[a=A, b=A, c=B]

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/cdonovick/peak/issues/68?email_source=notifications&email_token=ABOXO5BPKULAGSMWNYA2EW3QDBAT7A5CNFSM4IFAZGUKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3SFCOA#issuecomment-518279480, or mute the thread https://github.com/notifications/unsubscribe-auth/ABOXO5EHYTHNLGVJSF7NJ5DQDBAT7ANCNFSM4IFAZGUA .

phanrahan commented 5 years ago

I didn't check that syntax. Need to brush up on python.

Yours is clever.

The Product/Enum ways also works for me.

cdonovick commented 5 years ago

TIL: No markdown in email response to github threads.

Also I think I have slightly altered proposal for syntax so that I don't have to reserve the names get and match (any method name excludes having tags name similarly so is undesirable). E.g Sum[match, get] is an error.

For the get part it would be consistent with the rest of the library to use getitem

Tuple[A, B, C](a, b, c)[0] == a
Sum[A, B, C](a)[A] == a

# we have the following invariant given an ADT type T and instance t
isinstance(t[idx], T[idx])

# similarly we the following invariant given an ADT type T and instance t
# excluding Enums and when Sums break because of stuff like bitvector[16]
isinstance(t.attr, T.attr)

For match I propose using contains (I am going to add contains operator to Sum types anyway). I am less big on this then I am on using getitem so feel free to reject partially.

# __contains__ on types
A in Sum[A, B, C] # equivalent to A in Sum[A, B, C].fields 
D not in Sum[A, B, C]

# __contains__ on instances
A in Sum[A, B, C](a)
B not in Sum[A, B, C](a)
D in Sum[A, B, C](a) -> TypeError

We could and contains to Tuple and Product but I don't see what purpose they would hold.

cdonovick commented 5 years ago

Okay it turns out that __contains__ will not work the way we want it to because python calls bool on the return value of __contains__.

cdonovick commented 5 years ago

Other possible syntaxes are Probably my prefered:

s.match(A)
~s.match(B)
s.match(C) -> TypeError

More annoying to implement:

A == s.match
B != s.match
C != s.match

We could also abuse some other operator:

A < s
~(B < s)
C < s -> TypeError

Any preferences?

phanrahan commented 5 years ago

And here s = Sum[A, B, C]?

cdonovick commented 5 years ago

Ooops lost intro along the way of editing. s = Sum[A, B]

phanrahan commented 5 years ago

s.match(A) works for me. (Still lobbying to make A a tag not a type.)

cdonovick commented 5 years ago

The tags vs type is somewhat an orthogonal issue. I want to get the assembled adt stuff working. I have been planning an over hall of the ADT types anyway to commodify the getitem syntax. When I do that Ill work on the TaggedUnion, which should pretty easy add.

cdonovick commented 5 years ago

Okay I think I have a good interface that will work well for both TaggedUnion and Sum and which has a nice correspondence with Product and Tuple. I purpose the following syntax

S = Sum[A, B]

# Okay this is a bit dumb buts its mostly because of a correspondence I want to make
assert S[A] == A 
assert S[B] == B 
assert S[C] == Error

a = A()
s = S(a)
assert s[A].match
assert not s[B].match
assert s[C].match == Error

assert s[A].value == a
s[B].value is undefined it may raise an error or return a garbage value

This is achieved by having s[A] return a Match object with properties match and value. Implenting this for AssembledADT's would not be problematic because given a similar Sum t in our current syntax s[A].match iff t.match(A) and s[A].value == t[A].

For TaggedUnion the syntax can be:

class R(TaggedUnion):
   L0 = A
   L1 = B
   L2 = B

assert issubclass(R, S)
assert R.L0 == R[A] == S[A] == A
assert R.L1 == R.L2 == R[B] == S[B] == B
assert R.L3 == Error
assert R[C] == S[C] == Error

b = B()
r = R(L1=b)
assert not r.L0.match and not r[A].match
assert r.L1.match and r[B].match
assert not r.L2.match
assert r.L1.value == r[B].value == b
r.L0.value is undefined
r.L2.value is undefined

--------------------------- NEW CONTEXT ---------------------------

Now note: (with some abuse of notation) given types T0, T1, ..., Tk and field names L0, L1, .. Lk

P = Product[L0=T0, L1=T1, ..., Lk = Tk]
Q = Tuple[T0, T1, ..., Tk]
assert issubclass(P, Q)

for i in 0..k:  assert Q[i] == P[i] == P.Li == Ti
for i not in 0..k: assert Q[i] == P[i] == Error ~= P.Li
# you get a different error on P[i], Q[i] vs P.Li but its still an error

Similarly,

R = TaggedUnion[L0=T0, L1=T1, ..., Lk = Tk]
S = Sum[T0, T1, ..., TK]
assert issubclass(R, S)

for in 0..k: assert S[Ti] == R[Ti]== R.Li == Ti
for i not in 0..k: assert S[Ti] == R[Ti] == Error ~= R.Li

Further these correspondences extend to their instances:

p = P(L0=t0, L1=t1, ..., Lk=tk)
q = Q(t0, t1, ..., tk)
for i in 0..k:  assert q[i] == p[i] == p.Li ==ti
for i not in 0..k: assert q[i] == p[i] == Error ~= p.Li

While not quite as cleanly similar as with types,

r = R(Lj=tj)
s = S(tj)

for i in 0..k: assert r.Li.match iff i == j
for i in 0..k: assert r.Li.match implies r[Ti].match
for i in 0..k: assert r[Ti].match iff s[Ti].match iff Tj == Ti
for i not in 0..k: assert s[Ti] == r[Ti] == Error ~= r.Li

for i in 0..k: assert r[Ti].match implies r[Ti].value == s[Ti].value == tj
assert r.Lj.value == tj
# for all other values undefined behavior

Reasoning: First of all this avoid string base programming which I personally hate. Specifically I was dreading r.match("L0") style syntax. Strings should be strings, they shouldn't have semantic meaning. Not only is it ugly, their is poor support for it in text editors and is prone to bugs.

Secondly if you know the layout of a TaggedUnion you should be able to access it without actually specifying a specific tag. Much like pat currently uses sum.value.