Adding a "type" checker to Forth

ForthHub / discussion

Discussion repository for Forth enthusiasts.

118 stars 4 forks source link

Adding a "type" checker to Forth #79

Open massung opened 5 years ago

massung commented 5 years ago

One toy project I've played around with in the past is the adding a "type" checker to Forth. I don't mean that in the sense of types like C and a compiler, but rather just confirming inputs/outputs by assuming the stack notation identified actual types. For example:

: c" ( -- caddr )
: s" ( -- caddr n )
: dup ( x -- x x )
: + ( n n -- n )
: /string ( caddr n n -- caddr n )

Assuming that some "types" (e.g. x or _) were pre-defined to mean "what's there", and any other symbol was just a type created on-demand (no pre-declaring of types being necessary), it should be possible to take something like s" Hello, world!" 10 dup + /string and validate that all the stack values and output types match with the required input types specified by each word. Likewise, if I'd used c" instead by accident, it should have discovered at /string that the expected stack values didn't match the required inputs.

A few times I've gone down this road, trying it out for fun, but kept getting hung-up on in a couple areas:

Dynamic words like execute totally borking the stack. The only idea I had was to assume the programmer knew what they were doing and pretend the rest of the word was "good". For example:

: foo ( n -- caddr ) ['] bar execute do something ;

After execute, just skip type checking until ; and then assume the stack is updated to whatever foo says it should be. There was an instance where this failed, but I can't recall what it was.

Branching return values (e.g. find). These almost always require context to correctly deduce. And I couldn't ever come up with a checking strategy that "just worked".

With find for example, the second stack item could be caddr or xt, and which it is depends on the top stack item's value (not type). The best I could come up with was just branching the type checking and at each step determining if either branch was successful.

Sorry, long post, thinking out-loud remembering a past bit of work I did, but wondering if anyone here had ever tried something similar (or knew of a similar attempt)? How did it turn out? Was it useful? Do you think something like this would be valuable in your own forth system?

massung commented 5 years ago

Just remembering one other issue I had:

Loops that didn't leave the stack balanced.

A word like : iota ( n -- i*x ) do i loop ; would leave an indeterminate number of items on the stack, and that ended up causing enumerable headaches trying to resolve. I never came up with a good solution to this one.

alexshpilkin commented 5 years ago

Quoting Jeffrey Massung (2019-01-10 18:29:37):

A few times I've gone down this road, trying it out for fun, but kept getting hung-up on in a couple areas:

Well, these are hardly innocuous, are they?

Dynamic words like execute totally borking the stack. [...]

execute (apply, in the wider PL lore) is what turns Forth from a first- to a higher-order programming language (the way calling function pointers does that in C). Not only that, it’s polymorphic (can have different types depending on types of its arguments): the same way the C++ typing of apply (C can’t type it; I hope my C++ still works) would be

template<typename R, typename A> R apply(R (*f)(A), A a));

the type of execute is something like

forall i o. i (i -- o) -- o

which is both polymorphic and requires you to track types of arguments, not just their number.

Branching return values (e.g. find). [...]

Loops that didn't leave the stack balanced [...] like : iota ( n -- i*x ) do i loop ; [...]

Again in the wider PL lore, these require what’s called “dependent typing”—their types (as long as you want types to include at least the number of things on the stack) depend on the values they produce (find) or consume (iota), breaking the separation between the two worlds. The problem here is that as long as you want your typechecker to check what it claims to be checking (Milner’s “well-typed programs can’t go wrong”; counterexample: C++) every value computation that takes place in types has to be (provably) terminating. In short, I don’t think anybody has done dependent types without fulling the types into a full-blown logic (ATS, Idris, Agda).

Or maybe I’m being too harsh on find... You can probably get away with it using ‘intersection types’ (“produces either this or that on output”, “has to handle both this and that on input”). But iota definitely has to go.

[...] wondering if anyone here had ever tried something similar (or knew of a similar attempt)? How did it turn out? Was it useful? Do you think something like this would be valuable in your own forth system?

Haven’t done anything myself, but have done quite a bit of searching of the literature. The clearest thing I’ve seen about this is Kleffner’s talk about type inference in stack languages. IIRC the polymorphic system offered there is enough to type execute, at least.

There’s also a proposal on MPE’s website along the lines of “we’d really like this to exist, but can’t be bothered to write it, and oh we’d also like it to handle all of our code”. I don’t like the idea of a typechecker being akin to lint... But then I’m probably at an extreme end of the people thinking about type systems---I’d like a type system to express a consistent theory of the code, however simplistic, so e.g. I find the “balance of soundness and practicality” expounded by the creators of TypeScript a nonsensical concept. (Wirth’s disdain for “reliable” programs is similarly extreme and similarly disconnected from programming practice, I think.)

-- Alex

MitchBradley commented 5 years ago

You can't solve every problem so pick a problem that you can solve that has a worthwhile payoff, solve it, and iterate. Einstein failed to find a theory of everything, but the theories that he did find are pretty darned useful. I once did a quick and dirty run-time checker that just looked for stack balance. It included an abbreviated representation of the stack diagram in the compiled representation of the word. If I recall correctly, I just recorded the number of arguments and the number of results. Upon entry, the stack was checked for enough depth for that many arguments, and upon exit, for the correct net depth change. For complex cases I just punted and didn't check. It helped find some errors in a large system. Good value for a modest amount of effort.

jwoehr commented 5 years ago

On Fri, Jan 11, 2019 at 11:28 AM Mitch Bradley notifications@github.com wrote:

You can't solve every problem so pick a problem that you can solve that has a worthwhile payoff, solve it, and iterate.

Right, "if we had $10 for every time someone suggested adding type checking to Forth", right, Mitch?

-- Jack Woehr Absolute Performance, Inc. 12303 Airport Way, Suite 100 Broomfield, CO 80021

NON-DISCLOSURE NOTICE: This communication including any and all attachments is for the intended recipient(s) only and may contain confidential and privileged information. If you are not the intended recipient of this communication, any disclosure, copying further distribution or use of this communication is prohibited. If you received this communication in error, please contact the sender and delete/destroy all copies of this communication immediately.

alexshpilkin commented 5 years ago

@MitchBradley wrote:

You can't solve every problem so pick a problem that you can solve that has a worthwhile payoff, solve it, and iterate.

Well, I did warn that my definition of “works” is called something between “too radical” and “useless” by many people, including people who are vastly better at this than I am. I’m aware I have an unhealthy attachment to formal systems (physics does that to you). So, no contest here.

Einstein failed to find a theory of everything, but the theories that he did find are pretty darned useful.

I must take offense at this, though: SR, GR, whatever other modern physical theory you care to name are limited, yes, but the way we use them now, they come with a well-defined range of validity built in. More specifically, a theory consists of a more or less internally consistent formal model plus a prescription in which situations the formal model will describe what will actually happen. A physicist always has a sense of whether a particular description will or will not work—even if it’s not possible to check this in advance, you at least know what you’ll be checking as you go.

So I don’t think it’s fair to say modern physics does demand-driven, ad hoc iterative improvement in the same way that e.g. compiler optimizers do; internal consistency (compositionality?) at each step is also important. Maybe engineering does, I’ve honestly no idea.

If we try to transplant it back to type systems, it’ll probably go like this: a type system doesn’t have to be all-singing and all-dancing, but it’d better be explicit and (definitionally, not operationally) simple about what it will and won’t do. If a type is just (# cells in, # cells out), execute will be untypeable, and this is fine: “only first-order words are typeable“ is a reasonable rule; but if I can write execute without noticing it, without a trustMe or unsafeCoerce or whatever, it’ll make me have PHP nightmares^W^W^Wsad.

@jwoehr wrote:

Right, "if we had $10 for every time someone suggested adding type checking to Forth", right, Mitch?

Not to say there’s anything wrong with your statement in particular, but I think it’s not fair to place the blame completely on the suggesters, either, because of a general problem:

The literature sucks.

I mean, what do you have about Forth that counts as a real review of the design space, hell, as a honest-to-God literature review of any kind? Thinking Forth (even if it’s half advice on running a software consultancy), Moving Forth (best review ever to not call itself a review), a couple more tutorial implementations, a summary article on the CASE contest and... that’s it?

I can read a Forth Dimensions issue or ten; some JFAR articles; no FORML articles because, well, fat chance I can access any of them... But that’s not reasonable; I’m not an actual programmer precisely because I enjoy this sort of thing too much. Someone who’s better at, y’know, actually building stuff won’t do that!

And that is why you need to have literature reviews. If you compare with science, it seems that the Forth literature is perhaps small enough for one person to be able to read it all (like linguistics; unlike any area of physics), but that presumes reading things is (as in science) one of their primary activities, which I don’t think it is.

So, I’m aware everything has probably been discussed to hell and back... But can we please have some references?

alexshpilkin commented 5 years ago

(I don’t know you, so sorry if that came of as too flippant... the desire to make a cutely arranged argument overcame the effort to make a polite one.)

jwoehr commented 5 years ago

On Fri, Jan 11, 2019 at 4:05 PM Alexander Shpilkin notifications@github.com wrote:

@jwoehr https://github.com/jwoehr wrote:

Right, "if we had $10 for every time someone suggested adding type checking to Forth", right, Mitch?

Not to say there’s anything wrong with your statement in particular, but I think it’s not fair to place the blame completely on the suggesters, either, because of a general problem:

The literature sucks.

Not precisely. What there is, is good, but there was never enough. Because Forth is, was, and always will be a minority approach. It appeals to a personality type who is not in the mediocre middle. If you're going to be a Zen monk, you have to accept that you'll carry your belongings on your back and travel dusty backroads.

I'm not exactly arguing with what you are saying. I'm just teasing Mitch because the community has been having these same discussions for about 35 years, over and over again, with each person who joins the community.

-- Jack Woehr Absolute Performance, Inc. 12303 Airport Way, Suite 100 Broomfield, CO 80021

cwpjr commented 5 years ago

Yes, as I forge new waters, i have only myself and my pack, Jack~!

On Fri, Jan 11, 2019 at 5:32 PM Jack J. Woehr notifications@github.com wrote:

On Fri, Jan 11, 2019 at 4:05 PM Alexander Shpilkin < notifications@github.com> wrote:

@jwoehr https://github.com/jwoehr wrote:

Right, "if we had $10 for every time someone suggested adding type checking to Forth", right, Mitch?

Not to say there’s anything wrong with your statement in particular, but I think it’s not fair to place the blame completely on the suggesters, either, because of a general problem:

The literature sucks.

Not precisely. What there is, is good, but there was never enough. Because Forth is, was, and always will be a minority approach. It appeals to a personality type who is not in the mediocre middle. If you're going to be a Zen monk, you have to accept that you'll carry your belongings on your back and travel dusty backroads.

I'm not exactly arguing with what you are saying. I'm just teasing Mitch because the community has been having these same discussions for about 35 years, over and over again, with each person who joins the community.

-- Jack Woehr Absolute Performance, Inc. 12303 Airport Way, Suite 100 Broomfield, CO 80021

NON-DISCLOSURE NOTICE: This communication including any and all attachments is for the intended recipient(s) only and may contain confidential and privileged information. If you are not the intended recipient of this communication, any disclosure, copying further distribution or use of this communication is prohibited. If you received this communication in error, please contact the sender and delete/destroy all copies of this communication immediately.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ForthHub/discussion/issues/79#issuecomment-453690509, or mute the thread https://github.com/notifications/unsubscribe-auth/AFC6xcUsNXHU7SUSvRbXClSir-Nk6C7mks5vCR8fgaJpZM4Z5uqG .

MitchBradley commented 5 years ago

When I first got interested in Forth, I think it was in 1982, I read everything I could find about Forth. Back issues of Forth Dimensions, FORML Proceedings, JFAR issues, ... I was hoping not to reinvent the wheel. I was able to access all that because the primary distributor of Forth literature, Mountain View Press, was run out of Glen Haydon's garage, which happened to be a few blocks away from my house!

Since then I have seen just about every imaginable wheel being reinvented innumerable times. That's sort of inevitable, given the time it would take to delve through all the history.

In answer to the request for a reference: For some early work on Forth type checking, have a look at Peter Knaggs' PhD thesis

alexshpilkin commented 5 years ago

@MitchBradley wrote:

For some early work on Forth type checking, have a look at Peter Knaggs' PhD thesis

Thanks for the reference! Somehow I missed until now that Knaggs is also the one Kleffner credits with the polymorphic type system in section 4 of the talk I linked to before, although the latter describes it in terms much closer to the PL literature (indeed, the former seems unaware of the word “polymorphism” itself).

@massung: For some reason I forgot to say this in my initial post, so here goes: the rest of RK’s talk (though phrased in Forth-unfriendly terms) works up to a solution due to Diggins (2008) of the problem of (in Forth-friendly terms) typing EXECUTE (as well as functional composition >R EXECUTE R> EXECUTE and so on).

On a very different note, nobody here has mentioned StrongForth yet (and I only just read the manual earlier today). This seems to be a system heavily focused on ad-hoc overloading and of a very different style in general: it seems to me that a StrongForth program without its types cannot be given meaning at all (i.e. this is a Church-style system, unlike the systems of Knaggs and Diggins that can work Curry-style, on existing untyped Forth code). I’m still a bit confused by it, though, so please correct me if I’m wrong.

olleharstedt commented 2 years ago

Nice read :)

I recently got recommended Forth and of course got trapped in the same bucket as every other newbie. My personal attempt to add types to Forth was to embed it like the Haskell example above, but using phantom types + a syntax extension to remove ugly notation.

Original (and trivial) Forth snippet:

1 2 + .

In OCaml, explicitly passing around the stack state:

(((start () |> num 1) |> num 2) |> plus) |> dot

In the syntax extension (written by Chet Murthy; still inside normal OCaml code):

[%forth 1 2 plus dot;]

Maybe a limited approach. The functions have to be typed explicitly, like:

val plus : ([`number] * ([`number] * 'a)) t -> ([`number] * 'a) t

To allow proper Forth words, you'd need to add a lexer, which you can do while staying inside the OCaml environment. It could be made interactive too, by including the syntax extension in the so called toplevel.

A nice property of this approach is that it spits out "normal" Forth code. So it's a strict superset (or maybe rather a typed subset) of Forth.