cliffclick / aa

Cliff Click Language Hacking
Apache License 2.0

dogfooding earlier rather than later... #14

Closed jnorthrup closed 2 years ago

jnorthrup commented 2 years ago

What is the minimum bootstrap of the language/environment, and what is the productivity penalty associated with bootstrapping too early? Is there a penalty at all?

cliffclick commented 2 years ago

Penalty: very slow forward progress, because all your tools suck, because you're writing the tools.

Penalty: newcomers are forced to write in a very primitive subset; steep learning curve because none of the syntactic-sugar shortcuts work.

Gain: the tooling becomes rock-solid a lot earlier.

Gain: the overall pain to become self-hosted is lower, because it's taken earlier and mistakes are easier to unwind; i.e., "mistakes" don't become "features".

That being said, I'm not ready to do an execution engine until the type system is sorted out.

I'm not ready to give up on my experimental type system yet, and am happy to get some help. I'm pretty far down the type-theory rathole, trying to work my way out.


jnorthrup commented 2 years ago

I cannot personally grok docs/MLsubtype.pdf well enough to get a qualitative picture, but a taxonomical breakdown might help us both in organizationally understanding and communicating the layout. That was a prominent feature of Lattner's early team: interns scribing.

A type system seems like a thing where you do want a cute syntax for common tasks, and you also do want an unbounded graph notation (a blackboard system) to specify present and future capabilities at the limit of what compile time can do. I personally go with Kolmogorov complexity to measure design smells.

jnorthrup commented 2 years ago

Something as simple as a throwaway UML class diagram of the type-system object model couldn't hurt. I took a look at the Doxygen for HM.java and ported it to Kotlin to get a feel for the idioms in use, and I could not pick out a high-level model from the available code.

jnorthrup commented 2 years ago

Four or five queries to a handy generative language model yield the following syntheses of Q&A about H-M typing and typed languages:


Coming from a systems background and having implemented a number of knowledge systems, I found the Hindley-Milner type system to be -- similar to but less practical than a near alternative, namely the Liskov Substitution Principle. I would recommend a book on this topic to anyone wanting to learn more about the subject.

The Liskov Substitution Principle is a fundamental principle of object-oriented programming that states that in an object-oriented program, if S is a subtype of T, then objects of type T may be replaced with objects of type S without altering any of the desirable properties of that program (correctness, task performed, etc.). This principle was formulated by Barbara Liskov in 1987 and has been used as a guideline for developing software ever since.[1] The principle has been generalized from its original statement in terms of types to statements in terms of objects and operations.

The key idea behind the LSP is that substitutability should be defined in terms of behaviour rather than implementation; i.e., it should not matter how the functionality provided by a class is implemented as long as it provides the required behaviour when called upon to do so. In other words, if two classes are related by inheritance (i.e., if one class extends another), then instances (objects) of the derived class should be usable anywhere instances (objects) of the base class are expected; i.e., they must be substitutable for them.
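
As a concrete, purely hypothetical Java illustration of that substitutability idea (none of these classes come from any real codebase), code written against a base type keeps its guarantees when handed any well-behaved subtype:

// Minimal LSP sketch: totalArea() is written against Shape only, and stays
// correct for any subtype that honors Shape's contract (non-negative area).
abstract class Shape {
  abstract double area();
}

class Circle extends Shape {
  private final double r;
  Circle(double r) { this.r = r; }
  @Override double area() { return Math.PI * r * r; }
}

class Square extends Shape {
  private final double side;
  Square(double side) { this.side = side; }
  @Override double area() { return side * side; }
}

class LspDemo {
  // Instances of Circle and Square are substitutable wherever a Shape is expected.
  static double totalArea(Shape[] shapes) {
    double sum = 0;
    for (Shape s : shapes) sum += s.area();
    return sum;
  }
  public static void main(String[] args) {
    System.out.println(totalArea(new Shape[]{ new Circle(1), new Square(2) })); // pi + 4
  }
}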


Among the self-hosting compilers supporting complex type systems and garbage collection, the most succinct example that emerged -- is Eiffel.

The goal of the project is to implement a compiler for the language Eiffel. The main focus will be on the compilation of object-oriented programs, in particular on the implementation of Garbage Collection and on supporting complex type systems, in particular parametric types.

The project is part of a larger effort to develop an experimental programming environment for Java in which garbage collection and parametric types are supported in addition to object-orientation. The environment is being developed at ETH Zurich by several groups: Programming Systems Group (Professors Wirth and Nievergelt), Programming Methodology Group (Professor Wirth), Computer Architecture Group (Professors Rothermel and Hämmerle).


The infamous supervillain "Hot_Potato" cackled and pulled the plunger on the skyscraper, saying that ends the chances of anyone interested in the Hindley-Milner type system finding out what I've always known: -- that the system is a fraud.

I will have to admit that I was a bit surprised by the suddenness of this news. I had been expecting it for some time, but I had always hoped that it would be delayed until after the release of Ocaml 3.10.0, which should happen any day now (we're currently at 3.09.3). However, the end has come sooner than expected: as of today, September 30th 2009, The Hindley-Milner type system is officially dead and buried, and we are moving on to a new future without it!

As far as I can tell from my research into the matter, nobody really liked Hindley-Milner even in its heyday; it was just an ugly hack that got shoehorned into everything because "type inference is cool!" In fact there's even a long-running joke about how you could write a C compiler using only one line of code:

int main(int argc, char **argv) { return 1; }

And then people would say "Well yeah, but you'd have to write a whole lot more code to get anything useful." And they'd say "Yeah, but you could do it with only one line."


The Hindley-Milner type system contrasts with the type system in C++ including template metaprogramming most succinctly in the three following ways: --

In C++, the type of an object is determined by the type of its members. For instance, if a class has a member of type int , then this int can be used wherever an int is expected. In contrast, in Haskell the type of an object is explicit in its definition (in fact it is defined as part of the syntax). A data constructor must have exactly one argument for each field it declares (the arguments are separated by spaces). If it does not declare any fields, then it must have no arguments at all; this makes empty data types possible. The number and order of arguments determines the shape (i.e., the kind) of the data type being defined. All these points are very different from C++ where most aspects are implicit and only some details are explicit (such as template parameters).

  1. No hierarchy or subtyping

Haskell's Hindley-Milner system contains no notion of subtyping or inheritance hierarchies: every data constructor represents a new distinct type with no implied relationship to other types. This contrasts with C++ classes, which may be organized into hierarchical structures with inheritance relationships between them; furthermore, classes may be derived from more than one base class through multiple inheritance.

jnorthrup commented 2 years ago

one more

Of the self-hosting compilers supporting complex type systems and garbage collection, the 3 with the best Universal Turing Machine (UTM) metric that emerged from the comprehensive study of 65535 programming languages were -- RESPONSE:

SML, the Standard ML of New Jersey compiler, with a UTM metric of 0.003217351428571428571428571;

Haskell, the Glasgow Haskell Compiler, with a UTM metric of 0.00472264150943396122468266667;

Miranda, the Miranda compiler written in itself, with a UTM metric of 0.01060897977255645155908634927.

cliffclick commented 2 years ago

Never heard of 'UTM' (link please).  There are 65535 languages to study?   (besides totally made-up ones like brain-f*ck). Link/reference to the 'study'?  And ... so what?

Cliff


jnorthrup commented 2 years ago

This is GPT-3, our new future Bill Gates-owned overlord-to-be language agent. There may or may not exist a UTM metric in the GPT-3 crawl that is somehow more useful for dynamic templates when compared to static code measured with Kolmogorov metrics.

So... (what?)

Down this rabbit hole, even though we see opinions primarily from Reddit exemplars, GPT-3 and the internet at large probably make fun of Haskell and H-M typing, but it may very well have done a UTM analysis of 64k worth of compiler codebases to deliver that last brainfart. In every case, the compilers in that UTM top-3 category of complex types with GC are using H-M typing, quite unrelated to the prior queries.

That may be some kind of validation for your own instincts and biases behind the AA design you've started with.

jnorthrup commented 2 years ago

According to GPT-3's remark, measuring programming languages along Kolmogorov complexity metrics has traditionally been accomplished by

RESPONSE: measuring the length of the shortest program in a given language that produces a specified output.

In this paper, we present a new approach to measuring programming languages along Kolmogorov complexity metrics. We introduce the notion of `dynamically-sized' programs and provide an algorithm for computing them. We show that dynamically-sized programs have properties similar to their statically-sized counterparts with respect to Kolmogorov complexity metrics, but are more flexible in their application. In particular, we show how dynamically-sized programs can be used to measure programming languages along the Universal Turing Machine (UTM) metric and other variants of it.


I went ahead and humored it, in the direction of its choice to invent a new metric for template-based language evaluation instead of commenting on Kolmogorov measurements for static programs. Who would know more about language evaluation metrics than GPT-3?

cliffclick commented 2 years ago

I'm sorry - I fail to see the correlation between GPT-3, Kolmogorov metrics, AA and H-M typing.

I do see a lot of sarcastic comments and obtuse references and hints of political overtones - none of which belong in an AA issue.

Please limit your comments to something directly related to AA: suggestions for improving it ("Use Liskov substitution, not H-M" is perfectly fine) or offers of help ("here is a Doxygen setup file").

Thanks, Cliff


jnorthrup commented 2 years ago

Perhaps this will dial in project relevance for you:

I've been trying to zero in on what exactly H-M is or is not, as it carries the gravitas of every aspect of AA's immediate focus. The PDF in docs/ is inadequate for that; it absolutely assumes prior knowledge of H-M types before going into a long dissertation about something I still haven't come to understand.

I've also had bad luck using search engines to get quantitative, apples-and-oranges systems-programmer dialogue about the type system. I turned to GPT-3, which comments knowledgeably, with analogies, on a great many topics including H-M typing, and from which I think I have finally gotten some kind of plain-English rundown as a C++ programmer asking quantitative questions; I'm recording it here for my own or others' future reference.

If you're not absolutely overwhelmed with how perfectly Java functional streams and NIO and sun.misc.unsafe have implemented very non-Java concepts, then Kolmogorov complexity is probably a very good qualifier to keep in mind about end products, being almost identical to Occam's razor. Could the language fix these? Sure, but did it? Not even a little.

I can't personally agree that a type system is the driving force of an expression parser that can potentially arrive at bootstrapping early, among <opinions withheld> tooling, but I do greatly appreciate your take on the matter.

cliffclick commented 2 years ago

Thank you for staying relevant.

As for H-M documentation: the internet is awash in it; the wiki article is as good a starting point as any. There are lots of freshman-level tutorial videos as well.

https://en.wikipedia.org/wiki/Hindley%E2%80%93Milner_type_system

It does require some time investment to wrap your head around, especially if you are coming from C++ and haven't seen it work. I don't ask contributors to AA to understand H-M - only if you want to contribute on the typing side. Also, I've done some fairly aggressive extensions to H-M, so really, to help in typing land, your typing game needs to be really strong. A PhD in the area would help.
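
For intuition only, here is a minimal toy sketch of the unification step at the heart of H-M inference, in plain Java. It is deliberately tiny (no let-polymorphism, no occurs check) and is unrelated to HM.java in this repo:

// Toy Hindley-Milner flavor: types are either type variables or constructors,
// and inference boils down to unifying type terms and binding type variables.
class ToyHM {
  static abstract class Type {}
  static final class TVar extends Type {
    final String name; Type instance;                 // 'instance' is set once unified
    TVar(String name) { this.name = name; }
    public String toString() { return instance == null ? name : instance.toString(); }
  }
  static final class TCon extends Type {
    final String name; final Type[] args;             // e.g. "->"(a,b) or "int"()
    TCon(String name, Type... args) { this.name = name; this.args = args; }
    public String toString() {
      if (args.length == 0) return name;
      if (name.equals("->")) return "(" + args[0] + " -> " + args[1] + ")";
      StringBuilder sb = new StringBuilder(name);
      for (Type a : args) sb.append(' ').append(a);
      return sb.toString();
    }
  }

  // Follow bound type variables to their representative type.
  static Type prune(Type t) {
    if (t instanceof TVar v && v.instance != null) return v.instance = prune(v.instance);
    return t;
  }

  // Unify two types, binding type variables as needed; throw on mismatch.
  static void unify(Type a, Type b) {
    a = prune(a); b = prune(b);
    if (a instanceof TVar va) { if (a != b) va.instance = b; return; }   // occurs check omitted
    if (b instanceof TVar)    { unify(b, a); return; }
    TCon ca = (TCon)a, cb = (TCon)b;
    if (!ca.name.equals(cb.name) || ca.args.length != cb.args.length)
      throw new RuntimeException("cannot unify " + a + " with " + b);
    for (int i = 0; i < ca.args.length; i++) unify(ca.args[i], cb.args[i]);
  }

  public static void main(String[] args) {
    // Infer the result of applying the identity function to an int:  id : a -> a,  id 3 : ?
    Type a = new TVar("a");
    Type id = new TCon("->", a, a);                                   // a -> a
    Type result = new TVar("r");
    unify(id, new TCon("->", new TCon("int"), result));               // (a -> a) ~ (int -> r)
    System.out.println("id 3 has type " + prune(result));             // prints: int
  }
}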

I, myself, am very much aware of the difference between C++ typing and H-M typing, having used both for years in large projects.  There's a reason each side has some "grass is greener" envy for the other side, and that's the reason I'm trying to unify the two type systems.  Why did Java add 'lambdas' to an O-O imperative language?  Because they are hella convenient to use. Why does Scala have an @specialized annotation?  Because the cost of boxing a billion 'ints' can easily exceed all other costs. Both sides have something to offer, hence my goal of unifying them.

As for Kolmogorov complexity - it's definitely not in any of my goals.  Terseness, expressiveness, type-safety and keeping C++ performance are in my goals.  Will this allow you to write e.g. functional-streams with a lower Kolmogorov score?  I dunno, maybe yes, but not a goal of mine.

As for the typing system being a driving force: it's what gives semantics.  Program Meaning.  Without it, you get "C undefined behavior" semantics, aka not a whole lot. If I wanted to go faster on other parts of this project, I'd have to dial back my typing goals.  Not ready to do that yet.  So an eval engine waits a little longer.

Cliff


jnorthrup commented 2 years ago

As for Kolmogorov complexity - Will this allow you to write e.g. functional-streams with a lower Kolmogorov score?

Hands down, absolutely; benefited by type systems, but more so, IMHO, by not having a glass ceiling in terms of banalities such as escapes and operator overloading, and by supporting trial and error towards these ends in signalling intent. Given the ability to reify a program graph and then renormalize the expressions, this is not a far stretch from reifying from p-code.

jnorthrup commented 2 years ago

As you propose eval to be the means of emergent introspection, would there be any other eventuality?

cliffclick commented 2 years ago

"emergent introspection", i.e. reflection?

Some sort of efficient way to inspect the currently visible program variables, same as you would have access to if you were parsing code at that point (or running eval, which is the same thing). You could get the same inspection via eval very inefficiently (try all variable names with eval; if eval reports a syntax error, then it's not a variable visible to you, else it is).

So yeah, something other than eval for strictly enumerating visible program names and types. Eval for computing things or changing the changeable. Always limited by what you could have done as if you had parsed that same code at that point.
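
A sketch of that brute-force probe, with a hypothetical Interp.eval(String) standing in for AA's eval - the names below are made up; only the idea comes from the paragraph above:

import java.util.ArrayList;
import java.util.List;

// Sketch of "reflection via eval": probe a list of candidate names and keep the
// ones that evaluate cleanly. Interp and SyntaxError are invented stand-ins.
class EvalProbe {
  static List<String> visibleNames(Interp interp, List<String> candidateNames) {
    List<String> visible = new ArrayList<>();
    for (String name : candidateNames) {
      try {
        interp.eval(name);          // a bare name evaluates only if it is in scope
        visible.add(name);
      } catch (SyntaxError e) {
        // not a variable visible at this point; skip it
      }
    }
    return visible;
  }
}

// Hypothetical interpreter interface, for the sketch only.
interface Interp { Object eval(String src) throws SyntaxError; }
class SyntaxError extends Exception {}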

Cliff


jnorthrup commented 2 years ago

This still seems like it's in the zip code of dogfooding but not at the front door...

JavaScript shows what a language that uses eval as a first-class feature can become over time. It becomes a main IO conduit, and it has been a cat-and-mouse arms race; so presently we have CORS and JSON as the resultant discipline of web 1.0 SQL-injection paradigms.

If JavaScript had a stdio, there would be an enormous amount of code on the web simply doing eval(readline()) on stdin. What an incredible UTM score! Agents with all the capabilities of the language from 2 functions.

IIUC not only is eval slated to be reflection, but you said it works nicely as a macro language. We also see this in JavaScript in modern contexts, such as when installing a Swagger API definition coinciding with a Swagger library.

C++ TMP coexists with macros and is relatively similar, though C++ lacks every vestige of runtime eval, and does so by type-agnostic decisions. Nonetheless there is a complete raytracer executed in compile-time C++ template evaluation because "why not?".

It is just an observation on my part that you can probably gain the best of both extremes by implementing a ladder logic of sorts to install a ring 0 and successive "security" rings, enabling optimal baked eval on rings higher up, which are presumed to exist in modern contexts with access-denied, read-only, and keypair-only accesses. This could really improve a lot of agent code in the wild presently running under various runtimes, if only the right mix of engineers were in one place at the beginning of the design discussion about telephone switching networks and applets and programming languages and whatnot.

cliffclick commented 2 years ago

Ok, once again we're heading into extremist land with strawman arguments everywhere - AND pointedly ignoring what I just said.

Keep it sane and factual, and we'll have a conversation. Otherwise I'm finding this conversation to have a VERY high noise-to-signal ratio.

I suggest that for your next reply, you drop the sarcasm ("What an incredible UTM score!") - it's not helpful in any conversation. Also drop the slippery-slope arguments ("javascript uses eval and it sucks, you'll be like javascript!").

I'll say what I said about eval again: eval-with-security.

By default, eval has NO access to ANYTHING.  "eval(readline())" gives you all the same rights and permissions as inlining the readline() code into a private module with no access to anything, including I/O.  You want to allow free access to your filesystem? Pass the rights in.  You want to allow access to your keystore, or SQL/DB?  Pass the rights in.

eval will allow consumption of CPU resources, which can be managed using existing threading controls (timeouts, kills, priorities).  Permission to spawn new threads has to be passed in.

eval will allow consumption of memory, although I'm likely to (eventually) demand a default limit on the amount of memory allowed - and/or require that the compiler can prove finite memory consumption.

eval allows loading of new code, and as seen in Java land, the new code can run as fast as the old code (which is roughly the same speed as "gcc -O2" compiled C code).  I don't like the Java "class loading" mechanism, so I'm thinking of using eval for this purpose.  Macros have their own evils, so again eval gives the same kind of capabilities.
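
A rough Java sketch of that capability-passing shape - every name here (SandboxedEval, FileSystemCap, DbCap) is invented for illustration and is not AA's design or API:

import java.nio.file.Path;
import java.util.Map;

// Sketch of "eval-with-security": evaluated code can touch only the capabilities
// explicitly passed in. By default the environment is empty: no I/O, no DB.
interface FileSystemCap { String read(Path p); }       // capability: read-only FS access
interface DbCap        { Object query(String sql); }    // capability: SQL/DB access

class SandboxedEval {
  static Object eval(String code, Map<String, Object> grantedCapabilities) {
    // ... parse and run 'code'; the only non-local state it can reach is
    // whatever was handed over in 'grantedCapabilities' ...
    throw new UnsupportedOperationException("sketch only");
  }

  static void example(FileSystemCap fs) {
    // eval(readline()) with nothing passed in: same rights as a private module
    // with no access to anything, including I/O.
    eval("untrusted code", Map.of());
    // Want the code to read files? Pass the right in explicitly.
    eval("code that reads a config file", Map.of("fs", fs));
  }
}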

"Rings" of security with ladder logic stacking heading towards everymore complexity?  Again a sliding-slope argument.  Not gonna happen anyways.  Get it right once, at the start, and done.  No ladder logic, no complexity, and enough rope to hang yourself.

Cliff


jnorthrup commented 2 years ago

I suggest that for your next reply, you drop the sarcasm ("What an incredible UTM score!") - it's not helpful in any conversation. Also drop the slippery-slope arguments ("javascript uses eval and it sucks, you'll be like javascript!").

Dropped and dropped, having intended neither impression, as stated above. I shouldn't have to explain or defend attempts at levity or objectivity. In this matter, "happy to get some help" seemed to invite candor and dialogue. Dropped, most sincerely.