Cleaning House - GitHub Issues

@dshadowwolf: You know, parts of this repository almost feel like they're making it try to be a language tutorial, a language reference, a language specification, and a language 'evolution' repository (a la Swift Evolution) all at once! You should thus probably figure out how to split it up into multiple repositories containing things along those lines, though I can imagine it might be annoying to have all of said repos loosely clustered under your GitHub user repository namespace…

@dshadowwolf: …which led me to filing #11, of course.

hrm... perhaps, yes - I have some more to write based on recent experience with Java and some thoughts about other methods to achieve some of the design goals

I'll be doing that soon enough - it's mostly some thoughts on non-OO interfacing and means of achieving the goals without having to go full OO

@dshadowwolf:

You seem to have been working with Java rather a lot, from what I've heard from you both a while back and as of late. If I remember correctly, you have a certain amount of experience with other programming languages as well, but you can understand my concerns on this matter, I hope… I mean, it's never good to clone parts of an existing programming language when creating a new one unless you're going for full interoperability with it in the first place. Java already has Kotlin for that, though, so it would probably make more sense to focus on making it easy for users of this programming language to interface with low-level systems programming languages, and with ones built on top of them or mirroring certain of their traits, at least to start with. Some examples of these would be C, C++, Objective-C, Objective-C++ (yes, I know you've expressed some distaste at this creole's existence, but said patois does make it easier for Objective-C to interoperate with C++ than it would have been if it had been forced to go solely through C shims, so there is that…), and Swift. These choices probably show a bias reflecting my background in Apple's Darwin platforms, despite my never really getting my hands that dirty with them, but, hey, that's what I could come up with off the top of my head. All of this without copying every design decision those languages and the platforms built around them have made, of course!
Some of the examples I've put into Possible Future Research Directions.md probably have the potential to help you avoid these kinds of traps, but only time will tell whether they really help or not. Returning to the topic of non–object-oriented interface presentation, I don't really have anything to offer from the perspective of Java despite having read a book on the language quite some time ago (it covered an older version, I didn't do most of its exercises except the puzzles, and I no longer have it on hand since I borrowed it from my local library and had to return it). What I have had experience with lately, as you might have gotten the sense while I was still in #tsa-tech with you and its other regulars, is C++: I've been reading and re-reading the two textbooks/reference texts on that language I own and keeping track of its current standardization efforts for the past little while. As preliminary recommendations for what you might be able to take from that sector of applied informatics, I can begin by giving the following references:

'Inheritance Is The Base Class of Evil', a presentation by Sean Parent
cppreference's link to the working paper for the 'C++ Extensions for Concepts' technical specification, a significant portion of which was recently merged into the working paper for C++20 by the ISO C++ committee
Eric Niebler's range-v3 library, a proof-of-concept implementation of the C++ Ranges TS written so as to work outside of GCC, the only compiler which currently supports C++ Concepts as an experimental feature
Andy Prowl's GitHub repository containing a WIP paper on run-time/virtual concepts
Louis Dionne's 'dyno' library
Herb Sutter's recent paper on 'Metaclasses' (his companion ACCU 2017 presentation, which closed out the conference, wasn't out on YouTube last time I checked)

You may have even seen some of this already, and I may have even mentioned at least one of the items I just listed before, but I thought it might be best to officially document these references within this issue here on GitHub instead of just leaving them floating somewhere more perishable in the ether, so to speak.
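To make the "interfaces without inheritance" idea behind several of those references a little more concrete, here is a rough analogue in Python rather than C++. It uses `typing.Protocol` for structural typing; it is not the type-erasure technique from Sean Parent's talk nor C++ Concepts themselves, and the names `Drawable`, `Circle`, `Square`, and `render` are made up for illustration.

```python
from typing import Iterable, List, Protocol, runtime_checkable


@runtime_checkable
class Drawable(Protocol):
    """Hypothetical 'concept': anything with a draw() method returning str."""

    def draw(self) -> str: ...


class Circle:  # note: does not inherit from Drawable
    def draw(self) -> str:
        return "circle"


class Square:  # unrelated to Circle; shares no base class with it
    def draw(self) -> str:
        return "square"


def render(shapes: Iterable[Drawable]) -> List[str]:
    # Accepts any object that structurally satisfies Drawable.
    return [shape.draw() for shape in shapes]


print(render([Circle(), Square()]))  # ['circle', 'square']
```

The point of the sketch is only that the requirement lives with the consumer (`render`) instead of being baked into a class hierarchy.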

P. S.: On another note, maybe this language could make it easy for end users to write logical equivalents to the inter-language translations inherent in C++'s extern "…" linkage specifiers — currently there's only extern "C", of course — as library abstractions to add support for interoperating with whatever language(s) they want…?

P. P. S.: Phew, that was a long one…

P. P. P. S.: And I never really addressed the main point of this issue discussion thread in this reply before now, did I? Oh, well; I'll do it now: I think splitting this repository up into multiple ones as I suggested would make sense, as it would help separate the various document/code example structures I feel you're currently conflating here to some extent. Making cross-repository changes would be a major pain, though, unless you figured out how to script it well…

The linkage thing is actually defined on an OS basis - the extern "C" piece actually tells the compiler not to apply standard C++ name-mangling (used for function overloading and encoding of parameter and return-types) to that part of the code.

True linkage standards involve the placement of parameters and return values and how that is formatted - does the caller clear the argument stack or the callee? That sort of thing. Not only does it differ from OS to OS, it can vary from processor to processor - there are several different ABIs just for ARM code.
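As a toy illustration of the name-mangling half of that distinction, the sketch below imitates a tiny subset of the Itanium C++ ABI scheme (the one GCC and Clang use on most Unix-like targets) for free functions. The helper names `mangle` and `symbol_for` are hypothetical, only four parameter types are handled, and nothing here touches calling conventions, which are the separate ABI matter described above.

```python
# Type codes from the Itanium C++ ABI; only a tiny subset is handled here.
TYPE_CODES = {"void": "v", "int": "i", "double": "d", "char": "c"}


def mangle(name, param_types):
    """Toy mangler for free functions: _Z <len><name> <parameter codes>."""
    params = param_types or ["void"]  # an empty parameter list is encoded as 'v'
    return "_Z" + str(len(name)) + name + "".join(TYPE_CODES[t] for t in params)


def symbol_for(name, param_types, linkage="C++"):
    # With C linkage the plain name is emitted, so overloads would collide.
    return name if linkage == "C" else mangle(name, param_types)


print(symbol_for("add", ["int", "int"]))        # _Z3addii
print(symbol_for("add", ["double", "double"]))  # _Z3adddd
print(symbol_for("add", ["int", "int"], "C"))   # add
```

The two C++ overloads get distinct symbols because the parameter types are encoded in the name; with C linkage both would map to the same `add`.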

As to the rest - after working with Java and C# (mostly Java) I've come to the conclusion that forcing a specific style like that just creates issues where some algorithms from other styles are a lot better. For instance, I'm starting to become a major fan of the functional style of "don't use side effects, all variables are immutable" because of how it makes a large number of errors disappear.
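A minimal sketch of the kind of error that style removes, using Python's frozen dataclasses; the `Account` type and `deposit` function are invented for the example. Because `deposit` returns a new value instead of mutating its argument, code still holding the old value cannot be surprised by the change.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class Account:  # hypothetical example type
    owner: str
    balance: int


def deposit(account, amount):
    # Pure function: builds a new Account and leaves the argument untouched.
    return replace(account, balance=account.balance + amount)


before = Account("alice", 100)
after = deposit(before, 50)
print(before.balance, after.balance)  # 100 150

try:
    before.balance = 0  # accidental in-place mutation is rejected at run time
except FrozenInstanceError:
    print("mutation rejected")
```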

I need to finish up some requested fixes for a discord bot I wrote, then I'll get to writing another chunk of text about all the flaws I found in my previous bits here and what some better ideas are.

@dshadowwolf: (Quick question before I get into the meat of this reply: should I really bother with @-mentioning you every time I post to a thread here on your repository since you and I are the only two people here so far and doing so usually isn't really that necessary when one can easily tell from the context that only one conversation is going on who's replying to whom? I'm starting to see why you don't mess with it, anyway, and I'm wondering why I keep going on with it just because I did so in the first place. Anyway, let me back out of this sidebar…)

The linkage thing is actually defined on an OS basis - the extern "C" piece actually tells the compiler not to apply standard C++ name-mangling (used for function overloading and encoding of parameter and return-types) to that part of the code.

True linkage standards involve the placement of parameters and return values and how that is formatted - does the caller clear the argument stack or the callee? That sort of thing. Not only does it differ from OS to OS, it can vary from processor to processor - there are several different ABIs just for ARM code.

⋮

True, but you could expose compiler or interpreter hooks and/or primitives and/or allow hygienic compile- or interpretation-time macros to invoke side effects inside the abstract machine's compile- or interpretation-time representation of its state (side effects that you ensure are documented, though, and hopefully as code syntax rather than as comments or external documentation, so the compiler or interpreter can enforce them). Given those kinds of control and customization points, I don't doubt a sufficiently clever programmer could write a library implementing their own FFI for this language, regardless of its ABI or that of the other programming language with which they want to enable interoperability. The ABIs in question would still differ between platforms, but they would differ in tandem. I agree that either sufficient manual investment into supporting the X-by-Y-by-Z matrix of architectures, OSes, and FFI source/target languages and/or access to good code generation tools would be needed to make the problem tractable, but I think it would be awesome for an end user to just be able to, bikeshedding aside, write extern "$THEIR_FAVORITE_LANGUAGE_HERE" and let these low-level compiler or interpreter libraries take care of all the details for them, whether those libraries were provided as part of the compiler or interpreter's packaging or made available as a plug-in via an external package (from a package manager or not). Maybe I'm overlooking something, though…?
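Purely as a sketch of what such a library-level hook might look like, here is a registry keyed by the language name one would write after `extern`. Everything in it (`LINKAGE_HANDLERS`, `linkage`, `extern`) is hypothetical, and it covers only symbol naming, not argument passing. The Fortran rule shown (lowercase plus a trailing underscore) is an assumption modelled on gfortran's default for external procedures.

```python
# Hypothetical registry: language name (as in extern "<name>") -> handler
# that turns a declaration into the symbol the linker should look for.
LINKAGE_HANDLERS = {}


def linkage(language):
    """Decorator a plug-in library would use to register its handler."""
    def register(handler):
        LINKAGE_HANDLERS[language] = handler
        return handler
    return register


@linkage("C")
def c_symbol(name, param_types):
    return name  # C linkage: the symbol is the plain function name


@linkage("Fortran")
def fortran_symbol(name, param_types):
    # Assumption: lowercase name plus one trailing underscore.
    return name.lower() + "_"


def extern(language, name, param_types):
    try:
        handler = LINKAGE_HANDLERS[language]
    except KeyError:
        raise LookupError(f"no linkage plug-in registered for {language!r}") from None
    return handler(name, param_types)


print(extern("C", "puts", ["char*"]))  # puts
print(extern("Fortran", "DGEMM", []))  # dgemm_
```

A real version would also have to describe how arguments and return values are laid out on each target, which is where the platform matrix mentioned above comes in.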

⋮

As to the rest - after working with Java and C# (mostly Java) I've come to the conclusion that forcing a specific style like that just creates issues where some algorithms from other styles are a lot better. For instance, I'm starting to become a major fan of the functional style of "don't use side effects, all variables are immutable" because of how it makes a large number of errors disappear.

⋮

OK, cool; just checking! Still, side effects and mutable data can, at times, still be useful if used correctly — though, from what I've seen (and I know it's just the tip of the iceberg given my relative lack of experience, don't worry,) correct usage is hard! It would thus probably be a good idea to make use of compiler-enforced side-effect tracing and some kind of thread-aware ownership model based either on explicit programmer annotation of their code or inference of entity attributes by the compiler or interpreter (maybe the latter with the former for ambiguity resolution…?)

⋮

I need to finish up some requested fixes for a discord bot I wrote, then I'll get to writing another chunk of text about all the flaws I found in my previous bits here and what some better ideas are.

Right, the projects that already have users come first. Feel free to take your time; I'll likely still be hanging around.

True, but you could expose compiler or interpreter hooks and/or primitives and/or allow hygienic compile- or interpretation-time macros to invoke side effects inside the abstract machine's compile- or interpretation-time representation of its state (side effects that you ensure are documented, though, and hopefully as code syntax rather than as comments or external documentation, so the compiler or interpreter can enforce them). Given those kinds of control and customization points, I don't doubt a sufficiently clever programmer could write a library implementing their own FFI for this language, regardless of its ABI or that of the other programming language with which they want to enable interoperability.

That way lies madness - a language should have a defined system for the names of the exposed parts of the compilation units and the ABI is defined by the target machine and OS. In the case of Java, the target machine and OS are the same - the JVM - while, in the case of C++, the values can be wildly different. Yet both have a very well-defined standard for internal representations of "exposed names" of the compilation units (in Java they are, basically, just Unicode strings and in C++ there are various, complex mangling systems that add lots of information that would otherwise be lost). The ABI exists to allow for interoperability - in the bad old days every language, basically, had its own ABI and naming standards - sometimes versions of the same language from different manufacturers differed.

Still, side effects and mutable data can, at times, still be useful if used correctly — though, from what I've seen (and I know it's just the tip of the iceberg given my relative lack of experience, don't worry,) correct usage is hard!

Just because I'm a fan doesn't mean I'd force it. At this point I've got some thoughts on making the language somewhat Lisp-like in that it'd have a theory of how things work and would have all the things needed to write procedural, functional or Object Oriented code.

That isn't to say that forcing one of those paradigms isn't a good idea as it cuts down complexity, but with language inter-operability being what it is (even in the land of the JVM!) offering multiple paradigms in one language means that you can write the code in the manner that works best for each algorithm and not be forced into crappy situations.
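As a small illustration of that last point, here is one computation (the sum of the squares of the even numbers in a list) written three ways in a single language. The function and class names are invented for the example; which version reads best is exactly the per-algorithm judgement call described above.

```python
from functools import reduce


def total_procedural(numbers):
    # Procedural: explicit loop and a mutable accumulator.
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n * n
    return total


def total_functional(numbers):
    # Functional: no mutation, just filter and fold.
    evens = filter(lambda n: n % 2 == 0, numbers)
    return reduce(lambda acc, n: acc + n * n, evens, 0)


class EvenSquareAccumulator:
    # Object-oriented: state and behaviour bundled in an object.
    def __init__(self):
        self.total = 0

    def add(self, n):
        if n % 2 == 0:
            self.total += n * n
        return self


def total_object_oriented(numbers):
    acc = EvenSquareAccumulator()
    for n in numbers:
        acc.add(n)
    return acc.total


data = [1, 2, 3, 4]
print(total_procedural(data), total_functional(data), total_object_oriented(data))  # 20 20 20
```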

@dshadowwolf:

(snipped…)

That way lies madness - a language should have a defined system for the names of the exposed parts of the compilation units and the ABI is defined by the target machine and OS. In the case of Java, the target machine and OS are the same - the JVM - while, in the case of C++, the values can be wildly different. Yet both have a very well-defined standard for internal representations of "exposed names" of the compilation units (in Java they are, basically, just Unicode strings and in C++ there are various, complex mangling systems that add lots of information that would otherwise be lost). The ABI exists to allow for interoperability - in the bad old days every language, basically, had its own ABI and naming standards - sometimes versions of the same language from different manufacturers differed.

⋮

     Sorry, I was just throwing ideas at the wall to see if anything stuck. The point which I was trying to make with the thoughts to which this portion of your reply is reacting was, in retrospect, rather entangled with an idea I've posited implicitly to myself but not as explicitly to you as to how this language's compiler or interpreter might be structured. My thought process involved some recollections about the Glasgow Haskell Compiler, which is, from either what I heard out of a presentation given by a Haskell evangelist at this year's C++Now conference or something I read a while back, built in a layered fashion: compilation using GHC runs your source code through a pipeline of passes, each of which transforms it into a lower-level representation of said code and its library dependencies until it becomes very minimalist, at which point it is then turned into assembly for direct use by your computer's processor. An example more relevant to systems programming would comprise LLVM and its internal representation, but only (or at least mostly) optimizations are run on LLVM IR (though the same may be true for GHC IR as well, as I'm not really all that well versed in either, but I think that that internal representation is manipulated to remove levels of abstraction, also, which is more along the train of thought my brain was riding.)
     So, the (still entirely uninvestigated, but apparently plausible on the surface of things) logical consequence of such a line of attack would be to give the language as minimalist a core as possible and have more features be built on top of said kernel of functionality. Mistaken though I might be, I've gotten the impression that languages that follow this sort of implementation strategy are easy to extend with new syntax and semantics in the future, as a new feature could initially be developed as a module external to the language's core and then migrated in later as a submodule if desired. The way I see it, FFIs would fit into the stack somewhere, but, inexperienced as I am at programming, I'm not exactly sure where. Maybe adding more modules into a language's core module as submodules would affect its API, and thus its ABI, too much between releases for this kind of approach to be feasible to use in implementing a useful multi-paradigm language.
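     To make the layering idea concrete, here is a toy pipeline with a deliberately tiny core. It is not how GHC or LLVM are actually structured, and the pass names (`desugar`, `lower`, `run`) are invented. The core knows only numbers, "add" and "neg"; subtraction is sugar that an earlier pass rewrites into the core before a later pass flattens the tree into stack-machine instructions.

```python
# Surface expressions are nested tuples. The core language has only
# numbers, "add" and "neg"; "sub" is sugar defined on top of that core.

def desugar(expr):
    """Pass 1: rewrite ("sub", a, b) into ("add", a, ("neg", b))."""
    if isinstance(expr, (int, float)):
        return expr
    op, *args = expr
    args = [desugar(a) for a in args]
    if op == "sub":
        return ("add", args[0], ("neg", args[1]))
    return (op, *args)


def lower(expr):
    """Pass 2: flatten a core tree into stack-machine instructions."""
    if isinstance(expr, (int, float)):
        return [("push", expr)]
    op, *args = expr
    code = []
    for a in args:
        code += lower(a)
    return code + [(op,)]


def run(code):
    """Stand-in for the final machine-level stage: a tiny stack interpreter."""
    stack = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "neg":
            stack.append(-stack.pop())
        elif instr[0] == "add":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction {instr[0]!r}")
    return stack.pop()


program = ("sub", 10, ("add", 2, 3))
core = desugar(program)
print(core)              # ('add', 10, ('neg', ('add', 2, 3)))
print(run(lower(core)))  # 5
```

     Adding another surface feature in this toy means adding another rewrite to the first pass; the later stages stay untouched.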
     Perhaps allowing such a large stack of dependencies to grow inside the language would lead, as you say, to madness. Management of multiple interdependent internal APIs, and thus ABIs (however few at first), might eventually become an intractable problem as more get added in, leading maintainers and contributors to undertake refactoring of the language's internal constructs. One wouldn't want to stack too many layers on top of each other, either, and would also want to make as many layers share as many implementation strategies as possible without forcing programmer gymnastics. Still, other languages have done something similar, so I consider the idea worth looking into even if it eventually gets discarded. Modularization of the language compiler or interpreter's internals into multiple library entities would help allow different individuals to maintain various disparate parts of it independently. Think of it as parallel to the difference between a monolithic kernel and a microkernel (or even a unikernel with a microkernel built on top of it), just applied to programming language design instead (though, admittedly, I've never even gotten anywhere close to writing anything as complex as a kernel, either!). One could then plug different ISA and OS ABI modules exporting platform-dependent details into a language's implementation while building it, or while having it switch to targeting a different machine architecture or OS for cross-compiling, instead of having to rewrite the entire system from scratch for every new platform one would want the language to support. The ABI of the resulting system would presumably, after the era of instability prior to v1.0, then remain relatively stable on each of the language's supported OSes and chip architectures.
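     A rough sketch of what "pluggable ABI modules" could mean in the smallest possible case: the code generator asks a per-target table where each integer argument goes. The register lists are the real integer-argument registers of the System V AMD64, Microsoft x64, and AAPCS64 calling conventions. The table layout, the `assign_integer_args` helper, and the `stack[n]` notation are invented, and everything else those ABIs specify (floating-point arguments, structs, shadow space, alignment) is ignored.

```python
# Integer-argument registers for three real calling conventions. The keys
# and the overall table shape are hypothetical.
ABI_MODULES = {
    ("x86_64", "sysv"): ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
    ("x86_64", "windows"): ["rcx", "rdx", "r8", "r9"],
    ("aarch64", "aapcs64"): [f"x{i}" for i in range(8)],
}


def assign_integer_args(target, arg_names):
    """Map each argument to a register, or to a stack slot once registers run out."""
    registers = ABI_MODULES[target]
    locations = {}
    for index, name in enumerate(arg_names):
        if index < len(registers):
            locations[name] = registers[index]
        else:
            # Simplified notation for "passed on the stack".
            locations[name] = f"stack[{index - len(registers)}]"
    return locations


args = ["a", "b", "c", "d", "e"]
print(assign_integer_args(("x86_64", "sysv"), args))
print(assign_integer_args(("x86_64", "windows"), args))
```

     The same five-argument call puts `e` in `r8` under the System V table and on the stack under the Windows one, which is the sort of platform-dependent detail such a module would export.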

⋮

(snipped…)

Just because I'm a fan doesn't mean I'd force it. …

I hear you loud and clear on that point by now.

…At this point I've got some thoughts on making the language somewhat Lisp-like in that it'd have a theory of how things work and would have all the things needed to write procedural, functional or Object Oriented code.

⋮

…or, for that matter, any other paradigm you might want the language to support? (Is it safe for me to assume you just picked those three examples out of a hat, as the saying goes…?)

⋮

That isn't to say that forcing one of those paradigms isn't a good idea as it cuts down complexity, but with language inter-operability being what it is (even in the land of the JVM!) offering multiple paradigms in one language means that you can write the code in the manner that works best for each algorithm and not be forced into crappy situations.

Like I've heard elsewhere, programming-language and library design are extensively about balancing trade-offs. You seem to have a better handle on what most of those are than I do, and this is your show, after all…

The API layering and possible shifting into the core of the language spec is no issue - it's the ABI layering that becomes an issue. The OS defines how the arguments are passed around, modified by how the underlying machine works - that is the ABI, outside of things like the a.out, ELF or COFF formats used for the storage of the code itself, whether a library or executable.

…or, for that matter, any other paradigm you might want the language to support? (Is it safe for me to assume you just picked those three examples out of a hat, as the saying goes…?)

I picked those three because they are the most common forms - there are a couple others, but I've never run into them outside of an experimental or teaching situation.

Like I've heard elsewhere, programming-language and library design are extensively about balancing trade-offs. You seem to have a better handle on what most of those are than I do, and this is your show, after all…

Yeah, which is why I'm thinking about the design I mentioned, where it encompasses a broad theory and doesn't force any single school of design.

The API layering and possible shifting into the core of the language spec is no issue - it's the ABI layering that becomes an issue. …

OK, I think I may be starting to see what you're getting at with that last part, but you wouldn't happen to have any concrete examples as to situations involving unmaintainable ABI layering, would you? Otherwise, I guess I'll just take your word for it.

…The OS defines how the arguments are passed around, modified by how the underlying machine works - that is the ABI, outside of things like the a.out, ELF or COFF formats used for the storage of the code itself, whether a library or executable.

⋮

Right. And then you have Mach-O… but, like you say, that's only of tangential relevance since it's some number of layers of abstraction above the raw instruction stream received by a processor.

⋮

(snipped…)

I picked those three because they are the most common forms - there are a couple others, but I've never run into them outside of an experimental or teaching situation.

⋮

Sure, if you count imperative and declarative programming as styles of procedural programming.

⋮

(snipped…)

Yeah, which is why I'm thinking about the design I mentioned, where it encompasses a broad theory and doesn't force any single school of design.

I'll look forward to seeing how you put down your newer thoughts, then.

I have an actual spec on it in the works, which will be posted ready to accept edits and ideas from others. The basic concept of it is a layered system where the actual programming paradigm is just one of those layers and what the user actually sees is set above it in the design.

It is very rough and needs a lot of work because I am trying to keep the actual design generic and non-specific, though it includes something like an intermediate code at the lowest level, and at the paradigm level it includes a DSL for specifying parser extras and AST transforms.

You'd know better than me, but, yeah, that sounds about right in context. I don't know how much I'll be able to help, what with other rabbit holes sucking me into them and all, but I'll see what I can do.

dshadowwolf / ideas-for-a-new-language

Cleaning House #10