c3d / xl

A minimalist, general-purpose programming language based on meta-programming and parse tree rewrites
GNU General Public License v3.0
270 stars 15 forks source link

Generate lifetime values #15

Open c3d opened 4 years ago

c3d commented 4 years ago

Lifetime is a new concept in XL, inspired by the Rust "borrow checker", and destined to make it possible in XL to implement a wider class of borrow-checks from the library.

A lifetime value is generated for the expression lifetime X, which returns a compile-time constant that can be compared to other lifetime constants. For instance, a reference type can be made from an owning value with the following condition:

type ref T
type own T

Source:own T as Target:ref T when lifetime Target < lifetime Source is ...
Source:own T as Target:ref T when not (lifetime Target < lifetime Source) is
    compile_error "The reference %1 might outlive the source %2", Target, Source
dumblob commented 3 years ago

How about combining this with more coarse grained memory management?

Seeing how Tao3D "concepts" nicely composed in the presentations app etc., what about making memory management explicit by providing a pattern memblock akin to locally?

I mean, memory is not infinite and most data in it actually have a lifetime. Garbage collectors just hide this fact. Why not to require the programmer think about memory also? Sure, not about every single tiny malloc(), that's an extreme exaggeration. But with much more coarse grained chunks it becomes mentally manageable and actually one can always be more precise with memory management if she wants.

Leaving memblock scope would call free() on all captured references and entering the scope would just create a checkpoint to which every malloc() underneath recursively would register.

As a side effect, this could at the same time work as a "built-in wrapper over malloc()" to ensure low pressure on the system malloc().

One wouldn't need to care about loops with low number of iterations and smaller allocations. For other cases, there is nothing easier than just prepending the loop body with memblock :wink:.

There are definitely things I didn't consider as this is in no way thought out proposal. But I wanted to let others know.

This idea naturally builds upon the assumption that references will not escape the context which I don't know how to ensure in XL, but maybe there are some ways.

dumblob commented 3 years ago

After short experimentation I found out the memblock would be very annoying (I've tried to write some sketch code with that in mind and it turned out horrible).

I turns out, the main problem is that people tend to think about malloc()ked memory the same as about stack memory. The key difference is though, that stack memory doesn't have to be managed by hand at all, so it's actually pure. But malloc()ked memory is impure (an "effect" in Scala terms). This distinction shall be also explicit when programming.

In other words, for stack values, it's just one layer (there is no "meta" layer) - one just writes a variable name and all is done automatically - thus one cares only about the value itself and not the "existence" of the value. With heap values, it's two layers (there is a "meta" layer taking care of existence of the value and then the "main" layer is the value itself) - one first interacts with the "meta" layer in the context of the variable and then finally interacts with the value itself and then needs to interact again with the "meta" layer every single time the value shall change it's meta property size and finally on top of that one needs to also interact with the "meta" layer after the value of the variable is not being useful any more.

In yet other words, the culprit seems to be the "assignment" operation in most languages which under the hood does very different things (often pretty convoluted - based on types, context, situation, multithreading/processing, heuristic, etc. the rval is being shallow copied, weakly linked, deep copied, moved, borrowed, etc.) thus totally blurring our perception of the huge difference between stack and heap memory.

My proposal would then be to treat malloc()ked variables the same as stack variables when it comes to their lifetime (i.e. discard when leaving the lexical scope they appeared in). While at the same time make a strict syntactical difference between copying rval to lval and hardlinking rval to lval (thus the ambiguous "assignment" wouldn't exist). copying will be used much more for stack values (because their sizes often fit into HW registers and thus copying them is much faster than "refering" to them) and hardlinking for heap values (because they usually do not fit into HW registers and thus copying them is much slower than "refering" to them).

I've spend now quite some time writing down how it could work, but then I realized I'm more or less describing Nim's ORC and Lobster's GC. Both actually inserting free() automatically at right places in compile time based on an advanced flow analysis. The rest, which is collecting cyclic dependencies (which seems impossible to do in compile time), is being done in runtime. Performance is for vast majority of data structures identical to a manually written C code. Though structures heavy on cyclic references might be slower, but only by a small factor like 0.95x.

Back to the topic - I actually think that lifetime values are important (maybe not necessary though) to make the advanced flow analysis more effective and thus I see lifetime values as on optimization technique.

dumblob commented 3 years ago

One thing I forgot to mention - because management of heap memory can be seen as manual "cache" management, the same ORC-like garbage collection can be used also for seamless persistent values. Thus providing a seamless and very easy way to make the application state persistent (resumable) unlike most other languages.

This could be a killer feature with commercial potential. Imagine e.g. https://scattered-thoughts.net/writing/imp-intro/ but actually practically usable.

c3d commented 3 years ago

@dumblob Thanks a lot for the thoughts.

I cannot address all of it in a timely fashion, so I'll try to focus on a few essential aspects. Feel free to resteer the discussion if you feel I forgot something 😃.

First, lifetime is a note to myself regarding a vague idea on how to implement a Rust-style borrow checker using the existing features of XL. As you may have noticed, a fundamental idea of XL is that very little is in the core language, and that most of the actual language logic is implemented as libraries. Rust was one of the few languages that proposed a new idea that I did not know how to fit. The reason is that the borrow checker cannot be made to work without a lifetime computed at compile-time by the compiler. So I opened this issue as a reminder that if I have a lifetime type with the right properties, I believe that I can define ownership rules that match what Rust offers. The proof will be when someone actually writes it 😏

Second, your observations about memory allocation being a side effect that is not stack-ordered is perfectly valid. This has been a problem for a very long time, with solution ranging from manual management (e.g. C) to automated garbage collection (e.g. Lisp) to static type-based deallocation (e.g. Ada) to dynamic constructors/destructors (e.g. in C++) to the borrow checker (e.g. Rust), I believe roughly in chronological order. All these have advantages and drawbacks, and most exist in multiple variants. I don't know how many garbage collectors have been invented, but I suspect it is more than I suspect.

The XL approach wants to be extensible. This includes garbage collection, for example. Basic garbage collection algorithms are relatively simple, e.g. mark and sweep, but they are also quite generic. Why garbage collect memory and not files or threads or keys or whatever you want? The locally you used as an example is not just a way to manage graphics resources in Tao3D, it also has semantics related to the partial re-evaluation of what is inside.

In short, what you said about memblock could be true for thread locks, where both a stacked and unordered variant make sense, for control flow, where both a stacked (structured programming) and unordered variant (goto) make sense, and so on. In all cases, extremely complicated problems, and the game is to find what are the simplest primitives that lets you build the rest. For control flow, if you have "goto" you can build the rest, for example.

dumblob commented 3 years ago

Thanks for your thoughts!

In all cases, extremely complicated problems, and the game is to find what are the simplest primitives that lets you build the rest. For control flow, if you have "goto" you can build the rest, for example.

Yep.

Just yesterday I saw what devs of Passerine try to use - they chose a bit different approach to the good old idea "deep copy everything everywhere and only if there is an appropriate guarantee, pass it by reference". Usually this is being evaded by combining e.g. flow analysis with some GC technique. But Passerine guys do it by restricting the language and hope their rules are so little restrictive that people almost won't notice. They call it "vaporization".

https://slightknack.dev/blog/vaporization/

I actually think this might be yet another approach for XL - but compared to others, it seems much easier to implement than lifetime or even simple garbage collection. Passerine is also highly versatile language ("almost" like XL :wink:), so maybe their "vaporization" is something to look at before one spends hundreds of hours implementing some more complex memory management scheme.