Mellow-Programming-Language / Mellow

The Mellow Programming Language

[Post-GC] Channels must be allocated on a shared heap and, consequently, ref-counted #70

Open CollinReeser opened 8 years ago

CollinReeser commented 8 years ago

Once the per-green-thread garbage collector implementation is in place (including the compilers that generate the deep-copy, deep-compare, deep-GC-mark, and deep-GC-pointer-update routines), all data will be owned by exactly one thread, and any data passed to a different thread will be a copy of the original.

The one exception to this rule is channels, which must be shared between threads by virtue of what they do. That means the overall runtime must keep track of a shared heap on which channel objects are allocated, so that multiple threads can own the exact same channel allocation.

Since each GC is per-thread, that paradigm does not easily map onto the concept of a shared heap. The solution is to implement infrastructure for a ref-counting system in which the ref-count is updated only when ref-counted data is transferred to a new thread (through a spawn or a channel access) or when a thread owning the ref-counted data dies. This keeps the number of points at which a ref-count must be updated very small, so the performance impact should be negligible. Each thread will keep track of the ref-counted data it holds a reference to, using checks that the compiler emits whenever a ref-counted object type appears at one of those interaction sites.

Once every thread that held a reference to a channel has exited, the channel can be deallocated from the shared heap.
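As a rough illustration of the scheme above, here is a minimal single-threaded Python model. The names `SharedHeap`, `GreenThread`, `retain`, `release`, and `acquire` are hypothetical and not part of the Mellow runtime. The sketch assumes one count per holding thread rather than per reference, and it ignores the atomic updates a real runtime would need.

```python
class SharedHeap:
    """Hypothetical runtime-wide heap holding ref-counted channels."""

    def __init__(self):
        self.refcounts = {}  # channel id -> number of threads holding it
        self.next_id = 0

    def alloc_channel(self):
        chan_id = self.next_id
        self.next_id += 1
        self.refcounts[chan_id] = 0
        return chan_id

    def retain(self, chan_id):
        self.refcounts[chan_id] += 1

    def release(self, chan_id):
        self.refcounts[chan_id] -= 1
        if self.refcounts[chan_id] == 0:
            # No thread holds the channel any more: deallocate it.
            del self.refcounts[chan_id]


class GreenThread:
    """Hypothetical green thread that tracks the channels it holds."""

    def __init__(self, heap):
        self.heap = heap
        self.held = set()

    def acquire(self, chan_id):
        # Stands in for the compiled-in check at an interaction site
        # (a spawn argument or a value received over a channel).
        if chan_id not in self.held:
            self.held.add(chan_id)
            self.heap.retain(chan_id)

    def new_channel(self):
        chan_id = self.heap.alloc_channel()
        self.acquire(chan_id)
        return chan_id

    def spawn(self, *chan_ids):
        # Channels passed to a spawn are now also held by the child.
        child = GreenThread(self.heap)
        for chan_id in chan_ids:
            child.acquire(chan_id)
        return child

    def exit(self):
        # Thread death is the only point at which counts go down.
        for chan_id in self.held:
            self.heap.release(chan_id)
        self.held.clear()
```

In this model the count changes only in `acquire` and `exit`, which mirrors the claim that ordinary use of a channel within a thread never touches the ref-count.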

An open question is how to handle valid data still sitting in the channel (data that has not yet been read by the target thread, and therefore has not been registered with the target thread's GC). This case may require a deep-free compiler, which generates a routine that descends a type and recursively frees the object from the bottom up. Once the data is in the channel, the source thread has disowned it, and if the target thread never read it, nothing has access to the data but the channel itself.
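A deep-free routine of the kind described above might behave like the following Python sketch. `Obj`, `Channel`, `deep_free`, and `free_channel` are hypothetical names, appending to the `freed` list stands in for an actual `free()`, and the sketch assumes the queued data is an acyclic tree owned solely by the channel.

```python
from collections import deque


class Obj:
    """Hypothetical heap object with zero or more child objects."""

    def __init__(self, name, *children):
        self.name = name
        self.children = list(children)


class Channel:
    """Hypothetical channel holding values not yet read by the target."""

    def __init__(self):
        self.pending = deque()


def deep_free(obj, freed):
    # Free the children first, then the object itself (bottom up).
    for child in obj.children:
        deep_free(child, freed)
    freed.append(obj.name)  # stand-in for free()


def free_channel(chan):
    # Drain the undelivered values, deep-freeing each one, and only
    # then release the channel allocation itself.
    freed = []
    while chan.pending:
        deep_free(chan.pending.popleft(), freed)
    freed.append("channel")
    return freed
```

The bottom-up order matters because a parent's memory is what holds the pointers to its children; freeing the parent first would lose them.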

CollinReeser commented 8 years ago

New idea:

Suppose we introduce a refcount in the second eight bytes of an allocation. A bit in those same eight bytes, probably next to the mark bit, indicates whether the object is refcounted. The refcount is incremented only when the object passes through a channel to another thread or is passed as an argument to a spawn(), and it is decremented as part of the sweep phase of the mark-sweep. When the count hits zero the memory is freed; when it does not, the sweep simply skips the object. This gives us, efficiently and with little effort, guaranteed eventual cleanup of channels that go out of scope in all threads.

This principle can be extended to clean up extern struct objects: just before freeing the memory upon hitting zero, we call a destructor function stored in the third eight bytes of the extern struct allocation, which was set on creation. The destructor can be in charge of, for example, closing files or network sockets. These destructors should be safe to call more than once, in case the programmer does want to manually close things at a deterministic point in their code.
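The sweep rule described above can be sketched in Python as follows. The bit positions (`MARK_BIT`, `REFCOUNTED_BIT`, `COUNT_SHIFT`) and all helper names are assumptions made for illustration; the issue does not fix a layout. The sketch models one thread's sweep only and ignores synchronization between threads.

```python
MARK_BIT = 1 << 0        # assumed position of the GC mark bit
REFCOUNTED_BIT = 1 << 1  # assumed flag sitting next to the mark bit
COUNT_SHIFT = 2          # assumed: the remaining bits hold the refcount


def make_header(refcounted, count=0):
    return (count << COUNT_SHIFT) | (REFCOUNTED_BIT if refcounted else 0)


def refcount(header):
    return header >> COUNT_SHIFT


def decref(header):
    return header - (1 << COUNT_SHIFT)


class Obj:
    """Hypothetical allocation: a header word plus an optional destructor."""

    def __init__(self, name, refcounted=False, count=0, destructor=None):
        self.name = name
        self.header = make_header(refcounted, count)
        self.destructor = destructor


def sweep(objects, freed):
    """Sweep one thread's objects; return the ones it still tracks."""
    survivors = []
    for obj in objects:
        if obj.header & MARK_BIT:
            # Still reachable: clear the mark for the next cycle.
            obj.header &= ~MARK_BIT
            survivors.append(obj)
        elif obj.header & REFCOUNTED_BIT:
            # Unreachable from this thread: drop this thread's count.
            obj.header = decref(obj.header)
            if refcount(obj.header) == 0:
                if obj.destructor is not None:
                    obj.destructor(obj)  # e.g. close a file or socket
                freed.append(obj.name)   # stand-in for free()
            # Otherwise another thread still holds it: skip it.
        else:
            freed.append(obj.name)       # ordinary garbage
    return survivors
```

Keeping the flag and the count in one word means the sweep can decide what to do with an object from a single load, which is presumably the appeal of placing them next to the mark bit.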

Perhaps something like:

extern struct File(fclose);

or, perhaps more generally:

struct MyStruct {
    a: int;
    b: string;
    destructor: myDestructorFunc
}

wherein destructor is a special keyword that informs the code generator and GC runtime that this struct has a destructor that must be called before it is freed. Nothing other than channels and extern structs needs to be refcounted, so until a compelling reason arises, there will be no provided way to force a refcounted struct allocation.
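Because a destructor may run once when the programmer closes the resource manually and again when the GC frees the object, it needs to be idempotent. A minimal Python sketch of that property, using the hypothetical names `FileHandle` and `close_file`:

```python
class FileHandle:
    """Hypothetical extern-struct-like wrapper around an OS resource."""

    def __init__(self, name):
        self.name = name
        self.is_open = True
        self.close_calls = 0  # how many times the resource was really closed


def close_file(handle):
    # Idempotent destructor: the second and later calls do nothing.
    if not handle.is_open:
        return
    handle.is_open = False
    handle.close_calls += 1  # stand-in for the real close operation
```

Note that C's fclose does not have this property on its own: calling it twice on the same stream is undefined behavior. A form like extern struct File(fclose) would therefore likely need a guarding wrapper along these lines rather than the raw function.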