
[GC, Hash-Type, Set-Type] Implement deep-algorithm "sub-compilers" to generate recursive code for each type #71

Open CollinReeser opened 9 years ago

CollinReeser commented 9 years ago

At present it appears we need deep-copy, deep-mark, deep-pointer-grab, deep-compare, deep-compare-ordering, and deep-hashing algorithms. Each of these algorithms is structured very similarly, but performs a different job.

Deep-mark is the most directly relevant to GCing, as it is the algorithm that descends an object of a particular type, sets that object's own mark bit, and then recursively descends into any and all members of that object which have not yet had their mark bit set. This sets the mark bit of every object in an arbitrary data structure, so that the sweep stage of GC can then free all objects without their mark bit set.
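As a minimal sketch, this is roughly what a generated deep-mark function could look like in C for a hypothetical `Node` type. The header layout, macro names, and struct fields here are assumptions for illustration, not the actual runtime definitions:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical 16-byte object header, assuming the layout described above:
   bit 0 is the mark bit, the rest of the first 8 bytes holds the type id. */
typedef struct ObjHeader {
    uint64_t mark_and_type;
    uint64_t reserved;
} ObjHeader;

#define IS_MARKED(h)  ((h)->mark_and_type & 1u)
#define SET_MARK(h)   ((h)->mark_and_type |= 1u)

/* Example user types: a linked node holding a string. */
typedef struct String String;
typedef struct Node Node;
struct String { ObjHeader hdr; /* ... */ };
struct Node   { ObjHeader hdr; Node *next; String *name; };

/* Generated deep-mark for Node: mark the object itself, then recursively
   descend into any members whose mark bit is not yet set. */
void deep_mark_Node(Node *obj) {
    if (obj == NULL || IS_MARKED(&obj->hdr)) return;
    SET_MARK(&obj->hdr);
    deep_mark_Node(obj->next);           /* member of the same type: recurse */
    if (obj->name && !IS_MARKED(&obj->name->hdr))
        SET_MARK(&obj->name->hdr);       /* strings have no members to descend */
}
```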

Deep-pointer-grab descends an arbitrary data structure that is believed not to belong to any thread's GC yet, and updates the GC object to tell it that it now owns every object in the data structure.
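The traversal has the same shape as deep-mark; only the action per object differs. A sketch, reusing the `Node` type and mark macros from the deep-mark sketch above; the `GC` handle and `gc_take_ownership` hook are hypothetical placeholders, not the actual Mellow runtime API, and using the mark bit as a "visited" flag here is an assumption for illustration:

```c
typedef struct GC GC;
void gc_take_ownership(GC *gc, void *obj);

/* Generated deep-pointer-grab for Node: walk the data structure and tell the
   receiving thread's GC that it now owns each object it reaches. */
void deep_pointer_grab_Node(GC *gc, Node *obj) {
    if (obj == NULL || IS_MARKED(&obj->hdr)) return;
    SET_MARK(&obj->hdr);
    gc_take_ownership(gc, obj);
    deep_pointer_grab_Node(gc, obj->next);
    if (obj->name && !IS_MARKED(&obj->name->hdr)) {
        SET_MARK(&obj->name->hdr);
        gc_take_ownership(gc, obj->name);
    }
}
```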

Deep-copy is likely the trickiest. The algorithm needs to descend to the bottom of the object first so that the copy can be made bottom-up, and it needs to keep track of cycles in the data structure so that an object referenced in more than one place is handled properly in the production of the copy. This algorithm is necessary for the future .dup property, which forces production of a deep copy of an object in mellow code, and for deep-copying any object that passes through a channel to another thread, so that no thread has reference access to data that any other thread has access to, excepting the channels themselves.
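One way to handle the cycle-tracking is a pointer-to-copy map: register the shell of each copy before recursing into its members, so that cycles terminate and shared objects stay shared in the copy. A sketch under that assumption, with `CopyMap`, `map_get`/`map_put`, `alloc_Node`, and `copy_String` as hypothetical placeholders rather than real runtime APIs:

```c
#include <stddef.h>

/* Node and String as in the deep-mark sketch above (headers omitted here). */
typedef struct String String;
typedef struct Node Node;
struct Node { Node *next; String *name; };

typedef struct CopyMap CopyMap;       /* maps original pointer -> its copy */
void   *map_get(CopyMap *m, void *orig);
void    map_put(CopyMap *m, void *orig, void *copy);
Node   *alloc_Node(void);             /* placeholder allocator */
String *copy_String(String *s);       /* placeholder string copy routine */

/* Generated deep-copy for Node. */
Node *deep_copy_Node(CopyMap *seen, Node *obj) {
    if (obj == NULL) return NULL;
    Node *existing = map_get(seen, obj);
    if (existing != NULL) return existing;   /* already copied: reuse it */

    Node *copy = alloc_Node();
    map_put(seen, obj, copy);                /* register before descending */
    copy->next = deep_copy_Node(seen, obj->next);
    copy->name = copy_String(obj->name);
    return copy;
}
```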

Deep-compare is necessary for set and hash operations. Only strings and basic data types will be compared directly, though all objects should be reducible to those elements given enough recursion into the object type.

Deep-hashing is necessary for hashing operations. As with deep-compare, the hashing operation is ultimately reduced to interacting with strings and the basic data types.

To produce these algorithms, each unique type, down to each instantiation of a templated type, must have an associated, compiled function that performs each of those tasks. With the GC runtime, all objects will have a 16-byte object header, wherein the first bit is the mark bit, several more bits are reserved, and the remainder of the first 8 bytes of the header contains the type's integral identifier, for use at runtime, including by these algorithms. In particular, x86 jump tables will be produced, so that for any type it takes only two instructions to lea the address of the relevant type's algorithm and jmp to it.
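In C terms, the dispatch might look roughly like the following; the exact bit split in the header, the table name, and the `MarkFn` signature are assumptions, and the generated x86 would realize the table lookup as a lea plus an indirect jmp rather than a C call:

```c
#include <stdint.h>

/* Assumed 16-byte header: bit 0 is the mark bit, a few reserved bits follow,
   and the rest of the first 8 bytes holds the integral type id. */
typedef struct ObjHeader {
    uint64_t mark_and_type;
    uint64_t reserved;
} ObjHeader;

#define TYPE_ID(h) ((h)->mark_and_type >> 8)  /* exact bit split is an assumption */

/* One function-pointer table per algorithm, indexed by type id and filled in
   with the generated per-type functions. */
typedef void (*MarkFn)(void *obj);
extern MarkFn deep_mark_table[];

/* Dispatch: load the function address for this object's type and jump to it. */
static inline void deep_mark_dispatch(void *obj) {
    ObjHeader *h = (ObjHeader *)obj;
    deep_mark_table[TYPE_ID(h)](obj);
}
```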

Essentially, "sub-compilers" will need to be implemented that can generate each of these algorithms for every type discovered within a mellow program. Each type will need to be collected, and the list of unique types passed along to the sub-compilers to produce the relevant function for each algorithm for each type.

Some types will need to have a "canned" type identifier: strings, channels (a deep copy of a channel is simply a copy of the channel itself, explicitly set to empty), and one or more Maybe template instantiations depending on usage within the standard library. That way, when objects of those types are constructed directly in C within the runtime or the stdlib, their proper integral type identifier can be set.
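For illustration, the canned identifiers could be pinned in a header shared between the compiler and the C runtime; these names and values are hypothetical:

```c
/* Hypothetical reserved ("canned") type identifiers; names and values are
   assumptions, not Mellow's actual ids. */
enum CannedTypeId {
    TYPE_ID_STRING     = 1,
    TYPE_ID_CHANNEL    = 2,
    TYPE_ID_MAYBE_INT  = 3,   /* an example Maybe instantiation used by the stdlib */
    TYPE_ID_FIRST_USER = 64,  /* compiler-assigned ids start here */
};
```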

CollinReeser commented 8 years ago

The hashing and comparison functions shall no longer be deep algorithms, but shall stay at the first level of the hierarchy. For an arbitrary object, such as a struct, the hash for that object will be the product of the sub-hashes of its top-level elements, where each top-level element is hashed as follows: primitive data types are hashed directly; strings are hashed using the string hashing function; any and all reference types are hashed as though their representative pointers were integral values. Therefore, no recursive descent takes place.
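A minimal sketch of such a first-level hash for a hypothetical struct; the struct layout, the FNV-1a stand-in for the string hashing function, and the `| 1` guard against zero sub-hashes collapsing the product are assumptions, not the actual Mellow scheme:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in string hash (FNV-1a). */
static uint64_t hash_string(const char *s, size_t len) {
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < len; i++) { h ^= (uint8_t)s[i]; h *= 1099511628211ull; }
    return h;
}

/* Hypothetical struct: { int count; const char *name; Node *link; }.
   First-level hash: multiply the sub-hashes of the top-level members; the
   reference member is hashed by its pointer value, with no descent. */
typedef struct Node Node;
typedef struct Example { int count; const char *name; Node *link; } Example;

static uint64_t hash_Example(const Example *e) {
    uint64_t h = 1;
    h *= (uint64_t)e->count | 1;                     /* primitive, hashed directly */
    h *= hash_string(e->name, strlen(e->name)) | 1;  /* string hash */
    h *= (uint64_t)(uintptr_t)e->link | 1;           /* reference: pointer as integer */
    return h;
}
```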

The comparison functions are similar: direct comparison of primitive data types and strings, and direct comparison of the integral values of the pointers of any reference types.
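And the matching first-level equality for the same hypothetical struct from the hashing sketch, again as a sketch rather than the actual generated code:

```c
#include <stdbool.h>
#include <string.h>

/* Example struct as defined in the hashing sketch above. */

/* First-level equality: primitives and strings compared directly, reference
   members compared by the integral values of their pointers. */
static bool equal_Example(const Example *a, const Example *b) {
    return a->count == b->count
        && strcmp(a->name, b->name) == 0
        && a->link == b->link;
}
```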