JarrettBillingsley / Croc

Croc is an extensible extension language in the vein of Lua, which also wishes it were a standalone language. Also it's fun.
http://www.croc-lang.org

Class system overhaul #61

Closed: JarrettBillingsley closed this issue 11 years ago

JarrettBillingsley commented 11 years ago

I've been thinking about the class system. It's worked okay, but..

To be honest, I don't think I was ever really sold on the whole differential inheritance thing in the first place. Sure, it makes instances smaller at allocation, but is that really a big concern? It imposes additional runtime overhead when reading unset fields, makes it harder to list what fields an instance actually has, and prevents any runtime correctness checking of field assignment (how do you know, when assigning a field, that it's a valid assignment and not a typo? you don't; you always create new fields when assigning fields for the first time). And all of the space benefits likely disappear since you'll probably set most/all of the fields to new values.

Furthermore, how often do you want to add a new field to an instance, or to a class that has already been instantiated? I don't really buy the traditional arguments from dynamic language fanboys. Monkeypatching has its uses, such as adding functionality to libraries whose source you don't control, but doing it at runtime with objects that already exist seems precarious. Croc isn't meant to be great for writing long-running fault-tolerant hot-patchable systems. There are other languages/models that do a much better job of it.

Next, there's the annoying issue of field privacy that I've kinda put off for a while but maybe I should just deal with it now. I want there to be public and private fields, dammit. Private fields also eliminate the need for "extra fields" that have hung around for a while, and "extra bytes" can be handled as just a private memblock field, so both of those can be dropped. And if there are no extra bytes or fields, we can drop allocators too!

So this is what I'm thinking:

struct FieldValue
{
    CrocValue value;
    CrocClass* proto; // the class that declared this field/method
    bool isPublic;    // true for public; false for private
}

alias Hash!(CrocString*, FieldValue) FieldHash;

struct CrocClass
{
    CrocString* name;
    CrocClass* parent;
    bool frozen;             // whether or not this class has been frozen
    FieldHash methods;       // before freezing, C/R/W/D; after, read-only
    FieldHash fields;        // before freezing, C/R/W/D; after, R/W
    CrocFunction* finalizer; // before freezing, R/W; after, read-only
}

struct CrocInstance
{
    CrocClass* parent;
    FieldHash fields; // data is allocated directly after this instance
}

Have a new "object" stdlib for OO-related stuff. Some of it moves out of the baselib.

object
    fieldsOf(class|instance)           // iterator, like allFieldsOf is now
    methodsOf(class|instance)          // similar to above
    rawGetField(instance, string)      // same
    rawSetField(instance, string, any) // same
    Finalizable(class)                 // same
    newMethod(class, string, function) // adds new method to a class
    newField(class, string, any)       // adds new field to a class
    remove(class, string)              // removes method/field from a class
    freeze(class)                      // freezes a class
    isFrozen(class)                    // tells whether a class is frozen
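
The freezing rules in the struct comments above can be modelled roughly as follows. This is a Python sketch, not Croc or the actual implementation: `ClassModel`, `FrozenClassError`, and `set_field` are hypothetical names, and the sketch assumes that `set_field` only ever writes to a field that already exists.

```python
class FrozenClassError(Exception):
    """Raised when a frozen class is modified in a way freezing forbids."""


class ClassModel:
    """Toy stand-in for CrocClass: separate method and field tables plus a frozen flag."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.frozen = False
        self.methods = {}  # create/read/write/delete until frozen, then read-only
        self.fields = {}   # create/read/write/delete until frozen, then read/write


def new_method(cls, name, func):
    if cls.frozen:
        raise FrozenClassError(f"cannot add method {name!r} to frozen class {cls.name}")
    cls.methods[name] = func


def new_field(cls, name, value):
    if cls.frozen:
        raise FrozenClassError(f"cannot add field {name!r} to frozen class {cls.name}")
    cls.fields[name] = value


def set_field(cls, name, value):
    # Writing an existing field stays legal after freezing (the "R/W" case).
    if name not in cls.fields:
        raise KeyError(name)
    cls.fields[name] = value


def remove(cls, name):
    if cls.frozen:
        raise FrozenClassError(f"cannot remove {name!r} from frozen class {cls.name}")
    if name in cls.methods:
        del cls.methods[name]
    elif name in cls.fields:
        del cls.fields[name]
    else:
        raise KeyError(name)


def freeze(cls):
    cls.frozen = True


def is_frozen(cls):
    return cls.frozen
```

In this model, a typo in a field name on a frozen class shows up as an error instead of silently creating a new field.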

ligustah commented 11 years ago

Sounds like a major change, but a good one, I think.

May I ask why you do this?

Any identifiers that appear inside a class in field access position that are prepended with an underscore are automatically prepended with the class's name. That is, something like _x becomes Classname_x. Also happens to any fields/methods with names like that. They will be set as private.

And why do you separate fields from methods? I mean methods are just callable fields after all.

JarrettBillingsley commented 11 years ago

As for the private field naming: this is to avoid name conflicts in derived classes. For instance, suppose you had:

class Base
{
    _x = 5
    function doSomething() { /* expects _x to be an int */ }
}

class Derived : Base
{
    _x = "hello"
    function doSomethingElse() { /* expects _x to be a string */ }
}

These two classes might not know anything about one another, and they both have private fields named _x. If you simply had Derived overwrite _x with its own value, suddenly you break the doSomething method that it inherited. If you throw an error upon declaring Derived, saying that the field _x already exists, then you have to rename Derived._x so that the name no longer conflicts (and this can get annoying if you add a private field to Base and several classes which derive from it now all suddenly have name conflicts).

By prepending private fields with the name of the class they were defined in -- and also doing so when accessing fields inside classes -- all this stuff is avoided. It can break if you have a derived class with the exact same name as its base, but why the hell would you do that?
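
A minimal Python sketch of that renaming rule, assuming the mangled name is simply the class name followed by the underscore-prefixed identifier (so `_x` in `Base` becomes `Base_x`). The helper name `mangle` is hypothetical and this is not the compiler's actual code.

```python
def mangle(class_name, identifier):
    """Prefix underscore-led identifiers with the name of the declaring class."""
    if identifier.startswith("_"):
        return class_name + identifier
    return identifier  # public names are left untouched


# Each class's private _x ends up under a different key in the instance,
# so Derived's string does not clobber Base's int.
instance_fields = {
    mangle("Base", "_x"): 5,
    mangle("Derived", "_x"): "hello",
}
```

Both values coexist under distinct keys. The scheme only fails in the case mentioned above, where a derived class has the same name as its base and the two mangled names collide.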

As for separating fields from methods: now that instances copy their fields from their class, putting all the methods into every single class instance becomes a huge waste of memory. Suppose you have a class with 10 fields and 20 methods. If you just copy the non-method members, each instance will have a hash of 16 slots embedded in it; if you copy both, each instance will have a hash of 32 slots. That's double the memory consumption, and it's pointless waste when you consider that probably 99.99% of the time, you never change the methods on a per-instance basis. You might as well just look them up in the class instead.
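
The 16 and 32 figures are consistent with a hash table whose capacity is rounded up to the next power of two. A quick Python check under that assumption (the real table may also apply a load factor, which this sketch ignores; `hash_slots` is a hypothetical helper):

```python
def hash_slots(member_count):
    """Smallest power of two that can hold member_count entries."""
    slots = 1
    while slots < member_count:
        slots *= 2
    return slots


fields_only = hash_slots(10)              # 10 fields
fields_and_methods = hash_slots(10 + 20)  # 10 fields plus 20 methods
```

Under this assumption `fields_only` is 16 and `fields_and_methods` is 32, which matches the doubling described above.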

ligustah commented 11 years ago

Ah, I see. Thanks.

JarrettBillingsley commented 11 years ago

writeln$ (\-> :NativeStream_stream)(with console.stdin.getStream())

welp

ligustah commented 11 years ago

Doesn't look quite cool oO

JarrettBillingsley commented 11 years ago

Nope. It means that privacy means nothing. Either I remove 'with' entirely (which sucks because it has lots of legitimate uses) or I have to figure out another way of doing field privacy.

ligustah commented 11 years ago

Or you leave it like that because it looks pretty hacky anyways, and you can do similar stuff in Java with reflection as well to get private fields. I thought private fields were supposed to be more of a hint to the programmer, like "you better not touch this, dude, unless you really really know what the hell you are doing".

JarrettBillingsley commented 11 years ago

Except you can use this to crash the host:

    local p = os.Process()
    p.execute("dmd")
    local s = (\-> :NativeStream_stream)(with console.stdin.getStream());
    (\{ :Process_process = s })(with p)
    p.wait()

Oh fun.

JarrettBillingsley commented 11 years ago

This goes directly counter to one of Croc's goals, which is that script code should never be able to crash the host without explicitly being granted the ability to do so.

ligustah commented 11 years ago

Thought the os lib is unsafe anyways?

JarrettBillingsley commented 11 years ago

The problem is in general; you can just screw around with internal implementations of any native-exposed class and wreak havoc. You could overwrite a memblock that held a pointer to C-allocated memory (like in pcre.Regex) and cause a segfault.

ligustah commented 11 years ago

I see, well then that's a problem indeed

ligustah commented 11 years ago

Can't you make private fields work more like local variables? Those are basically inaccessible by dynamic code as well, aren't they? Does the runtime need to know the names of private fields? Can't the compiler take care of that?

JarrettBillingsley commented 11 years ago

Well I've been thinking, and rather than trying to change how the field privacy mechanism works (which is.. daunting, to say the least), I could just patch the hole at the problem: with.

The with construct was introduced so you could do fun stuff like making a wrapper function around a class method, forwarding the call to the method without it even knowing something happened. Since there didn't use to be any field privacy, the 'proto' was only ever used for supercalls, so it didn't matter that f(with o) set the proto to o.super. But now, that's a gaping hole. The entire basis of the field privacy mechanism is that you can only access private fields from within methods that were defined in the same class as the field. with doesn't respect this at all, allowing you to change the proto of ANY function call to whatever you want.

The fix is to remove with and replace it with a library function that restricts you to changing the context of functions in a legal way: you can only change the context of methods, and the context can only be changed to a subclass of the method's owner (proto). This still lets you do neat tricks with method wrapping, but closes the privacy hole.
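
One way such a library function could behave, sketched in Python. Everything here is hypothetical (`Method`, `Instance`, `call_with_context`), and the sketch reads "a subclass of the method's owner" as "an instance whose class is the owner or derives from it".

```python
class ClassModel:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class Instance:
    def __init__(self, cls):
        self.cls = cls


class Method:
    """A function together with the class that declared it (its proto)."""

    def __init__(self, owner, func):
        self.owner = owner
        self.func = func


def derives_from(cls, ancestor):
    """True if cls is ancestor or inherits from it through the parent chain."""
    while cls is not None:
        if cls is ancestor:
            return True
        cls = cls.parent
    return False


def call_with_context(method, context, *args):
    # Only real methods may have their context changed, never plain functions.
    if not isinstance(method, Method):
        raise TypeError("only methods can have their context changed")
    # The new context must belong to the method's owner or a class derived from it.
    if not derives_from(context.cls, method.owner):
        raise TypeError("context is not an instance of the method's owner")
    return method.func(context, *args)
```

The method's owner never changes in this model, so a wrapper can still forward calls while an arbitrary function cannot be made to pose as a method of another class.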

JarrettBillingsley commented 11 years ago

Or maybe with can stay but its proto-setting behavior would be stripped. It does have some useful behaviors here and there that don't have anything to do with methods (like iterator functions).

ligustah commented 11 years ago

I might be getting all this wrong, but couldn't I just subclass some native class and screw around with its private fields then?

JarrettBillingsley commented 11 years ago

Nope. You can only access private fields from within methods that were defined in the same class that the private fields were. When you do class Derived : Base { function foo(){} }, foo belongs to Derived, not Base, so it can't access any of Base's private fields (even if it knows the names).
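
That rule can be modelled in a few lines of Python. This is only a sketch: `FieldValue` mirrors the struct proposed earlier in the thread, but the class is stored by name for brevity, and `read_field` and `PrivateFieldError` are hypothetical.

```python
from dataclasses import dataclass


class PrivateFieldError(Exception):
    """Raised when code outside the declaring class touches a private field."""


@dataclass
class FieldValue:
    value: object
    proto: str       # name of the class that declared the field
    is_public: bool  # True for public, False for private


def read_field(fields, name, accessor_class):
    """Read a field on behalf of a method declared in accessor_class."""
    field = fields[name]
    if not field.is_public and field.proto != accessor_class:
        raise PrivateFieldError(f"{name} is private to {field.proto}")
    return field.value


# An instance of Derived still carries Base's private field...
fields = {"Base_x": FieldValue(5, "Base", False)}
# ...but only methods declared in Base are allowed to read it.
value_seen_by_base = read_field(fields, "Base_x", "Base")
```

Calling `read_field(fields, "Base_x", "Derived")` raises `PrivateFieldError` even though the caller knows the mangled name.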

ligustah commented 11 years ago

Then I guess modifying this sounds like a feasible solution. I'll just come back complaining if my code stops working due to that change ;D