Standardize on an object or at least polymorphism model

flux-framework / flux-core

core services for the Flux resource management framework

GNU Lesser General Public License v3.0

167 stars 49 forks source link

Standardize on an object or at least polymorphism model #363

Closed trws closed 2 years ago

trws commented 9 years ago

We have coding standards for a wide variety of cases in the RFCs, but do not have anything on how to define generic interfaces or handle polymorphism of objects. This affects things like the reactor rework, where the interface has become quite clean, but retains several *_start and *_destroy functions for lack of a good way to make a single central version of each. The KVS interface suffers from a similar issue, where put and get are replicated for each potentially useful input/output type. What has made me look at it closer lately is the fact that we have quite a few different implementations of an iterator concept throughout the codebase, now a couple more implemented by me, but they all provide different incompatible interfaces. This issue is in hopes of choosing a common mechanism for implementing polymorphic interfaces in the flux codebase (at least if anyone else wants one).

Options:

1) The probably unpopular and definitely unlikely option: Just to get this out of the way, the most straightforward, implementation-fast and user-fast option would be to use a C-interface compatible programming language that has these concepts designed in. I actually wouldn't be against this, for the record. The contenders for this would be, in my mind and roughly in order of how sane they would be for this project:

C++ (much easier to write faster, safe code than C, also much easier to write truly eye-bleedingly awful code)
Go (really easy, really nice, calling C is expensive)
Rust (no runtime, no-overhead, C-compatible ABI, Go-like interfaces as "Traits")
Objective-C (proper superset of C99, dynamic binding, slower than C++ and well-written C)

Now that that's out of the way...

2) Pull in a well-established object system designed for C that has an interface concept and use it. This has the advantage of "standing on the shoulders of giants," as it were, while still being all C. Usually the downside is that the syntax ends up relying heavily on macros, another preprocessor, or a list of conventions so long as to be stifling, but they are a practical option with some good lineage.

GObject: Probably the most battle-tested object oriented C library around, with a very large and active support base and huge well optimized standard library (glib). The whole thing is C99 or lower, with dynamic binding and high-level polymorphism, to the point that anything written as a GObject class can be accessed without any further code from either perl or python. The downside is that the boilerplate is... well, it's significant. There are pre-processors and even languages (see vala) to reduce this, but to put it in perspective, the empty class header example on their website is 119 lines long.
ooc: Less boiler-plate than GObject and still all C, supports inheritance and interfaces with 100% static dispatch except for interface lookup (that last is a bit like go, but less flexible), but requires strict adherence to convention for interfaces to be handled correctly.
Dynace: Object-oriented extension to C, but to be clean requires use of its pre-processor.
cos paper: The C Object System, inspired by CLOS and Objective-C. Much lighter weight than either of the others (~7000 SLoC), handles dynamic method dispatch and forwarding, multimethods and all kinds of goodies with relatively clean interface and object definitions, but is heavily reliant on the C preprocessor. I rather like this one actually, no new deps but a very powerful object model.

3) Write something ourselves: Get exactly what we want, at the cost of having to write and optimize it ourselves. Going this route could let us define just the interface features we want without pulling in object orientation baggage, but it's a non-trivial amount of work.

4) Rely on conventions. If we have strong enough conventions, they'll get the job done for interfaces in the codebase. The downside is that conventions != polymorphism, so unless a standard convention for method lookup or structure-of-function-pointers is set up it won't naturally support module import semantics (#176) as we've discussed.

5) Ignore the whole thing and pretend it never happened.

External references:

Object oriented programming in ANSI C book pdf: this is a great reference, but amusingly resorts to writing an awk-based pre-processor to make the process of writing object oriented C easier. In every way it uses classic ANSI-C by the time you reach the compiler, but the author makes a good point that enforcing the various invariants just by relying on convention is dangerous.

trws commented 9 years ago

In reflecting on this, I feel like I should qualify what I'm talking about here. I'm not suggesting that we expose any of what's in here outside of flux necessarily, though in the case of GObject for example we may want to for the incidental features it would provide, but rather use it to implement interfaces or inheritance as we find useful for ourselves, and still present a standard C interface to the outside. At the least, an interface implementation concept is what I'm after, even if that maps down to just defining a struct with type data that contains structs of function pointers for each interface, so we can treat our internal types more generally than is currently practical.

garlick commented 9 years ago

Thanks for the excellent overview, as usual @trws.

My initial reaction is that we should stick to straight C, observing principle of least surprise for our would be users and collaborators, internally and externally. That likely precludes most of object system for C options IMHO, though I have not reviewed these yet, and am open to being convinced.

By the way, I had forgotten about zmq rfc 21, which we could potentially fork and modify for our project.

garlick commented 9 years ago

In #362 we're asking for another watcher type. Would it be reasonable for me to do the obvious "vtable" style approach to create a singleflux_watcher_t type in the same PR, and then get some critique in that context that also takes into account the general issue?

trws commented 9 years ago

That sounds sensible.

I've been trying to think of a way to define interfaces that way that would allow us to subset or assign those interfaces, at least statically, to interfaces that require a subset of the full interface offered. I have some ideas on how to do that, but maybe it isn't worth it to pursue something generic. If I'm the only one feeling the pinch on this, then I should probably just drop it.

For the sake of putting it out there, I was playing with just having a way to construct an assigner from one vtable to another that does a completely static assignment of fields. It's not completely generic, since it only allows something akin to downcasting, but it would still make a variety of go-like interfaces possible, and it's type-checked at compile time. Something like this:

  #define FLUX_CAST_STRINGER(SRC) FLUX_CAST (SRC, struct flux_stringer, to_string)

  // Optional typedefs
  typedef char *(flux_stringer_to_string_f)(void *);

  struct flux_stringer
  {
      void *value;
      //should be a vtable struct, inlined to keep this short-ish
      flux_stringer_to_string_f *to_string;
  };

Then use it like this:

  struct str_list_node  {
      struct str_list_node *next;
      char *value;
  };

  struct str_list  {
      struct str_list_node *head;
      struct str_list_node *cursor;
  };
  char *str_list_first (struct str_list *l);
  char *str_list_next (struct str_list *l);
  bool str_list_done (struct str_list *l);
  void str_list_push (struct str_list *l, const char *string);
  char * str_list_to_string (struct str_list *l)  {
      struct str_list_node *walker = l->head;
      char *ret = strdup ("["), *tmp = NULL;
      while (walker) {
          fprintf(stderr,"%s", walker->value);
          tmp = ret;
          if (walker->next)
            asprintf (&ret, "%s%s, ", tmp, walker->value);
          else
              asprintf (&ret, "%s%s]", tmp, walker->value);
          free (tmp);
          walker = walker->next;
      }
      return ret;
  }

  typedef struct str_list_interface  {
      void *value;
      flux_iterator_first_f *first;
      flux_iterator_next_f *next;
      flux_iterator_done_f *done;
      flux_stringer_to_string_f *to_string;
  } str_list_i = {
      .first = (flux_iterator_first_f*)str_list_first,
      .next = (flux_iterator_next_f*)str_list_next,
      .done = (flux_iterator_done_f*)str_list_done,
      .to_string = (flux_stringer_to_string_f*)str_list_to_string
  };

  void print_interface (struct flux_stringer s)  {
      char * string = s.to_string(s.value);
      printf("stringed: %s\n", string);
      free(string);
  }

int main (int argc, char *argv[])
  {
      struct str_list l;
      str_list_push(&l, "stuff");
      str_list_push(&l, "other stuff");
      str_list_push(&l, "15");

      struct flux_stringer s = FLUX_CAST_STRINGER(str_list_i);
      s.value = &l;

      print_interface (s);

      return 0;
  }

Of course there are about a million ways this can be re-arranged, but it is the lowest overhead, most C-ish generic interface style I seem to be able to come up with. That said, I am very much open to suggestions on this one, I keep feeling like I'm missing something obvious. =/

trws commented 9 years ago

So, I think I'm coming around to the idea that we don't need anything as complex as what i was thinking about before. Just a standard way to handle pseudo-inheritence where we need it that is reasonably type-safe. Something like this scheme perhaps? Though without the macros, and probably turning the base version into a standard union rather than a struct containing a union named base, no clue why the author did it that way.

My desire for the rest of it I think actually comes from something else, that being the the general lack of efficient implementations of common containers and interfaces to draw on for internal implementation. We have a number of nice libraries in util, along with lsd and czmq, but they end up being a relatively restricted set of interfaces for some tasks. The zhash and zlist being notable examples. These are nice, convenient types, but they're not particularly efficient or flexible, and using them with non-string values is clumsy at best. The newer zhashx and zlistx containers are more flexible, but we don't depend on a sufficiently recent version of czmq to use them, and even those can't be used as efficiently as a hash table that offers a separation of hash generation and lookup unless we put a wrapper around them.

Perhaps it's just that I'm so used to being able to just pull in efficient non-heap-dependent libraries in C++ that it feels odd not to be able to do it here, so feel free to say if I'm just being needlessly whiny about this or something. Anyway, I'm not sure what the solution to that is, or if there is a general one. Maybe it's just to pull in little micro-libs for what we decide is sufficiently useful, or maybe we could bite the bullet and grab tmalloc and some containers, or bite the bullet and grab glib or something to cover all the bases in one go.

SteVwonder commented 9 years ago

Minor note that relates back to #361, but the czmq library that is "actually" required is new enough to use zhashx and zlistx.

trws commented 9 years ago

Are you sure? I thought I had checked 3 and found them missing, with them added in some peculiar sub release like 3.0.2 or some such.

Sent with Good (www.good.com)

From: Stephen Herbein Sent: Saturday, August 29, 2015 11:23:23 AM To: flux-framework/flux-core Cc: Scogland, Thomas Richard William Subject: Re: [flux-core] Standardize on an object or at least polymorphism model (#363)

Minor note that relates back to #361https://github.com/flux-framework/flux-core/issues/361, but the czmq library that is "actually" required is new enough to use zhashx and zlistx.

— Reply to this email directly or view it on GitHubhttps://github.com/flux-framework/flux-core/issues/363#issuecomment-136021660.

SteVwonder commented 9 years ago

Oh, nice call. They include both zlistx and zhashx starting with czmq 3.0.1. Since travis is already setup to use 3.0.2, i don't think it would affect us that much to require 3.0.1+. But I don't want to bog down this thread with that discussion. Sorry for the distraction.

trws commented 8 years ago

I realize we haven't talked about this in a while, but while at the JOWOG this past week I had rather more time for abstract thought (not staring at a computer) than I have in a while, and for some reason this was on my mind. If I set something up for this, that looks on the outside like (and is) regular function calls, but allows for dynamic polymorphism inside the interface, would that be of interest? We talked about possibly using something like this for module interfaces to do multiply-loaded modules at one point for example. The concept I was playing with was declaring a generic dispatcher and registering methods onto it by type (somewhat like the way CLOS works). A nice side-effect of this is that calls use regular C functions, then dispatch internally based on type or address or whatever we decide to use. It has the effect of making the calls easily bindable to lua, fortran or any C interface while allowing for fun things like inheritence, functional inheritence (mixins), wrappers, forwarding, and any number of other things depending on what we would want. In fact, having an explicit runtime cast a-la go to verify the availability of all necessary methods and populate them into a struct of appropriately named function pointers would also be an option with any of these, but make the implementation slightly more verbose. All of these conform to c99 pedantic with no warnings. The densest one does use a load-time initialization attribute that is technically non-portable, but it (or an equivalent) is implemented in every major compiler (including msvc astonishingly).

If there would be interest, I've been playing with some different options, @garlick, @grondo, would any of these be palatable to you? (I used this go example as a base, amusingly the dense one is the same length as the go version)

Basic, no macro sugar, all C dispatch:

// User visible interface: .h
double area (fob *o);
double perim (fob *o);
extern fu_type_t rect_c[];
extern fu_type_t circle_c[];

#include "include/fu.h"

double area (void* o);
fu_generic_t area_g[] = {{.name = "area"}};
double area (void* o)
{
    double (*fn) (void*) = fu_get_method (area_g, o);
    assert (fn);
    return fn (o);
}

double perim (void* o);
fu_generic_t perim_g[] = {{.name = "perim"}};
double perim (void* o)
{
    double (*fn) (void*) = fu_get_method (perim_g, o);
    assert (fn);
    return fn (o);
}

struct rect {
    double w, h;
};

static double rect_area (struct rect* r)
{
    return r->w * r->h;
}

static double rect_perim (struct rect* r)
{
    return r->w * 2 + 2 * r->h;
}

fu_type_t rect_c[] = {{
    .name = "rect",
    .size = sizeof (struct rect),
    .methods = {{area_g, rect_area}, {perim_g, rect_perim}, {NULL, NULL}},
    .parents = {NULL},
    .init_control = PTHREAD_MUTEX_INITIALIZER,
}};

struct circle {
    double r;
};

static double circle_area (struct circle* c)
{
    return c->r * c->r * 3.14159;
}

static double circle_perim (struct circle* c)
{
    return 3.14159 * c->r * 2;
}
fu_method_t circle_methods[] = {{GET_GENERIC (area), circle_area},
                                {GET_GENERIC (perim), circle_perim},
                                {NULL, NULL}};

fu_type_t circle_c[] = {{FU_TYPE_DEFAULTS.name = "fu_object",
                         .size = sizeof (struct circle),
                         .methods = circle_methods, .parents = NULL}};

void measure (void* g)
{
    printf ("a %lf\n", area (g));
    printf ("p %lf\n", perim (g));
}

int main (int argc, char* argv[])
{
    struct rect* r = fu_new (rect_c);
    r->w = 3;
    r->h = 4;
    struct circle* c = fu_new (circle_c);
    c->r = 5, printf ("yup\n");
    measure (r);
    measure (c);
    return 0;
}

Some macro helpers to reduce verbosity and human error, nothing crazy:

// User visible interface: .h
double area (fob *o);
double perim (fob *o);
DEC_CLASS(rect);
DEC_CLASS(circle);

//internal
#include <stdio.h>
#include <assert.h>

#include "include/fu.h"

double area (void* o);
DEC_GENERIC (double, area, void* o);
DEF_GENERIC (double, area, void* o)
{
    double (*fn) (void*) = fu_get_method (area_g, o);
    assert (fn);
    return fn (o);
}

double perim (void* o);
DEC_GENERIC (double, perim, void* o);
DEF_GENERIC_1 (double, perim, void* o);

struct rect {
    double w, h;
};

static double rect_area (struct rect* r)
{
    return r->w * r->h;
}

static double rect_perim (struct rect* r)
{
    return r->w * 2 + 2 * r->h;
}
fu_method_t rect_methods[] = {{area_g, rect_area},
                              {perim_g, rect_perim},
                              {NULL, NULL}};
DEF_CLASS (rect,
           struct rect,
           NULL,  // no parents
           rect_methods);

struct circle {
    double r;
};

static double circle_area (struct circle* c)
{
    return c->r * c->r * 3.14159;
}

static double circle_perim (struct circle* c)
{
    return 3.14159 * c->r * 2;
}
fu_method_t circle_methods[] = {{GET_GENERIC (area), circle_area},
                                {GET_GENERIC (perim), circle_perim},
                                {NULL, NULL}};

DEF_CLASS (circle,
           struct circle,
           NULL,  // no parents
           circle_methods);

void measure (void* g)
{
    printf ("a %lf\n", area (g));
    printf ("p %lf\n", perim (g));
}

int main (int argc, char* argv[])
{
    struct rect* r = fu_new (rect_c);
    r->w = 3;
    r->h = 4;
    struct circle* c = fu_new (circle_c);
    c->r = 5, printf ("yup\n");
    measure (r);
    measure (c);
    return 0;
}

Dense, macro-heavy, but only internal:

// User visible interface: .h
double area (fob *o);
double perim (fob *o);
DEC_CLASS(rect);
DEC_CLASS(circle);
// Implementer interface, inner.h
DEC_GENERIC(double, area, void* o);
DEC_GENERIC(double, perim, void* o);
// Implementation
#include <stdio.h>
#include "include/fu.h"

DEF_GENERIC_AUTO (double, area, void *);
DEF_GENERIC_AUTO (double, perim, void *);

DEF_STRUCT_START(rect)
    double w, h;
DEF_STRUCT_END(rect)
DEF_STRUCT_START(circle)
    double r;
DEF_STRUCT_END(circle)

DEF_METHOD (rect_c, area, double, rect_area, struct rect* r) {
    return r->w * r->h;
}
DEF_METHOD (rect_c, perim, double, rect_perim, struct rect* r) {
    return r->w * 2 + 2 * r->h;
}

DEF_METHOD (circle_c, area, double, circle_area, struct circle* c) {
    return c->r * c->r * 3.14159;
}
DEF_METHOD (circle_c, perim, double, circle_perim, struct circle* c) {
    return 3.14159 * c->r * 2;
}

void measure (void* g) {
    printf ("a %lf\n", area (g));
    printf ("p %lf\n", perim (g));
}

int main (int argc, char* argv[]) {
    struct rect* r = fu_new (rect_c);
    r->w = 3;
    r->h = 4;
    struct circle* c = fu_new (circle_c);
    c->r = 5;
    measure (r);
    measure (c);
    return 0;
}

trws commented 8 years ago

Oh the fourth option was to use a code-generation approach for the trampolines other than the macro system. ZMQ recommends GSL for this, and it works, but I thought that would generate even less appeal than the others seemed likely to somehow... Sorry if I'm annoying about this stuff by the way, trying to wrap my brain around making some of the dynamic stuff, like graph creation and manipulation for the specification stuff, is twisting my brain into all kinds of nasty shapes trying to figure out how to do it in C without a type-system convention.

trws commented 8 years ago

Alternative that uses an almost go-like interface building interface. This actually requires "registration" of an interface in each implementer, but could be tweaked relatively easily to do something similar to go's runtime duck typing, it would just require an array of names or selectors and offsets into the structure of function pointers representing the interface and a cast function rework. Perhaps this would be more appropriate?

struct geom {
    void* type;
    double (*area) (void* o);
    double (*perim) (void* o);
};

typedef struct {
    struct geom* vtable;
    void* data;
} Geom;

struct rect {
    double w, h;
};

static double rect_area (struct rect* r)
{
    return r->w * r->h;
}

static double rect_perim (struct rect* r)
{
    return r->w * 2 + 2 * r->h;
}

struct interface_pair rect_type[] = {
    {"__size", (void*)sizeof (struct rect)},
    {"__name", "rect"},
    {"Geom", (void*) ((struct geom[]){NULL, rect_area, rect_perim})},
    {NULL, NULL}};

struct circle {
    double r;
};

static double circle_area (struct circle* c)
{
    return c->r * c->r * 3.14159;
}

static double circle_perim (struct circle* c)
{
    return 3.14159 * c->r * 2;
}
struct interface_pair circle_type[] = {
    {"__size", (void*)sizeof (struct circle)},
    {"__name", "circle"},
    {"Geom", (void*) ((struct geom[]){NULL, circle_area, circle_perim})},
    {NULL, NULL}};

void measure (Geom g)
{
    printf ("a %lf\n", g.vtable->area (g.data));
    printf ("p %lf\n", g.vtable->perim (g.data));
}

int main (int argc, char* argv[])
{
    struct rect* r = new (rect_type);
    r->w = 3;
    r->h = 4;
    struct circle c = {.r = 5}; // Stack structs are fine, nicety of interfaces like this, just no constructor call
    measure (CAST (Geom, r, rect_type));
    measure (CAST (Geom, &c, circle_type));
    return 0;
}

Or with macro sugar: (oddly fond of this one, it's surprisingly clean, and still allows cross-module passing of a type with more implementations than the initial target cares about, also relatively short and completely portable)

INTERFACE (Geom) {
    void* type;
    double (*area) (void* o);
    double (*perim) (void* o);
};

struct rect {
    double w, h;
};

static double rect_area (struct rect* r) {
    return r->w * r->h;
}

static double rect_perim (struct rect* r) {
    return r->w * 2 + 2 * r->h;
}

DEFTYPE (Rect, struct rect, IMPLEMENT (Geom, .area = rect_area, .perim = rect_perim));

struct circle {
    double r;
};

static double circle_area (struct circle* c) {
    return c->r * c->r * 3.14159;
}

static double circle_perim (struct circle* c) {
    return 3.14159 * c->r * 2;
}

DEFTYPE (Circle, struct circle, IMPLEMENT (Geom, .area = circle_area, .perim = circle_perim));

void measure (Geom g) {
    printf ("a %lf\n", g.vtable->area (g.data));
    printf ("p %lf\n", g.vtable->perim (g.data));
}

int main (int argc, char* argv[]) {
    struct rect* r = new (Rect);
    r->w = 3;
    r->h = 4;
    struct circle* c = new (Circle);
    c->r = 5;
    measure (CAST (Geom, r, Rect));
    measure (CAST (Geom, c, Circle));
    return 0;
}

garlick commented 7 years ago

Struggling with whether there's still some discussion that needs to happen here or not. The thread fizzled almost a year ago, but related issues resurface from time to time.

I have a problem with C that makes heavy use of macros to try to be something not-C. because that requires contributors to adjust their mental model to something unexpected for a C project before they can really do anything.

I don't know - maybe this effort is best redirected towards developing a good binding for the core in a language that supports the desired abstractions, in combination with better multi-lingual module support (e.g. #930)?

garlick commented 7 years ago

One pragmatic driver for this (as noted in #812 for messages) is a common method to "annotate" objects. We have separately implemented somewhat divergent "aux" accesors for flux_t, flux_rpc_t, and flux_reactor_t. Adding yet another one for messages to satisfy #812 could be done, or maybe we could introduce a common annotation interface that takes a void * object argument and accesses a common struct at the beginning of opaque types following this pattern?

One could then trivially expand the common methods available on these objects without changing their other interfaces.

trws commented 7 years ago

How about I put in a basic API for embedding the information in an object's struct, managing memory for them, and invoking methods on them? I have quite a number of different object systems that I've written over the years just for fun, so if we pick what features we do and don't want it would be pretty quick to throw something together that we could expand on later.

The basic questions as far as I'm concerned are:

Do we want all objects to automatically get reference counting? This is pretty trivial, but very frustrating to add later if it's not thought through up-front. (object size overhead actually isn't any higher until the number of references goes over all reasonable limits, so that's not a worry, think one pointer's worth total per instance for all of this)
Do we want methods to be accessible by name, or selector or whatever, in addition to by casting the object to a type whose vtable has the method. As with reference counting, this is pretty easy, but affects the API in such a way that it has to be there up front.
Do we want a way to declare abstract interfaces independent of types? If we have the previous bullet, this isn't too bad, and means we're really looking more at interface-oriented (think go) rather than object-oriented APIs.

Either way, it would be some variant on:

A base object instance struct, should be the first member of every object
A base class/vtable struct, should be the first member of every vtable, will itself contain an instance struct probably
IF we want name-accessible methods, and/or interfaces with late binding, the class will also have a bindings-like name->function pointer mapping to register functions by name to allow dynamic lookups

garlick commented 7 years ago

We have refcounting in all of the objects I mentioned above, so I guess I'd say yes to generic refcounting. Probably no to the other two in order to keep it simple as possible. If we do this, it can't look too weird to a C programmer approaching the project cold IMHO.

trws commented 7 years ago

Ok, so, first prototype is up. There are no helper macros, except offsetof but I plead that it's in the standard library, in here. The example I've been using throughout this thread ends up looking like the below.

It's a simple model based on (usually) dynamic metaclasses to describe functionality while the structure of the class and each instance are just plain old C structs in the user's control. The end result is that it's relatively consistent, depending on how you use it you can get compile-time checking of function signatures, it can do runtime method lookup (this turned out to be necessary for correct superclass method invocation...), and provides runtime type-checking, magic value checking, reference counting and some extra tracking options that aren't all quite bolted on yet.

Before going any farther with this, I'd like some feedback on whether it's a good direction, or if it needs some tweaking somehow to hit the notes we need. The implementation of the runtime being invoked by the little example is here, along with two associated header files.

#include "fop.h"

#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct geom {
    fop_object_t _;
};

struct geomClass {
    fop_class_t _;
    double (*area) (struct geom *);
    double (*perim) (struct geom *);
};

struct rect {
    struct geom _;
    double w, h;
};
struct circle {
    struct geom _;
    double r;
};

// classes
fop_class_t *geom_class_c ()
{
    static fop_class_t *cls = NULL;
    if (!cls) {
        cls = fop_new (fop_class_c (), "geom_class_c", fop_class_c (),
                       sizeof (struct geomClass));
    }
    return cls;
}
double geom_area (void *self);
double geom_perim (void *self);
fop_class_t *geom_c ()
{
    static fop_class_t *cls = NULL;
    if (!cls) {
        cls = fop_new (geom_class_c (), "geom_c", fop_object_c (),
                       sizeof (struct geom));
        fop_register_method (cls, (fop_vv_f)geom_area, "geom_area",
                             offsetof (struct geomClass, area), NULL);
        fop_register_method (cls, (fop_vv_f)geom_perim, "geom_perim",
                             offsetof (struct geomClass, perim), NULL);
    }
    return cls;
}

// selectors
double geom_area (void *o)
{
    fop_object_t *self = fop_cast (geom_c (), o);
    const struct geomClass *c = (void *)fop_get_class (self);
    assert (c->area);
    return c->area (o);
}
double geom_perim (void *o)
{
    fop_object_t *self = fop_cast (geom_c (), o);
    const struct geomClass *c = (void *)fop_get_class (self);
    assert (c->perim);
    return c->perim (o);
}

fop_class_t *rect_c ();
void *rect_init (void *_self, va_list *app)
{
    (void)app;
    struct rect *self = fop_cast (rect_c (), _self);
    fprintf (stderr, "INITIALIZING RECT!\n");
    return self;
}
double rect_area (struct rect *r)
{
    return r->w * r->h;
}

double rect_perim (struct rect *r)
{
    return r->w * 2 + 2 * r->h;
}

double circle_area (struct circle *c)
{
    return c->r * c->r * 3.14159;
}

double circle_perim (struct circle *c)
{
    return 3.14159 * c->r * 2;
}
fop_class_t *rect_c ()
{
    static struct geomClass *cls = NULL;
    if (!cls) {
        cls = fop_new (geom_class_c (), "rect_c", geom_c (),
                       sizeof (struct rect));
        fop_implement_method (&cls->_, "initialize", (fop_vv_f)rect_init);
        fop_implement_method_by_sel (&cls->_, (fop_vv_f)geom_area, (fop_vv_f)rect_area);
        fop_implement_method_by_sel (&cls->_, (fop_vv_f)geom_perim, (fop_vv_f)rect_perim);
    }
    return &cls->_;
}
fop_class_t *circle_c ()
{
    static struct geomClass *cls = NULL;
    if (!cls) {
        cls = fop_new (geom_class_c (), "circle_c", geom_c (),
                       sizeof (struct circle));
        cls->perim = circle_perim;
        cls->area = circle_area;
    }
    return &cls->_;
}

void measure (void *g)
{
    printf ("a %lf\n", geom_area (g));
    printf ("p %lf\n", geom_perim (g));
}

int main (int argc, char *argv[])
{
    fprintf (stderr, "line 1\n");
    struct rect *r = fop_new (rect_c ());
    r->w = 3;
    r->h = 4;
    struct circle *c = fop_new (circle_c ());
    c->r = 5;
    measure (r);
    measure (c);
    fop_describe (r, stderr);
    fop_describe (c, stderr);
    fop_release (r);
    fop_release (c);
    return 0;
}

garlick commented 7 years ago

it can do runtime method lookup (this turned out to be necessary for correct superclass method invocation...)

Can you explain that one? The method lookup does add complexity and I guess a few instructions in the critical path.

trws commented 7 years ago

It's there, but it's only used for something where you want, or need, completely dynamic dispatch or to dynamically check for the existence of a method at runtime. In the normal case it works just like our objects, or C++ vtables for that matter, normally do and directly invokes a function pointer at a known offset in a table. As an example of what normally happens, the selector (or user-visible interface function) geom_area in my last comment could be inlined-out as something like this:

// What one would write
double geom_area (void *o)
{
    fop_object_t *self = fop_cast (geom_c (), o);
    if (!self) return NaN;
    const struct geomClass *c = (void *)fop_get_class (self);
    assert (c->area);
    return c->area (o);
}
// What happens
double geom_area (void *o)
{
    // geom_c()
    fop_class_t * geom_class = (Geom ? (init_geom_c(), Geom) : Geom);
    // fop_cast
    //{fop_cast_object
    fop_object_t *self = o ? (((fop_object_t*)o)->magic == magic ? o : NULL ) : NULL;
    //}fop_cast_object
    if (geom_class != Object) {
        fop_class_t *cur = self->fclass;
    while ( cur != geom_class ) {
        if (cur == Object) { // Object is the top, if we find it, it's
                 // not of this type or any subclass
        self = NULL;
        break;
        }
        cur = cur->super;
    }
    }
    if (!self) return NaN;
    // (void *)fop_get_class (self);
    const struct geomClass *c = (self->fclass);
    assert (c->area);
    return c->area (o);
}

Clearly that's a non-zero amount of code, but the actual execution path for a valid object is a check for null, comparison with the magic prefix, and pointer-value comparison for the class pointer in the instance with the requested class to ensure it's a valid subclass. The actual method invocation is a regular function pointer invocation.

The only time you'd pay the dynamic lookup cost for a method is if it's explicitly requested by writing a function taking an unknown abstract interface, think go's interface{}, or otherwise attempting to invoke a method you're not sure will actually exist. That then becomes a call to bsearch on a sorted array of structs in the type to determine if the method is available. If that ends up being something we want to do regularly, it would be pretty easy to do what Go does and make it cast-time, rather than invocation time, and pre-load most of that cost anyway.

Does that make sense?

[edit:] just occurred to me, it does add a bit of cost to initializing the class, but registering methods for dynamic lookup is optional, and even when it's done the cost is only paid once for the life of a given process, so I wouldn't count that against the critical path.

garlick commented 7 years ago

So just to recap a couple things @trws and I discussed face to face

he's going to refactor the code a bit so that the dynamic method lookup is clearly demarcated (possibly in separate file) since that stuff got me into the weeds this morning. I think the object implementation has to be easily grokked by C programmers, so even if the dynamic lookup is purely optional, IMHO it also needs to not be an impediment to understanding.
he clarified that we should be able to rework say flux_msg_t or flux_watcher_t internally as objects without changing exposed interfaces. void * object types are not required for example, and there can be a type-specific create function
we wouldn't expose the fop_ calls themselves in the public API, so for example an external project would not be able to inherit from a type published in the API. It might be possible to do that but since the internal size of a class is required in order to inherit from it, some work would need to be done. As is, these objects are as "ABI neutral" as our existing ones are.
he walked me through object creation and it would appear that what goes on is equivalent to what we do in places where we've attempted to generalize objects ad-hoc, such as flux_watcher_t. In fact, since there is one malloc for generic part and implementation in the proposed scheme, we could drop a malloc in the creation path if we converted flux_watcher_t. This is a concern for "high frequency" objects like flux_msg_t.

I plan to give this another look after @trws makes his refactoring pass, and possibly try converting a random type like flux_msg_t as a learning experience.

trws commented 7 years ago

Threw together a quick example of the dynamic lookup possibilities in the same base, it's clearly commented as such. The main use-case for these would be decoupling interface implementation from inheritence, the example creates a "jsonable" interface, then implements it in a class without any reference to any structural information in the code implementing the receiver of the interface.

I bring this up because it partly addresses item number 3, in that published interfaces, composed of just a struct, selectors and casting function, could be implemented by external components without any access to internal information.

grondo commented 7 years ago

we wouldn't expose the fop_ calls themselves in the public API, so for example an external project would not be able to inherit from a type published in the API. It might be possible to do that but since the internal size of a class is required in order to inherit from it, some work would need to be done. As is, these objects are as "ABI neutral" as our existing ones are.

I apologize for commenting before I've even looked at the examples, but I just wanted to mention that it would perhaps be a boon if the fop_ API was packaged and exported in the public libflux API. For example, if extension plugins are modeled as subclass of a basic "plugin" base class, plugins could not seemingly be built outside of flux-core without access to fop.h, unless I've misunderstood the comment above and suggestion posted in #930.

garlick commented 7 years ago

if extension plugins are modeled as subclass of a basic "plugin" base class, plugins could not seemingly be built outside of flux-core without access to fop.h

Great point!

garlick commented 2 years ago

Closing old issue