MatthieuDartiailh / pyclibrary

C parser and ctypes automation for python
http://pyclibrary.readthedocs.org
MIT License
65 stars 29 forks source link

Function type representations are inconsistent #8

Closed mrh1997 closed 9 years ago

mrh1997 commented 9 years ago

When defining a function in C, the corresponding Type object in python has a very poor structure:

int * func();

will become:

Type(Type('int', '*'), ())      # or (('int', '*'), ()) in <= v0.1.0

Although intuitive at a first glance, this nesting of types is logicially incorrect, since type objects are derived by adding a ' * ' a [x] or a tuple at the right end side of the Type (tuple). In this case we derive the 'int' by adding a ' * ' and get a pointer to integer. Then we derive the pointer to integer to get a function that returns a pointer to integer by adding a tuple. Thus it should be

Type('int', '*', ())      # or ('int', '*', ()) in <= v0.1.0

In fact the CParser internally scans the logical correct tuple and scrambles it if it is a function (see https://github.com/MatthieuDartiailh/pyclibrary/blob/master/pyclibrary/c_parser.py#L1083)

Interestingly function pointers are handled correctly (int (*f)(); is currently represented as Type('int', (), '*')). But for some reason I cannot understand (please let me know if you know some) the original author decided to handle functions differently...

It will become hard to fix this without compatibility issues. I think the easiest way would be to add an additional defs dictionary named 'funcs' which returns the "logical correct function definitions", while the existing 'functions' dictionary returns a mapper-dictionary, that converts the "logical correct function definitions" when accessing it.

I.e.:

assert parser.defs['funcs']['func'] == Type('int', '*', ())
assert parser.defs['functions']['func'] == Type(Type('int', '*'), ())    # this Type is generated "on-the-fly"

how do you think about it?

MatthieuDartiailh commented 9 years ago

Will think about this.

MatthieuDartiailh commented 9 years ago

I do agree that we should fix the inconsistency, but I wonder if we could not introduce a Function class representing a function that way we can isolate the type of the return value and in a tuple the arguments. This class could be a subclass of Type. Furthermore it would avoid confusion resulting between a function returning a pointer and a pointer to a function as differentiating them simply by the position of the '*' in the tuple seems a bit light to me. What do you think ?

mrh1997 commented 9 years ago

I agree that a separate Function class would be ideal and that the position of the '*' is a little bit light. But I consider the API for types as a fail anyway, since the original API lacks some important charactieristics:

My personal goal is to address the first issue so that it is possible to get all information from the parser (at least the information needed for my application; a RPC caller to embedded systems). So my idea was to provide extendability (by switching to classes). But without breaking compatibility it will always be a hack. We can reason about better interfaces, but they will never overcome the shortcomings of the API design for types.

In my optionion a ideal solution would look like:

# int const * (*a[3])(char x, struct b * y);
typ = Var('a', 
   Array(3, 
      Ptr(
          Func([
              Param('x', Char()), 
              Param('y', Struct('b'))], 
              ret_val=Ptr(
                   Int(type_quals=['const'])))

Here the type structure is represented exactly like needed by applications and can be easily extended by extending the classes Ptr, Func, Param, Var, ...

But since this solution would break compatibility totally and mean a lot of work we should stick to the current solution. In my understanding the current type definition should behave in a predictable way. That means, it provides exactly one additional entry for every additional nesting (ptr/array/func).

Switching to a function object would be more intuitive for newcomers and allow easier access of API (i.e. retrieval of return type), but still not overcome the shortcomings of design (pointer to functions would be still standard Type objects. I could retrieve the return value of a function then, but not of a pointer to function).

MatthieuDartiailh commented 9 years ago

Actually I think that if you can come up with a neat and clean design it would make sense to break the parser API. I mean this is 0.1 so better to do it now rather than later. The only user I am aware of simply relies on the CLibrary so as long as we don't break things are fine. If we go for such a solution it would make sense to make things looks like a kind of ast. I understand this would be a major undertaking but if we agree about a solution I am more than willing to do my part.

mrh1997 commented 9 years ago

Actually I will implement such an interface internally as a wrapper on top of pyclibrary anyway. My application will be build then against this interface instead of pyclibrary's Type() object. If you are interested, I will share my design with you before implementing it, so that you can add your ideas to it.

Then I could integrate it into pyclibrary at a later point of time. Unfortunately I currently have no time to do this job (although I am keen on doing it). I even cannot promise to do the job at all. But at least it would be possible without changing the API for integrating your ideas and breaking my application then...

MatthieuDartiailh commented 9 years ago

If we come up with a clean design I will do my best to implement the changes myself as quickly as possible.

mrh1997 commented 9 years ago

Sounds great.

I did a initial draft of a possible relacement for the current 'Type' class and stored it at https://gist.github.com/mrh1997/876070be3763733f849e. Actually it is only a bunch of class definitions without implementation and without docs. But there are some sample usecases at the end to show the idea.

Please have a look at it and let me know, if you could image to switch to something similar to that.

If you are interested in it, I propose to start an issue, where we discuss the details. After agreeing in the details I would incoporate the new design into c_parser.py if you could do the integration job for the backend.

MatthieuDartiailh commented 9 years ago

I think the type system looks nice (but I don't understand the goal of BaseType as it simply add a name and is not passed one in your example). I think it also makes sense to remove the global initialisation of the parser (I will perhaps do the same for the library too), but we must preserve the ability to parse multiple files in one pass, which means that the CLibInterface (which could be CHeader) must be stored on the parser, must store everything (including macros). We could of course add a way to reset the parser. Let's open a new issue to go more into the details, if you do the parser I will be happy to deal with the bindings. As this will imply a lot of work, possibly interleaved between us I propose that we work on a separate branch (parser-rework) so that you can do several PRs (one for the type system, one for the parser parametrisation and stuff) and I can work on the bindings after the first one is in.

mrh1997 commented 9 years ago

superseeded by #10