MatthieuDartiailh / pyclibrary

C parser and ctypes automation for python
http://pyclibrary.readthedocs.org
MIT License

Major Refactoring of API to a clean OOP interface #10

Open mrh1997 opened 9 years ago

mrh1997 commented 9 years ago

As noted in issues #5, #6 and #8, the interface of version 0.2.0 has some shortcomings in the API which shall be cleaned up. PR #9 already did a careful rework that keeps compatibility. But as decided in #8, the interface shall be reworked completely.


This issue is a continuation of the discussion started in https://github.com/MatthieuDartiailh/pyclibrary/issues/8#issuecomment-109819028 and supersedes #8. Regarding the annotations in that comment:

  1. BaseType is for simple, uncomposed types, i.e. int, unsigned char, float. (What do you mean by "is not passed one in your example"? It is used in the example multiple times, e.g. https://gist.github.com/mrh1997/876070be3763733f849e#file-typesystem_proposal-py-L155.)
  2. I also think that the parser should not be initialized globally. Will change this in my part of the project.
  3. Of course we need to be able to parse multiple headers (my fault). But I propose that the parser not store the CLibInterface, and that the parse method accept multiple files (see the modified gist https://gist.github.com/mrh1997/876070be3763733f849e). This way we ensure the single responsibility principle:
    • the CLibInterface is the Python object model of all public header files of the corresponding C library
    • the CParser object knows about the compiler details, which may vary from compiler to compiler (see below).
  4. My intention in naming it "CLibInterface" was to show that this object corresponds to a C library and not to a C header (one CLibInterface per library, even if the library has multiple headers). How about "CLibHeaders"?
  5. Using the parser-rework branch is fine. How about releasing the current state as 0.2.0 (which is intentionally compatible with 0.1.0) and starting a new, incompatible major version (0.3.0) when merging the parser-rework branch?
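The separation proposed in point 3 might be sketched like this (a toy sketch, not pyclibrary's actual code; all names besides CParser and CLibInterface are assumptions loosely following the gist):

```python
# Hypothetical sketch: CParser holds only compiler knowledge; the
# CLibInterface is built and returned by parse(), never stored on
# the parser itself.

class CLibInterface:
    """Python object model of a C library's public headers."""
    def __init__(self):
        self.funcs = {}
        self.types = {}
        self.macros = {}

class CParser:
    """Knows compiler specifics (e.g. accepted type qualifiers)."""
    def __init__(self, type_quals=()):
        self.type_quals = tuple(type_quals)

    def parse(self, *header_files):
        # Real parsing omitted; the point is the signature:
        # multiple headers in, one fresh CLibInterface out.
        clib = CLibInterface()
        for hdr in header_files:
            clib.funcs.setdefault(hdr, None)  # placeholder entry per file
        return clib

parser = CParser(type_quals=('const', 'volatile'))
clib = parser.parse('a.h', 'b.h')
```

The parser stays reusable across libraries because each parse() call produces an independent result object.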

Regarding the CParser object, I have one more idea:

An application could support multiple compilers by simply instantiating multiple CParsers with slightly different grammars. Currently the grammars may only vary in type_quals, but in the future we could implement more complex differences. I would propose to provide a submodule for every standard compiler (Visual Studio, gcc, ...) that provides only a single object: a prebuilt CParser for the corresponding compiler.
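What such a per-compiler submodule would boil down to, as a minimal sketch (module layout, qualifier lists and the CParser constructor signature are all assumptions, not pyclibrary's API):

```python
# Hypothetical sketch of per-compiler submodules, each exposing one
# prebuilt parser, e.g.:
#   pyclibrary/compilers/msvc.py -> parser = CParser(...)
#   pyclibrary/compilers/gcc.py  -> parser = CParser(...)

class CParser:
    def __init__(self, type_quals=()):
        # currently the only grammar difference; more could follow
        self.type_quals = tuple(type_quals)

# What the msvc submodule might contain (qualifier list illustrative):
msvc_parser = CParser(type_quals=('__cdecl', '__stdcall', '__declspec'))

# What the gcc submodule might contain:
gcc_parser = CParser(type_quals=('__restrict__', '__attribute__'))
```

Application code would then just import the parser matching its target compiler instead of configuring grammar details itself.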


Regarding cooperation:

I propose to change the unit tests of the backend (test_ctypes.py): currently the tests require CParser, which is not a clean test design. A better approach would be to separate the header-file model (see gist) completely into its own file and have the backend tests rely only on that file (not on cparser.py).

This way the test would be a "real" unit test, testing only a single unit (the backend, not parser+backend).

If you agree, I would start with a first PR where I only add an independent file with the header-file model. Then you and I could work independently.
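A sketch of what the parser-independent backend test could look like, assuming a handwritten model fixture (file name, model layout and the toy backend are all hypothetical):

```python
# Hypothetical sketch: the backend test consumes a handwritten
# header-file model instead of CParser output, so it exercises
# exactly one unit.

# --- would live in a separate fixture file, not in cparser.py ---
HANDWRITTEN_MODEL = {
    'funcs': {'add': ('int', ['int', 'int'])},  # name -> (ret, arg types)
    'types': {},
}

# --- the backend under test relies only on the fixture ---
def bind_functions(model):
    """Toy 'backend': turn the model into Python callables."""
    def make_stub(name, ret, args):
        def stub(*call_args):
            # a real backend would dispatch through ctypes here
            assert len(call_args) == len(args), name
            return 0
        return stub
    return {name: make_stub(name, ret, args)
            for name, (ret, args) in model['funcs'].items()}

bound = bind_functions(HANDWRITTEN_MODEL)
```

A separate integration test can then check that CParser reproduces the handwritten model, keeping the two concerns apart.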

MatthieuDartiailh commented 9 years ago
  1. Sorry, I completely misread that.
  2. Cool.
  3. The one case I would also like to see covered is the possibility to pass an existing CLibInterface object to a Parser (or several), containing necessary definitions (such as windows headers or the like); those definitions could be stored in the newly produced CLibInterface (in an attribute named 'externals', for example).
  4. If we bundle all definitions in a single object, then I agree on CLibInterface.
  5. I think I will rather go with a 0.11 version this time, and the next one will be 0.2.
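Point 3 above might be sketched as follows (a toy sketch under assumed names; pyclibrary's real API may differ): pre-parsed definitions go in, and the newly produced interface records them under an 'externals' attribute.

```python
# Hypothetical sketch of reusing pre-parsed definitions: a parse run
# is given an existing CLibInterface (e.g. parsed windows headers)
# and records it as .externals on the newly produced interface.

class CLibInterface:
    def __init__(self, types=None, externals=None):
        self.types = dict(types or {})
        self.externals = externals  # CLibInterface with predefined defs

def parse_with_externals(header_types, externals):
    """Toy stand-in for a parse call that accepts external definitions."""
    return CLibInterface(types=header_types, externals=externals)

win_defs = CLibInterface(types={'DWORD': 'unsigned long'})
clib = parse_with_externals({'mytype': 'int'}, externals=win_defs)
```

Keeping the external definitions in their own attribute leaves the library's own model clean while still making the imported definitions reachable.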

I like the idea of prebuilt parsers, let's do this.

I agree with you about making the backend tests independent of the parser. Don't bother writing the CLibInterface for the header I use in the tests; I will do it myself while working on the backends. Once I am done, I will also add a test checking that the parser does indeed produce the same result as the handwritten CLibInterface.

Let's rather start with the type system (in which you will have to include a Var and a MacroVar object for completeness). As soon as this is in place, I will start working on the backends.

mrh1997 commented 9 years ago
  1. I agree with you. For simplicity I would propose that every Parser contains a single CLibInterface, which acts as a template and is copied on every call to parse().

Regarding MacroVar: I already added a MacroDef() class and a .macros attribute (see gist). Did you mean this object, or an additional class (meant for what)?

Regarding Var: my intention was to handle this via the .globals attribute of the CLibInterface:

All global objects (functions and vars) are stored in the .globals attribute. If a globals entry points to a FunctionType, it is a function; otherwise it is a variable.
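The convention above might look like this (FunctionType and TypeRef are taken from the gist's naming; everything else is an assumption for illustration):

```python
# Sketch of the proposed convention: everything global lives in
# .globals, and an entry is a function iff its type is a FunctionType.

class FunctionType:
    def __init__(self, ret, args):
        self.ret, self.args = ret, args

class TypeRef:
    def __init__(self, name):
        self.name = name

globals_ = {
    'do_work': FunctionType('int', ['int']),  # a function
    'counter': TypeRef('unsigned int'),       # a variable
}

def is_function(entry):
    # no separate Var class needed: the type alone decides
    return isinstance(entry, FunctionType)
```

This keeps one uniform namespace for all globals and makes the function/variable distinction a property of the stored type, not of a second container.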

MatthieuDartiailh commented 9 years ago
  1. Save if we end up with a compelling use case it should work.

Looking at the gist, I was under the impression that this was reserved for macro functions. Having two classes, one for macro functions and one for macro values, might make things easier to understand.

My point is: what do you do with a global entry referring to, let's say, a structure? Where do you store the member values? The idea behind Var would be to store the value alongside the type.

mrh1997 commented 9 years ago

What do you mean by a compelling use case?

I can create two classes for macros if you prefer, but then I propose to derive one from the other, to avoid duplication of common code.

Why do we need a value? Am I overlooking something? If I am correct, the interface definition only needs a name and a type for vars (respectively, the key and value of the .globals attribute). Values come into play when you load a library. But the values of a library instance's variables are currently represented by ctypes module objects.

If a variable points to a structure, the idea is, that it shall look like:

assert p.globals == { 'varname': TypeRef('struct structname') }
assert p.types == { 'struct structname': StructType(...) }

MatthieuDartiailh commented 9 years ago

Sorry about that, I started my answer with something and then rewrote it... My idea is that there might be use cases where it is desirable to include definitions from multiple other libraries (windows headers + a user's common library header) when parsing some headers. Such a case would require being able to update the CLibInterface object held by the parser.

My idea for the macros is that in the case of a function we simply keep what is needed to perform the preprocessing steps, while in the case of values the user can query the actual value (as a Python object), but maybe this could be reserved for the backend. (I definitely agree that those should have a common base class.)

I was considering the ugly case of global variables declared in a header file (which may happen in single-header projects). Something like:

int var = 1;

This is ugly and should be avoided but I am worried that this exists nonetheless.

mrh1997 commented 9 years ago
  1. I see your point. But I think the goal should be to work with "constant" CParser objects (otherwise we could not provide them as globals in the submodules for standard compilers). How about adding a .derive(std_clibintf) method to the parser class that creates a (shallow) copy of the parser with different "standard include definitions"?
  2. I think it could be possible to merge both cases into a single class, without dirty "if is_macro_func" checks in the macro class:
    • the macro class provides a parameter-list attribute, which is a list of strings (it is empty both for a macro value and for a macro function without params).
    • the macro class provides a method .value([p1, p2, p3, ...]) which returns a string in which the parameters p1, ..., pn are replaced.
    • to give application code a way to check whether this is a function or a value, an additional attribute .is_macro_func could be provided (but this differentiation is not needed by the code of the macro object itself).
    • all further code needed for differentiation (like additional parentheses on funcs) is parser-specific and should not be the responsibility of a macro object.
  3. In the case you outlined, the value is not needed by the backend to provide its functionality (let me know if I am wrong). If you want the information for completeness, I suggest introducing an additional attribute .initial_vals on CLibInterface.
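The single-class macro proposal in point 2 could be sketched like this (a toy sketch: class and attribute names follow the proposal, the naive textual substitution is an assumption and not full C preprocessor semantics):

```python
import re

class Macro:
    """One class for both macro values and macro functions."""
    def __init__(self, content, params=(), is_macro_func=False):
        self.content = content
        self.params = list(params)       # empty for plain macro values
        self.is_macro_func = is_macro_func

    def value(self, args=()):
        """Return content with each parameter replaced by its argument."""
        assert len(args) == len(self.params)
        result = self.content
        for name, arg in zip(self.params, args):
            # whole-word textual replacement of the parameter name
            result = re.sub(r'\b%s\b' % re.escape(name), arg, result)
        return result

val_macro = Macro('42')                      # e.g. #define N 42
func_macro = Macro('((a) < (b) ? (a) : (b))',
                   params=['a', 'b'],
                   is_macro_func=True)       # e.g. #define MIN(a, b) ...
```

Value macros and parameterless macro functions both fall out of the same code path (an empty parameter list), which is what makes the "if is_macro_func" checks unnecessary inside the class.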
MatthieuDartiailh commented 9 years ago
  1. I am fine with the derive method.
  2. Why not... I would prefer macro functions to have a format method rather than value. As macro definitions don't have parameters, it would be easy to just make the macro function a subclass. Creating two classes is IMHO cleaner (and not much more work).
  3. I guess you are right, this is just a stupid corner case... as adding initial_vals should be easy, I propose to do it nonetheless.
mrh1997 commented 9 years ago

OK. I will implement the module as discussed above...

mrh1997 commented 9 years ago

I thought some more about the "stupid corner case" int x = 1;: I cannot imagine a real-world header file like this, since it would lead to the following problem:

Imagine we had a mod1.c which is used by mod2.c and mod3.c. mod1.c's header (which contains int x = 1;) could only be included by either mod2.c or mod3.c; otherwise the linker would complain, since both mod2.c and mod3.c would be defining 'x'.

If you agree with me, I propose not to complicate the interface of CLibInterface with an attribute that is never needed.

MatthieuDartiailh commented 9 years ago

I agree that this is very ugly coding, so OK, let's keep it out for the time being. If we find somebody stupid enough to do such a thing, we can always add it later.

mrh1997 commented 9 years ago

While implementing #15 I figured out a relatively clean interface of the parser. Please let me know your thoughts about the API.


Regarding the rework of the preprocessor: my original intention was to move the macro evaluation into the CLibInterface while keeping the CParser responsible for parsing the macro definitions. This would have required an AST-like structure for macro definitions. The backend could then use the macros without needing knowledge of the CParser (which is currently responsible for evaluation).

But then I discovered that a clean macro evaluation would require more work, as the current implementation does not really conform to the C preprocessor spec. The reason is that macros are currently always evaluated at definition time instead of at invocation time. This can cause trouble, e.g.:

#define Y X
#define X  int
Y var;    //<- var will be of type CustomType('Y') instead of BuildinType('int')

Then I realized that in 99% of cases the current implementation should do its job. Thus I think I will skip this task and continue with the backend. If we come back to the preprocessor issue later, it should be doable without an API change (the CParser would then use the Preprocessor object internally).
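The ordering problem above can be illustrated with a toy expander (this is an illustration of the general eager-vs-lazy issue, not pyclibrary's actual implementation): expanding at definition time freezes Y before X is known, while re-scanning at invocation time picks up the later definition, as the C preprocessor requires.

```python
# Toy illustration of definition-time vs. invocation-time expansion for:
#   #define Y X
#   #define X int
#   Y var;

defs_eager = {}
def define_eager(name, body, table):
    # expand the body immediately, using only what is defined so far
    table[name] = table.get(body, body)

define_eager('Y', 'X', defs_eager)    # X unknown yet -> Y frozen as 'X'
define_eager('X', 'int', defs_eager)

def expand_lazy(token, table):
    # re-scan at invocation until no definition applies anymore
    while token in table:
        token = table[token]
    return token

defs_lazy = {'Y': 'X', 'X': 'int'}    # definitions stored unexpanded
eager_result = defs_eager['Y']              # misses the later #define X
lazy_result = expand_lazy('Y', defs_lazy)   # resolves through to 'int'
```

The lazy variant matches the spec's behaviour for the example above: `Y var;` declares an int, regardless of the order in which Y and X were defined.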

MatthieuDartiailh commented 9 years ago

This makes perfect sense to me. Have you found a templating engine suiting your needs?