Open mrh1997 opened 9 years ago
Those ideas are very interesting I think. And I would be very happy to see your work become part of pyclibrary.
Currently a backend is meant to back a CLibrary by exposing the specifics of the chosen ffi. It is specific to C but not to ctypes. It makes the user's life easier by doing several things on top of the ffi:
I think it would make sense to move to a kind of binding generation like the one you suggest. We could actually propose:
Creating an in-memory wrapper (or calling a cached one) should remain easy (which would be equivalent to what we are doing now). In some cases you cannot version-control the wrapper because it would expose proprietary headers... (I have cases like that).
For the roadmap, I would say we first focus on the parser and the preprocessor (you mentioned somewhere extracting it from the parser which would make sense). Once this is done we can revisit the generation of the bindings.
I am happy that you like the concept. Of course I will finish all the jobs discussed so far before starting the generalized backend mechanism. I created this issue this early since we might need some time to discuss everything. When I am finished with my current work, I will then already know how to continue...
regarding your proposals of the new binding generator:
regarding the in-memory wrapper/caching:
In fact I see no use case for it apart from enforcing compatibility with 0.1.1. The usual use case is a plain import statement. Did I miss other possible use cases? Tell me about the use case you mentioned (by the way, what do you mean by "proprietary headers"?).
regarding concept of implementation:
Without having dug into the details, I think a "backend" should be as simple as writing templates in a template engine syntax. Do you have preferences regarding the template engine? I think it should fulfill the following requirements:
Currently I favour Jinja2 because its syntax is close to that of Liquid, which is used by GitHub Pages => better chances that a lot of people are familiar with the syntax. Furthermore, my impression is that it is very widespread. To get rid of the bloated Jinja2 dependency, we could maybe write our own "micro-jinja2 implementation" later. This module would then implement only the features needed by our backends and would be very minimalistic.
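Just to make the idea concrete, here is a minimal sketch of how a template-driven backend could emit ctypes bindings with Jinja2 (the data structure fed into the template is made up, nothing like this exists in pyclibrary yet):

    from jinja2 import Template

    # Made-up parser output: one dict per function with pre-rendered argtypes.
    functions = [
        {'name': 'add', 'restype': 'c_int', 'argtypes': 'ctypes.c_int, ctypes.c_int'},
        {'name': 'get_version', 'restype': 'c_char_p', 'argtypes': ''},
    ]

    template = Template('''\
    import ctypes

    _lib = ctypes.CDLL("{{ libname }}")
    {% for f in functions %}
    {{ f.name }} = _lib.{{ f.name }}
    {{ f.name }}.restype = ctypes.{{ f.restype }}
    {{ f.name }}.argtypes = [{{ f.argtypes }}]
    {% endfor %}''')

    print(template.render(libname='xyz.dll', functions=functions))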
Any better idea?
regarding distribution of workload:
As I mentioned already I would do the complete job if you are not interested in investing time. But of course I would be glad if we could share the work a little bit.
regarding the new binding generator :
regarding the in-memory wrapper/caching:
I am actually using pyclibrary to wrap a dll used to communicate with a digitizer board, and as I am not supposed to disclose the header files, I doubt I can disclose the wrapper.
For all other use cases you are right. Anyway, when using it, it should be quite easy to check whether or not the wrapper has already been generated and, if not, generate it locally.
regarding concept of implementation:
For a non-smart backend we can simply go with a template engine, I guess. I am not familiar with any, but I do know about Jinja2, which I think is indeed quite popular. However, for a smart one I think it would be nice to provide abstractions such as pointer creation, array creation, ... Those are basically the unimplemented private methods of the CLibrary object. For each kind of ffi some standard functions could exist, which would make it easier to switch from one ffi to another. (I am not yet sure what will happen with ctypes vs cffi for Python, and I think that having a high-level abstraction that does not care about which one you use is nice.)
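To illustrate what I mean by such abstractions, a rough sketch (every name below is purely illustrative, none of this exists):

    import ctypes

    class CtypesBackend:
        """Illustrative ffi abstraction; a cffi backend would expose the same
        methods, so generated bindings never talk to a specific ffi directly."""

        def load_library(self, path):
            return ctypes.CDLL(path)

        def new_pointer(self, obj):
            # pointer creation primitive
            return ctypes.pointer(obj)

        def new_array(self, item_type, length):
            # array creation primitive
            return (item_type * length)()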
regarding distribution of workload:
As we go on I will try to give you a hand. I cannot make promises, as I am trying to develop/maintain/contribute to far too many libraries in parallel with my work, but if we can draw a clean separation between two tasks I will be happy to work on one. One possibility would be for me to work on the preprocessor while you work on the parser, perhaps. Could you open an issue with your ideas about that separation?
I think we agree already about the rough concept ;-)
To summarize it:
Regarding our current plan that you integrate the current c_model.py into the backend while I integrate it into the parser: I think you can cancel this job, as the backend will be completely replaced => your work would serve no purpose.
Regarding your special case for caching: it will easily be possible to 'emulate' it with something like the following code in your application:
    try:
        import binding_for_xyz
    except ImportError:    # bindings were not generated yet
        import binding_generator
        binding_generator.translate(input='xyz.h', output='binding_for_xyz.py')
        import binding_for_xyz
    else:
        import md5
        header_chksum = md5.new(open('xyz.h').read()).hexdigest()
        if (binding_for_xyz.version != required_version
                or header_chksum != binding_for_xyz.header_file_chksum):
            # header changed or generator version mismatch => regenerate and reload
            binding_generator.translate(input='xyz.h', output='binding_for_xyz.py')
            reload(binding_for_xyz)
Unfortunately I found no Jinja2-compatible mini templating engine.
Titen (https://code.google.com/p/titen/) looks fairly promising, as it has only ~200 lines of code.
But I prefer to use Tempita (http://pythonpaste.org/tempita/) as it is still not too big (~1200 LOCs) and provides much more power.
Our project priorities have shifted somewhat. This means I will only continue with pyclibrary in two or three months. I hope this is acceptable for you...
I am fine with that. I am myself quite busy for now. By the way, are you set on Tempita as the template engine? If so, I can try to push to get Python 2/3 compatibility in and perhaps start some experiments if I find some time. If for any reason you cannot come back to this work, please let me know.
I will come back to this work. But I will keep you informed if there are further delays.
regarding Tempita: I am very unhappy with all template engines, as I found they all share a big drawback: they do not allow code styling because they do not know the destination language's AST. They either do simple text processing (without any code formatting) or are specialized for the XML/HTML AST. This means that if I want to output Python or C instead of HTML, the result is very ugly or even invalid (correct indentation in Python is not a question of elegance but a must).
This is why I am playing around with my own template engine that works the following way:
As I do not know if it will really work out as I imagine, I am currently creating a prototype. As soon as I have a prototype that I am happy with, I will come back to you to discuss the idea.
Seems interesting. Keep me posted.
Any progress on this ?
No. I am really sorry for this.
Actually our company got some very important projects, which will push this issue back for at least a couple of months. This means I would have to do it in my spare time or wait until next year.
I definitely will continue the job, but probably only at the end of this year ;-( Is this a problem for you?
For the time being it is not a problem, but I may need to convince some people in the not-so-far future that using pyclibrary is a good idea, and that is not easy when you expect the API to change on a short time scale. Actually I am also short on time, so I can easily understand you. Would you consider posting your experimental templating system on GitHub so that I can have a look and perhaps try to move it a bit forward?
What is "the not-so-far future" when you have to convince some people of pyclibrary? Can you give me at least two or three weeks to build the templating system far enough to allow somebody else (i.e. you) to understand the idea?
I can, I think, easily give you a month, but I don't want to rush you or steal the templating project from you. My point is rather that if you make it available on GitHub and provide me with some guidelines, I may work on it and make pull requests while you are busy. You will then be able to review my work, which is less work than writing the code yourself. I will need to use pyclibrary for wrapping new dlls in my lab in something like a month, and around the same time I will try to integrate it into another library I collaborate on. I don't need the final version by then (as I do not think we can manage it), but I would like to be able to say that efforts are being made, and to have a good idea of the future myself so I can make my code as easy to update as possible. Does that make sense to you?
Sorry for leaving you so long without any response. But I didn't want to come back to you without showing you something. Unfortunately I needed much more time than expected to come to this point:
After playing around a lot with different ideas I think I have a very clean and lightweight approach. It is far from complete (in fact I implemented only some elements for demonstration purposes). If you check out my "new-backend-api" branch you can have a look at it and give me your comments.
Here is some important information: the core of the new concept is implemented in pyclibrary/asts/astcore.py.
If you are still interested in sharing the work: I think a good way to split it would be along the two Transformers: the rules for the C-AST -> ctypes-bindings transformer (pyclibrary/backends/ctypes_bindings.py) could be done by you, while I concentrate on reworking the Python-AST -> formatted-text transformer (pyclibrary/backends/pep8_formatter.py). The second one has to be rewritten completely, as it uses the CodeLayouter(), which does not yet fit into the concept of ASTs and Transformers.
I am very curious to hear your thoughts about the concept...
I like the idea, really. But I have some questions:
I am definitely interested in sharing the work, and I am fine with the separation you are proposing. Once we have discussed the points related to my questions in a bit more detail, I will be happy to start working on the ctypes transformer.
To be honest, I had already played around with the idea of using the built-in Python ast module. But I didn't know about astor, so I decided against using the Python AST. Now, after you told me about astor, I have started rethinking this idea.
But I still come to the conclusion that using our own AST is a better idea for two reasons:
Regarding splitting the generated code I am not sure if I understood your intention. Did you want pyclibrary to output multiple .py files, each containing different parts of the header file (constants defs / func defs / structure defs)?
This case is fairly simple to handle. We simply build not a single Transformer but multiple transformers (one for constants, one for functions, ...) and store the output of each of them in a separate .py file.
Ideally all transformers are the same class but parametrized differently (e.g. by a parameter like "output_only_constants", "output_only_funcs", ...). Alternatively we can build filter transformers that take a C AST and output a C AST containing only constants/funcs/... These filters are then chained before the Python-AST-converter transformers.
Great if we can split the work ;-) I would be very happy if we could do it the way you proposed...
Your arguments for the AST make sense (at least to me), and as I am not an expert on the matter (and as your implementation seems easy to maintain) I won't fight over it.
You got my idea about the separation. It is just that some projects wrapping external dlls do that kind of thing to avoid having a single huge file cluttered with all the constants (defined as macros), and I thought people might appreciate the possibility to do the same with pyclibrary.
I will try to tackle the ctypes translation. I won't be able to work full time on it, but I will try to keep progress steady, if a bit slow. I will work on my repo under the new-backend-api branch (based on your current branch). Whenever you make significant progress you can open a PR against it. Being able to generate real code may also help me.
I am starting to play with the transform idea, and I have a few questions:
Of course you can modify the AST (I coded it in 20 minutes without thinking too much about it). Actually there are more statements missing (e.g. "with"). My intention was to add them when required...
But please keep in mind that the AST should be abstract enough to hide Python-2/3-specific parts. For example, in Python 3 you have to do a relative import like "from . import module_name", while in Python 2.6 you have to write "import module_name". Thus I would introduce a single Import node. Of course this single Import node should be powerful enough to encode the semantics of both import variants. If we follow this rule, your C->Python transformer does not have to worry about:
Regarding your idea of having a Module node: I think this would reduce the modularity of the transformer concept. My idea (which I don't know yet whether it will work out) is to work with very atomic transformers. Depending on your use case, you then simply have to change the pipelines you build from these transformers.
For example, we could have a transformer for constants and one for struct/union defs. Then you simply pipe each of them with a separate code emitter transformer to get separate modules. For those who want the output in a single file, we could provide a (generic) merge transformer. This merge transformer can be used to merge the outputs of both into a single AST...
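To make that less abstract, a tiny sketch of what I have in mind (the pipe() helper and all transformer names below are invented for illustration, nothing of this exists yet):

    def pipe(*transformers):
        """Chain transformers: the AST produced by one is fed into the next."""
        def run(ast):
            for transform in transformers:
                ast = transform(ast)
            return ast
        return run

    # Atomic transformers (placeholders only):
    def keep_only_constants(c_ast): ...   # filter: C AST -> C AST containing only constants
    def keep_only_funcs(c_ast): ...       # filter: C AST -> C AST containing only functions
    def c_to_python(c_ast): ...           # converter: C AST -> Python AST

    # Two pipelines => two separate output modules:
    constants_pipeline = pipe(keep_only_constants, c_to_python)
    funcs_pipeline = pipe(keep_only_funcs, c_to_python)

    # A generic merge transformer would combine their results into a single
    # Python AST for those who prefer one output file.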
What do you think about this?
Regarding ValMacro: you are right, I didn't consider that.
The problem is that the macro definition is a string of C code (like "A+B"), which cannot be converted to a Python AST (we would require a full C parser that also supports AST nodes for expressions like '+').
My proposal would be to replace the macro references used within a macro definition and THEN evaluate the result.
To make the result more readable we could add the original string as a comment to the AST. Then

    #define C A+B

will be transformed to

    C = 4  # A+B

(given A is 1 and B is 3). To get this we would require a 'comment' field on the generic AstNode.
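A minimal sketch of the mechanics (just my illustration; 'raw_definition' and 'substituted' are made-up names, and in the real design the function would of course build an AST node carrying the comment rather than return plain text):

    # Turn "#define C A+B" (with A=1 and B=3) into "C = 4  # A+B".
    # 'substituted' is the macro body after replacing macro references by their values.
    def val_macro_to_source(name, raw_definition, substituted):
        value = eval(substituted, {"__builtins__": {}})   # e.g. "1+3" -> 4
        return "{} = {!r}  # {}".format(name, value, raw_definition)

    print(val_macro_to_source("C", "A+B", "1+3"))   # prints: C = 4  # A+B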
What are your ideas?
Regarding imports, I think it would be better to keep the AST closer to Python 3 and let the translator handle the change (note that I do not support Python 2.6 and that "from . import foo" is correct in Python 2.7). Creating an AST that differs too much from the real Python one will make our life harder.
The thing that worries me is that if we do not have a Module node, we won't know where to put the imports. From your description I guess that we would actually have some sort of super-structure taking care of chaining the transforms and such, and that such a construct could inject the generated code into some kind of template. This may indeed make sense, as we will have to package some utility functions and base classes that are more easily written as text than using our own AST. We can keep that for later.
My point for ValMacro was actually more straightforward. All values are evaluated during the pre-processing (which may be debatable), but it is easy to do because we have the full context needed for the evaluation, and we actually store them. We could include the value inside the AST node under a 'value' member. But while actually coding the type transformations I realized the transformer needs to store a reference to the clib, because it contains information without which we cannot resolve the custom types. So I went with storing the clib in a '_clib' attribute (in transform_clib) that can then be accessed by all transforms that need to dive into it for some information.
I will try to go on with the basic transformations and see what comes out.
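Roughly, the shape of what I did looks like this (heavily simplified; only transform_clib and the '_clib' attribute correspond to what I described above, everything else is just illustration):

    class CTypesTransformer:
        """Sketch: the transformer keeps a reference to the CLibInterface so
        every transformation rule can resolve custom types against it."""

        def transform_clib(self, clib):
            self._clib = clib    # stored once, reachable from every rule
            # ... dispatch to the individual transformation rules here ...

        def transform_custom_type(self, type_node):
            # self._clib is available whenever a custom type has to be resolved
            ...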
Probably you are right. Let's stick to the Python 3 AST. If we are lucky it is abstract enough to be transformed into Python 2 source code; if not, it should not be a big deal to adapt it slightly...
After sleeping on it this came to my mind, too. So let's start with your Module node proposal as it is less experimental and can be implemented without unexpected problems. After we have a running system we can think about refactoring it (if we see it could make sense).
I do not like cyclic structures as they can cause trouble in Python (e.g. they will not be garbage collected as soon as one of the member objects has a __del__ method). But if I understood you correctly, you added this clib reference to ValMacro, which results in a cyclic reference...
To address this kind of problem I added the possibility to provide contexts to transformers (see 'ctx' in https://github.com/mrh1997/pyclibrary/blob/new-backend-api/pyclibrary/asts/astcore.py#L145). This way the CLibInterface transformer function could provide the clib to the ValMacro transformer function.
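A simplified illustration of what I mean (the real signature is in the astcore.py linked above; the dict-based ctx and the function names here are only for demonstration):

    # The CLibInterface rule creates the context and passes it down explicitly,
    # so no long-lived reference to the clib has to be stored anywhere.
    def transform_clib_interface(clib, children, transform):
        ctx = {'clib': clib}
        return [transform(child, ctx) for child in children]

    def transform_val_macro(macro_node, ctx):
        clib = ctx['clib']    # provided by the CLibInterface rule
        # ... resolve the macro's value against clib here ...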
Another problem came to my mind: when working with ValMacro we can do an eval. But what about FnMacro? Here we cannot simply run an eval, as we do not have all the parameters at compile time.
The only idea I have to solve this issue is an ugly workaround: we introduce a special Python AST node "UnstructuredExpression" which simply contains a string of Python code. Your transformer will emit this node when it comes to macros. My transformer will route it directly into the source code without further processing.
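Sketched (the node name is as proposed above, everything else is made up):

    class UnstructuredExpression:
        """Python AST node that simply wraps a raw string of Python code.
        The code emitter writes .code into the output without further processing."""
        def __init__(self, code):
            self.code = code

    # e.g. for '#define MAX(a, b) ((a) > (b) ? (a) : (b))' your transformer could emit:
    node = UnstructuredExpression("MAX = lambda a, b: a if a > b else b")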
Any different idea?
Actually I added the clib to the Transformer because otherwise we would need to always pass it as a context, as we cannot know when we will need it. So there are no cyclic references.
ValMacros need to be exposed on the Python side as they are often used as arguments to functions. However, FnMacros do not need to be imported as they are never needed by consumer code (correct me if this is not true).
As FnMacros in C are often used as replacements for functions (e.g. to ensure legacy code compatibility or for performance reasons), I think we should not ignore them.
But I agree that this is far less needed than the rest => let's ignore them in a first step...
Of course macros can be used for that, but I don't think that the dll can export them.
Hi, I took some time to rewrite the Python AST. I chose to follow the actual Python 3 AST closely, as the translator can (nearly) always find a way to rewrite it in a compatible fashion. Also, I was not a big fan of your custom __init__ added by the metaclass, so I moved to a kwargs-only solution and made it possible to easily specify default values for some slots. Feel free to comment.
Fine. Working with kwargs is much more elegant when using the ast.
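Something along these lines, I assume (just my own illustration of the kwargs-only pattern with slot defaults, not your actual code):

    class AstNode:
        """Sketch of a kwargs-only node base class with per-slot default values."""
        __slots__ = ()
        DEFAULTS = {}

        def __init__(self, **kwargs):
            for slot in self.__slots__:
                if slot in kwargs:
                    setattr(self, slot, kwargs.pop(slot))
                elif slot in self.DEFAULTS:
                    setattr(self, slot, self.DEFAULTS[slot])
                else:
                    raise TypeError("missing argument: " + slot)
            if kwargs:
                raise TypeError("unexpected arguments: " + ", ".join(kwargs))

    class Assign(AstNode):
        __slots__ = ('targets', 'value', 'comment')
        DEFAULTS = {'comment': None}

    node = Assign(targets=['x'], value=1)   # 'comment' falls back to its default (None)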
after digging a little bit more into the code, I have some questions:
Assign([Name('var_name', Store())], [Int(1)])
'Store()' is not really needed, as 'Assign' already implies that 'var_name' has to be stored (not loaded).
I will try to answer as best I can.
I am not claiming that my implementation is perfect (or better than yours). We can still change it (and change the inheritance relations). I just like the idea of staying close to the real AST.
I agree that following the Python 3 AST is a nice idea. But I still think that we should concentrate on our requirements. And part of these requirements is a clear inheritance relation, as the transformers utilize the MRO of each AST object to identify the correct transformation rule. If it is OK for you, I will adapt the inheritance slightly when implementing the backend if I see that it simplifies my job.
Feel free to experiment; it is important that we have a solid AST to build upon.
Bad news: our company decided to check whether libclang could match our needs (instead of pyclibrary). If this works out, I will not be able to spend time on this project any more... Sorry for leaving the project in an incomplete state...
While reasoning about the design of my RPC library, which shall be based on the parser of pyclibrary, I came to the conclusion that your (really nice) concept of pluggable backends would be a perfect match for my library.
But according to my (still) limited knowledge of the backend, it has a drawback: it is too specialized for the current purpose. In fact it does two things:
If these features were modularized and the Python ctypes object generation were generalized into a static interface file generator, the range of applications of the library would be drastically extended:
My idea of the static interface generator is basically a template engine which gets the AST from the CParser and outputs an interface definition for a specific language/ffi-library/parametrization. If one still needs dynamic interface object generation (i.e. if the corresponding C header file is not static; a use case which should rarely occur), one could still generate the interface definition module as an in-memory string and run exec() on this string.
To come to an example: the current backend would be replaced by a template that converts a given header file into the corresponding Python module. (Such a generated module would not need to use CallResult, but of course it would also be possible to integrate CallResult into it; that would depend on the parametrization of the code generator.)
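Since I cannot paste real generated code yet, here is a made-up flavour of what the static generator could emit for a header containing, say, "#define MAX_LEN 64" and "int add(int a, int b);" (all names and the CDLL path are invented for this example):

    """Bindings for xyz.h -- illustration of what a generated module could look like."""
    import ctypes

    _lib = ctypes.CDLL('xyz.dll')

    MAX_LEN = 64                      # define MAX_LEN 64

    add = _lib.add                    # int add(int a, int b)
    add.restype = ctypes.c_int
    add.argtypes = [ctypes.c_int, ctypes.c_int]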
@MatthieuDartiailh: What do you think about it? Regarding all the work to be done: as I need to do it anyway (for my RPC library, which has to compile a C wrapper from C header files), I would be glad to do it as part of pyclibrary instead of my own project.