ginkgo-project / ginkgo

Numerical linear algebra software package
https://ginkgo-project.github.io/
BSD 3-Clause "New" or "Revised" License
401 stars 88 forks source link

Code generation macros impairing easy code navigation and analysis #52

Open tcojean opened 6 years ago

tcojean commented 6 years ago

This issue is the summary of a discussion which arose from MR #46, but is more general in Ginkgo. I will try to expose the current state of things and eventual problems. If you find any mistake here feel free to correct it or notify me.

Macros are very useful to generate code

Macros are very useful to generate code. This allows to completely remove boiler plate code from every place and therefore reduces the code base by quite an amount of lines of code. One such current usage is GKO_REGISTER_OPERATION(xxx, kernel) which does the following:

In this case, the macro usage is at the top of the kernel cpp file, therefore you can at least go there easily and inspect the code if needed in a relatively easy way. But that is not always the case, and it's particularly true with the factory macros of MR #46, and probably some macros we already use elsewhere in Ginkgo.

Problem: this impairs easy code navigation and analysis

There is a very old saying, as old as macros, that macros are evil incarnate and should be killed with fire. There are multiple classic reasons for that, such as the fact that macros don't have scope, don't ensure type safety, are very hard to debug: you need to run the preprocessor to know what code is actually generated by your macro in order to debug it, and much more.

I'm actually kind of fine with the previous points (not so much debugging), but for me one big problem is when macros add class members or class/method definitions, in particular when the internal name is not directly in the parameters, then you have in your code something such as:

// [...]
make_xxx_operation(...);
// [...]

But if you grep for this symbol, or use more advanced tools such as tags, then you will never find this particular symbol in your source code. The symbol appears from nowhere in the code, because a macro generates it. To find it, you need to go through the macro and pretty much navigate through the code by hand, which makes code navigation and understanding problematic.

My main question is the following: I perceive this to be a pretty serious problem personally, is this the case for you? That is maybe not the case for everyone, it really depends on habits and usage I guess.

Possible solutions

Supposing this is a problem, multiple solutions can be thought of. All of these solutions involve some sort of compromise between doing more boiler plate code and keeping the code generation capacity that macros provide.

Extreme "solution": no macros

This may be hard to implement in practice so that's not really a true solution. The theory is to never use it but in practice there is always some cases where you will need macros. Maybe a combination of templated classes and namespaces can completely remove the need for some macros though. This will still lead to some more boiler plate code, though.

Do not put member/function declarations in macros

This solution is to let the user declare his own members therefore they are properly defined and embedded in the code. The main downside is this clearly increases the amount of code that you have to write. In addition, you may have weird user/macro interactions such as:

Never generate symbol names in the macro

The main idea here is that if your macro declares a type foo_type, then as parameter to the macro you pass foo_type directly. That is:

We have a clear addition of boiler plate code (names), but at least we can grep the code for foo_type and find that very easily. Therefore there are no magically appearing names anywhere and most usual navigation tools should work properly. The downside I can see is that for some macros, this list may get fairly long.

Maybe: use some links embedded in documentation

I'm not sure we can do this properly and with which tool, but an idea would be to add anchors/bookmarks at places in the code, and when we use a symbol which comes from nowhere we give a direct link to the precise macro call which generates it.

Maybe: find some decent plugins for standard editors which call the preprocessor on the fly

We could just keep macros and have tools which pretty much always call the compiler preprocessor, so we can view every particular macro call if needed. A problem is the tools I have seen doing that are too simple and not very usable, though.

gflegar commented 6 years ago

Note to everyone commenting here

Before forming an opinion on this, please try writing some of the code in Ginkgo without using macros. Most of them are there precisely because it is not possible (or at least I don't know how) to provide nice and clean interfaces for certain constructs. This means that I tried doing all of these things without macros, and only when I failed have I written a macro for it.

My opinion

My main question is the following: I perceive this to be a pretty serious problem personally, is this the case for you? That is maybe not the case for everyone, it really depends on habits and usage I guess.

It is a problem for me, but I learned to live with it. In my experience, there are no adequate editors and tools that can properly figure out anything but simple C++ programs. This is not strictly related to macros, but more to templates, so even if we "ban" macros or do whatever with them, the tools still won't be able to help much. The problem is that C++ gives you a whole Turing complete language that you can use to generate code at compile time, so the only way the tool can figure out anything about a general C++ program is to actually compile it (and even then it cannot figure out anything about the parts that are not compiled - like templates).

No macros

As you might imagine, "no macros" is not an acceptable solution for me. Whatever some people may say, macros are an important part of C++. The main problem is that people sometimes overuse them for things where they are not required (especially in C), and that is why they are so hated.

Since I am the author of all of the macros in question, I can tell you that I tried to use them only when I couldn't find another alternative. I can identify 3 uses cases where nothing else seems to work: clumsy class specifications with strange inheritance and unclear friends; generating boilerplate definitions that depend on non-type names; having to remain in the same scope. Everyone is welcome to go over those implementations and see if some things can be simplified, and some macros removed. I am pretty sure something better can be done with GKO_REGISTER_OPERATION.

Note that the first and third use cases are why macros are needed in google test and google benchmarks, so both of these tools wouldn't be able to have nice interfaces without macros.

Do not put member/function declarations in macros

One question here is why would we allow type definitions, and not function and variable definitions? It seems awfully nonsymetric, and completely random. You wouldn't solve any of the problems we have, as the exact same problems exist for types (I guess, you were just lucky enough not to encounter them).

Assuming we add types to the forbidden content, we are again in a situation when we are not able to have use case 1, which makes tools like google test or google benchmark impossible.

Never generate symbol names in the macro

This is the only viable option I see here. I would slightly change the rule to make it usable though:

Never generate public symbol names in a macro.

By public I mean symbol names that are supposed to be used outside the macro. Sometimes it is necessary to generate a temporary local symbol in the macro.

Use some links embedded in documentation

Isn't that even more redundant than the previous one? We would have extra redundant code around each invocation of the macro, and around each use of the symbol generated by the macro.

Find some decent plugins for standard editors which call the preprocessor on the fly

If there are some. Anything based on clang should be able to process macros, but a bigger problem here are templates - even clang cannot figure them out (except for specific instantiations of the templates).

#

hartwiganzt commented 6 years ago

A central part is that the code has to be understandable for outsiders. This is the main selling point for other codes people in the Ginkgo project have written before. I like the idea

Never generate public symbol names in a macro.

But I also think that a comment would not hurt too much - maybe something very verbose like:

This macro was generated from XXXX in XX.cpp

gflegar commented 6 years ago

A central part is that the code has to be understandable for outsiders.

You do realize that this depends on what level of knowledge we assume the "outsider" has (as an extreme example an outsider could be my mother, who doesn't know a thing about programming)? If we aim for simple code, than it is impossible to build good enough abstractions, and the library is no longer extensible.

I wouldn't agree "understandable" code is a selling point of a library. If you look into branches of software development that have far better library ecosystem than HPC (e.g. web development), you'll realize that no one actually cares how the implementation of the library looks like, just what kind of interface it exposes and how easy it is to use. You can also see that in HPC - have you ever needed to know how OpenMP or MPI is implemented, or were you content just with the interface? "Understandable" code can only lower the bar for new contributors - and I think we are good enough w.r.p. to that. We already have 4 "contributors" (quoted since I'm including @tcojean here, who I wouldn't call just a contributor) with CS background, which all do good work, and I don't see any of them having serious problems understanding the code.

I'm not trying to say that it's o.k. to write convoluted and not understandable code (that's one reason why we have code reviews), but I think that the main goal should be to provide abstractions that allow people to easily use and extend the library, and they should not be concerned with how exactly these abstractions work (did you have to understand how GKO_REGISTER_OPERATION is implemented to write new kernels?).

I see this issue more related with tools, and them being confused with symbols that appear out of thin air, and not so much about having understandable code. That's why I don't see how a comment would help. Developers need to have a basic understanding of what each feature used in the code does anyway to make sense of it - and that means they will read the documentation where it will also say which symbols are generated by the macro, and what do they represent.

P.S. I assume what you wanted to add was a comment like this:

This symbol was generated from macro XXXX in XX.cpp

gflegar commented 6 years ago

@tcojean I assume you have a concrete tool which is giving you problems with macros. Just out of curiosity, is that tool capable of finding the definitions of methods exposed through mixins?

For example, if you want to find where does create in a Cg::Factory::create() method come from, will it track it into the create definition in EnableDefaultFactory mixin. And if you want create in Dense::create() will it track it into the definition of EnableCreateMethod?

gflegar commented 6 years ago

Btw, I just tried this in vim with the YCM plugin (it uses clang to provide semantics). In core/solver/cg.hpp, when positioning the cursor over parameters_ on line 123 and issuing the :YcmCompleter GoToDefinition command, the editor happily jumps to line 90, which is the invocation of the ENABLE_LIN_OP_FACTORY macro that defines parameters_.

On the other hand, the tool is not able to go to definitions of things defined in templates, like the create() method, so according to my tool, templates, and not macros should be a real issue here.

tcojean commented 6 years ago

Here is some more feedback on tools. I feel that we should be able to understand things reasonably without tools but it's true they help life a lot.

What I use is a simple gnu global (gtags) setup. I also have on the fly compilation/check and other things but they don't interact with the tag system (too complex I guess). Templates are indeed hard to figure out directly, but at least because they are properly named, when you ask for definitions of any templates with a particular name you have a list of proposal and you only have to pick the correct one: example with TemplatedOperation The other way around also works properly (seeing all use of TemplatedOperation for example). It's not properly filtered with the correct scope but that's fine, you can do the rest of the job by hand. At least, this acts as a smart grep showing only definitions for a name. Due to macro generated names we can't rely on this behavior though.

If YCM(D) is able to cope with that and take care of doing a preprocessor pass transparently, I will look out of curiosity if it can be made to work properly on emacs with the same functionalities.

About your specific question on create , in all cases gtags is able to correctly propose all possibly relevant options (short list, it only finds potential definitions) therefore you only have to quickly go through them and pick the correct one.

tcojean commented 6 years ago

I might add if there is any emacs user, I found rtags which does works quite well: 1) it finds macro generated names properly and 2) deals decently with templates. There are some problems though, it still has problems with scope like gtags though and it's a (relative) pain to set up. In addition, there are bugs in the master branch, so use the last release branch!

gflegar commented 6 years ago

BTW, I think this is the reason why go to definition doesn't work with templates in YCM. So it seems it can be made to work by changing a single option in the source and recompiling, if we don't mind the performance problem.

From what I quickly read about rtags, it seems it's basically an equivalent of YCM for emacs (they both use libclang to reason about the code in the background).

A c/c++ client/server indexer for c/c++/objc[++] with integration for Emacs based on clang.

If I s/Emacs/Vim this gives a perfectly good description of (a portion of) YCM. It does have some more features and languages - e.g. you can write your own semantic completers for languages that are not supported by clang, and there are already some of those available.

tcojean commented 4 years ago

@flipflapflop this issue discusses the code generation macros. You will find some explanations of this and tentative solutions which looked more annoying than what we have. In general, we are open to moving to another format, just we did not find anything proper so far.