Handle different file endings for source files

PRUNERS / FLiT

A project to quickly detect discrepancies in floating point computation across hardware, compilers, libraries and software.

Handle different file endings for source files #253

Closed mikebentley15 closed 5 years ago

mikebentley15 commented 5 years ago

Feature Request

Describe the new feature: Currently, you can add source files for C++ into the custom.mk file. However, if any of those source files do not end with .cpp, then it will fail to function properly. This is because the generated Makefile uses generic rules to specify how the source files are compiled, and only %.o: %.cpp rules exist.

This is a request to be able to handle other file endings, such as .cc, .cxx, or even .c for C++ source files. This becomes increasingly important when supporting other languages such as FORTRAN (see [#93]) or C.

Suggested change: I recommend that we move away from using Makefile generic rules, and move toward generating an individual rule for each file, based on which Makefile variable the file belongs to. I think that trying to support any and all file endings for a particular language is a losing battle, especially if individuals use non-standard file endings. Our tool should work no matter what the file ending is, as long as the files are added to the correct variable.
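As a rough illustration of the per-file idea (not FLiT's actual implementation), the sketch below emits one explicit Makefile rule for each listed source file, so the file ending never has to match a pattern. The `obj/` directory, the object naming scheme, and the `make_rules` helper are assumptions made up for this example.

```python
def object_name(source):
    """Map a source path to an object file name.

    The full file name (ending included) is kept so that a.cc and a.cpp
    map to different objects. The obj/ prefix is a hypothetical choice.
    """
    return "obj/" + source.replace("/", "_") + ".o"


def make_rules(sources, compiler_var="CXX", flags_var="CXXFLAGS"):
    """Return Makefile text containing one explicit rule per source file."""
    lines = []
    for src in sources:
        lines.append(f"{object_name(src)}: {src}")
        # $< is the first prerequisite and $@ is the target, which also
        # work inside explicit (non-pattern) rules.
        lines.append(f"\t$({compiler_var}) $({flags_var}) -c $< -o $@")
        lines.append("")
    return "\n".join(lines)


rules = make_rules(["tests/Example.cpp", "src/kernel.cc", "src/legacy.cxx"])
print(rules)
```

Because each rule names its source file explicitly, a file such as `src/kernel.cc` compiles the same way as a `.cpp` file without any ending-specific pattern rule.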

There is an exception to such a rule, and that is with the test source code itself. Currently, the tests are added to the SOURCE variable using $(wildcard tests/*.cpp) and $(wildcard *.cpp). In these cases, there are a few choices for source files within the test directory proper:

1. Try to support all common file endings
2. Assume all files are source files (I don't like this one)
3. Require the user to put these files into custom.mk (it would be prepopulated with a wildcard declaration when created)

The first one would be the most "magical" for users, since it would just work. But I'm not actually in favor of that kind of behavior in a tool such as this one. The second one is definitely a non-starter, since there may be data files put in the tests directory that are not source files, or even header files that would not compile by themselves. The third one I think is the best solution. However, the third one has a backward-compatibility problem. I think to solve that compatibility problem, we could simply keep the same wildcard declarations in the generated Makefile as well as in custom.mk (a little bit of duplicate functionality that will be documented in the generated Makefile).

Alternative approaches: Try to make rules for all different types of common file endings. This is doable, but will it really work for all cases? When the .cc file ending was not supported for the LULESH project, I ended up creating symbolic links that ended in .cpp to those files and including those in the FLiT tool. This was hardly elegant. If we try to support all common file endings, then likely we will miss some, especially when we start supporting other languages. Furthermore, sometimes C++ files end in .c, which is a common ending in C files. How then would we distinguish when a source file is C++ and when it is C? I think any maintainable long-term solution must utilize the value of the SOURCE Makefile variable(s).
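The `.c` ambiguity described above can be sketched as follows: the compiler is chosen from the variable a file was listed in, never from its ending. Only `SOURCE` is a variable named in this thread; `C_SOURCE` and the mapping table are hypothetical names introduced for illustration.

```python
# Hypothetical mapping from a Makefile variable name to the compiler
# variable used for every file listed in it.
COMPILER_FOR_VARIABLE = {
    "SOURCE": "CXX",    # C++ sources (SOURCE is the variable from the thread)
    "C_SOURCE": "CC",   # C sources (hypothetical variable name)
}


def compiler_for(path, variables):
    """Return the compiler variable for path based on which list holds it."""
    for variable, files in variables.items():
        if path in files:
            return COMPILER_FOR_VARIABLE[variable]
    raise KeyError(f"{path} is not listed in any source variable")


# Two files with the same .c ending, listed in different variables.
variables = {
    "SOURCE": ["solver.c", "main.cpp"],
    "C_SOURCE": ["util.c"],
}
print(compiler_for("solver.c", variables))
print(compiler_for("util.c", variables))
```

With this approach a C++ file ending in `.c` and a C file ending in `.c` can coexist, since the ending is never consulted.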

jjgarzella commented 5 years ago

I think moving away from generic rules is a good solution to this problem. I have one suggestion, but for context I'm going to quote Michael in a recent email:

"The idea is that we move away from generic Makefile rules (e.g., "%.o: %.cpp") and instead specifying a unique rule for each file specified in the SOURCE variable, much like how QMake and CMake do in their generated Makefiles. But, since we do not have access to the list of source files when we create the Makefile, we would likely need to do a foreach rule on each file specified in SOURCE. Very much how the RECURSION_RULE is defined and implemented in a foreach."

What if, instead of doing this work in make, we remove the SOURCE variable entirely from custom.mk and fold it into the flit-config.toml? The SOURCE variable is just a list of file names, so it can be specified in the toml, probably at the top level. Then, we would have a list of source files and we can just generate the rules with Python.

An advantage to this method is that it allows us to easily fix the problem of tests: by default, the Python code can add all files matching tests/*.cpp and *.cpp to its list of rules to generate, which replicates current behavior. Then, we can add an optional configuration to the toml file to enable/disable certain tests.
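A minimal sketch of that default behavior, assuming the toml file has already been parsed into a dictionary: the keys `source` and `disabled_tests` are hypothetical option names, not existing FLiT settings, and the list of available files is passed in rather than read from disk.

```python
import posixpath
from fnmatch import fnmatchcase

# Patterns that replicate the current wildcard behavior for tests.
DEFAULT_TEST_PATTERNS = ["tests/*.cpp", "*.cpp"]


def matches(path, pattern):
    """Match like Make's wildcard: '*' must not cross a directory boundary."""
    pattern_dir, pattern_name = posixpath.split(pattern)
    file_dir, file_name = posixpath.split(path)
    return pattern_dir == file_dir and fnmatchcase(file_name, pattern_name)


def resolve_sources(config, available_files):
    """Combine explicitly listed sources with default test files."""
    disabled = set(config.get("disabled_tests", []))
    selected = list(config.get("source", []))
    for path in available_files:
        if path in disabled or path in selected:
            continue
        if any(matches(path, pattern) for pattern in DEFAULT_TEST_PATTERNS):
            selected.append(path)
    return selected


config = {"source": ["src/kernel.cc"], "disabled_tests": ["tests/Slow.cpp"]}
files = ["main.cpp", "tests/Fast.cpp", "tests/Slow.cpp",
         "tests/data.txt", "src/other.cpp"]
selected = resolve_sources(config, files)
print(selected)
```

Data files such as `tests/data.txt` are skipped because they match no default pattern, and `src/other.cpp` is skipped because it is neither listed explicitly nor located in a default test location.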

One disadvantage to this method is that it removes the possibility of using things like wildcards, adding a directory prefix in a Make variable, or other Make shenanigans. I don't think wildcards are that important; in my experience, every time I use FLiT, I end up copying a list of files into custom.mk. For directory prefixes, we could manually add this as another config option in the toml file. We could also optionally let people specify a line of make code which could execute a wildcard. I'm definitely open to suggestions on how to deal with these issues.

Overall, I feel that this is a really good opportunity to move some functionality from custom.mk into flit-config.toml. Things in flit-config.toml are generally much easier for a non-expert to use, and I think it would move us towards a goal of having custom.mk be unnecessary in many basic use cases. This would improve usability, and help extend the reach of FLiT.

Edit: fixed typo

mikebentley15 commented 5 years ago

You make a good case, JJ. I'll have to think about it. Perhaps if you want to discuss in person, we could do that as well.

You are correct that both flit-config.toml and custom.mk serve similar purposes to configure the test environment. I have tried to keep the distinction between them clear, where flit-config.toml configures the search space (e.g., which compilers, which flags, MPI settings, etc.) for testing, and custom.mk specifies exactly what is needed to compile the tests.

Having the source file specification (as well as required compiler flags) within the Makefile has the following benefits I can currently think of:

- You can use wildcard to specify a large group of files within a single declaration
- Perhaps you can leverage existing Makefile infrastructure your project may possess with includes
- You can get lists of files or compiler flags from other executables (such as $(shell mpic++ --showme:compile))

The flit-config.toml approach cannot provide any of these benefits (except that we could support wildcards in the toml file, which would then be expanded by glob.glob() in Python).
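The wildcard exception mentioned above might look roughly like this. It is only a sketch: `expand_patterns` is a hypothetical helper, and the demonstration creates a temporary directory so that it does not depend on any real project layout.

```python
import glob
import os
import tempfile


def expand_patterns(patterns, root="."):
    """Expand each entry with glob.glob(), keeping first-seen order.

    Entries without wildcard characters pass through glob unchanged
    as long as the file exists.
    """
    expanded = []
    for pattern in patterns:
        for match in sorted(glob.glob(os.path.join(root, pattern))):
            relative = os.path.relpath(match, root)
            if relative not in expanded:
                expanded.append(relative)
    return expanded


with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "src"))
    for name in ("a.cc", "b.cc", "notes.txt"):
        open(os.path.join(root, "src", name), "w").close()
    open(os.path.join(root, "main.cpp"), "w").close()

    # Entries as they might appear in a source list read from the toml file.
    sources = expand_patterns(["src/*.cc", "main.cpp"], root)

print(sources)
```

This covers only the first of the three benefits listed; includes and `$(shell ...)` output have no equivalent in this sketch.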

I'm not yet sure if those benefits are worth having this extra custom.mk configuration file. If we move too much into flit-config.toml, we run the risk of creating our own meta-make system such as QMake or CMake, which I do not want.

@IanBriggs what are your thoughts?

IanBriggs commented 5 years ago

Won't we have to create our own meta-make system for the idea where we capture a compilation then replay it?

mikebentley15 commented 5 years ago

I wouldn't necessarily call that a meta-make system. It's not like we would be creating syntax for arbitrary build rules of arbitrarily complex applications. Instead, we would capture the behavior of existing build systems and convert them into Make to be able to play it back, in a sense.

But Ian brings up a good point. The idea behind the capture-playback scenario was that only the custom.mk file would need to be generated, while your flit-config.toml would not need to be touched at all. It supports the idea of keeping them distinct, where custom.mk is used only to specify how your tests will be compiled, which can be autogenerated from an existing build system, or written from scratch. If we make a capture-replay system work quite seamlessly, then I could see that most users would never need to touch the generated custom.mk except to perhaps regenerate it if the application source has changed.

jjgarzella commented 5 years ago

From my perspective, I'm trying to think about usability of FLiT, and right now the main pain point (at least for me) to using FLiT has been having to essentially recreate the build system of whichever library I want to test into custom.mk. In order to properly debug this, it often requires knowledge of the generated Makefile. Because the generated Makefile is basically part of the implementation of FLiT, this is an obstacle to adoption. A big part of my logic behind wanting to fold SOURCE into flit-config.toml is that anything in the toml file won't ever require knowledge beyond the documentation.

However, if the capture-playback functionality works properly, there's no need for the average user to touch custom.mk anyways, so that point is moot. If that's where we're headed, then maybe switching everything away from custom.mk is less of a priority.

The only other thing is a philosophical question: would we rather be doing these recursive make rules and everything in make or in Python? Normally, I would say that Python is a lot better suited for that sort of work and I would rather have the generated Makefile be a really flat file with almost no logic in it, like the Ninja build system does things. Yes, this means that we're writing a pseudo-CMake. But we almost have a pseudo-CMake already, it's just written in Make using recursive rules.

However, I don't know if this is the right choice, because I don't think completely changing the nature of the project is an especially good idea. FLiT already works really well, and that philosophical question is entirely an implementation detail if capture-playback gets up and running.

mikebentley15 commented 5 years ago

JJ, I understand what you're getting at here. Yes, we are planning on moving toward auto-generation of custom.mk, in which case, if we move the source files or anything else into flit-config.toml, it will make that task much more complicated. I agree that the creation of custom.mk is the main hurdle towards the adoption of FLiT. For that reason, the autogeneration of this file is probably the very next thing we will work towards.

As for the recursive Makefile calls, I feel you may not understand the reason those were introduced. Originally, it did not have recursive calls, and the result was that every time we called any sort of make command, be it make gt, make dev, or even make help, it took so long (because of the sheer size after expanding all source files, object files, rules, executables, and test output files), that you waited for something like 10 seconds before anything happened, for a small project such as MFEM. Think about enumerating all object files, executables, and output files for approximately 250 different compilations.
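To give a feel for the scale described above, here is a back-of-the-envelope count of the targets a fully flat Makefile would have to enumerate. The figure of 250 compilations comes from the comment; the 100 source files is an assumed number chosen purely for illustration, not a measurement of MFEM.

```python
def flat_target_count(compilations, source_files):
    """Count targets in a fully expanded (flat) Makefile.

    Each compilation needs one object file per source file, plus one
    executable and one test output file.
    """
    per_compilation = source_files + 2
    return compilations * per_compilation


# 250 compilations is from the thread; 100 source files is an assumption.
count = flat_target_count(compilations=250, source_files=100)
print(count)
```

Every one of those targets would be parsed on each invocation, even for something like `make help`, which is the cost the recursive rule is meant to avoid.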

The recursive Makefile rule is a bit complicated, but it is intended to be a lazy expansion to reduce parsing runtime. If we were to make it a flat structure, it would probably take a long time for any sizable project. If you disagree, I welcome proof of concept. Maybe the ninja build system can efficiently handle a flat structure for a very large amount of target files (PS, I love Ninja build), but I do not believe that GNU Make can.

mikebentley15 commented 5 years ago

I appreciate you bringing up architectural questions. It is absolutely the right time to be asking these kinds of questions. If we always feel it is not the right time to do so, refactoring will never happen, and code bases stagnate. Refactoring is a necessity to keep projects alive and healthy.

Keep up the good questions. And my word is not "final", I always welcome debate. :+1:

ganeshutah commented 5 years ago

This is such an excellent discussion, even for me, who doesn't do these things but can appreciate how a FLiT-like tool pushes the envelope. I argue there is a paper here for many a conference. I have seen papers of this kind. My position is to not dismiss these as a painful journey to get somewhere, but to treat them as a journey that is worth analyzing and scientifically publishing about as well!

jjgarzella commented 5 years ago

Actually, I'm now convinced by the recursive-rule pattern, because 1) after looking at the Makefile.in more closely, we wouldn't really be making another recursive loop, we would be modifying the current one (and possibly changing the dev/gt lists into their own loops), and 2) I didn't realize that Make gets slower when you flatten it out and have a big file. It goes to show the differences between Make and Ninja, I guess. And again, the fact that we're looking at the capture-playback stuff removes the major problem, which is usability.