JeffersonLab / qphix

QCD for Intel Xeon Phi and Xeon processors
http://jeffersonlab.github.io/qphix/

Make -j is not inherited properly #71

Closed martin-ueding closed 6 years ago

martin-ueding commented 7 years ago

Martin

When one calls make -j, it starts a job server, and sub-make processes can inherit it to coordinate the total number of processes running. Explicitly giving the option -j on the sub-make seems to suppress this behavior.

Bálint

Hi Martin, my problem here was that, on my Linux systems at least, if I did not give a -j option, sub-makes would run with only 1 thread. NB: This was not true on my Mac, only on my Linux nodes. This is why I added the recursive_jN (qphix) or target_jN (qphix_codegen) CMake option, so that I could make sub-builds build with that value of -j. It seems clunky to me. What is the best way? To give -j but no value?

Best, B

Martin

I am not sure. Just giving -j is not a good idea; that will spawn an unlimited number of processes. There are around 500 compilation units, each needing 300 MB of RAM. This is something that I can look into at some point; I just wanted to file an issue so that it does not get lost. The recursive_jN works fine for the moment, so let's just keep using that for now.
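To put numbers on the memory concern above, here is a minimal sketch of the arithmetic. The 500 units and 300 MB per unit are the rough figures from the comment; the helper name max_parallel_jobs and the machine sizes in the usage note are hypothetical and only serve as illustration.

```python
# Rough figures quoted in the comment above.
COMPILATION_UNITS = 500
RAM_PER_UNIT_MB = 300

# Upper bound on memory use if every unit were compiled at the same time,
# which is what an unlimited -j permits in the worst case.
peak_mb = COMPILATION_UNITS * RAM_PER_UNIT_MB


def max_parallel_jobs(available_ram_mb, ram_per_job_mb, cpu_cores):
    """Pick a job count limited by both memory and cores (hypothetical helper)."""
    by_memory = available_ram_mb // ram_per_job_mb
    # Always allow at least one job so the build can make progress.
    return max(1, min(cpu_cores, by_memory))
```

For example, max_parallel_jobs(2000, 300, 16) is limited by memory rather than by the 16 cores, which is the kind of value one would then pass as recursive_jN.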

Bálint

Annoyingly, this is another consequence of the fact that we have a sub "cmake; make". It seems the top-level make environment is not passed down (including jobserver details). Incidentally, I have no idea how the jobserver is actually meant to work. I think this is somewhere we could spend a lot of time digging, with maybe not so fruitful results. I agree to stick with the recursive_jN unless anyone suggests something better.

Best, B

Martin

The sub-make has to be invoked with cd subdir && $(MAKE) in the Makefile in order to get the job server stuff right. I assume that it passes a path to some UNIX socket for the communication. If you just call cd subdir && make, then the job server will not work correctly. Since our Makefiles are generated by CMake, that is another matter.
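As a point of reference for the paragraph above: in GNU Make the jobserver is advertised to children through the MAKEFLAGS environment variable, as --jobserver-auth=R,W (a pair of pipe file descriptors) since version 4.2, as --jobserver-fds=R,W in older releases, and optionally as --jobserver-auth=fifo:PATH since 4.4. The following is a minimal sketch of how a child process could check what it inherited; the function name parse_jobserver and the returned dictionary layout are made up for this illustration.

```python
import re


def parse_jobserver(makeflags):
    """Describe the jobserver advertised in a MAKEFLAGS string, or return None.

    Hypothetical helper for illustration; it only inspects the string and
    does not touch the descriptors themselves.
    """
    # GNU Make 4.2+ uses --jobserver-auth=, older releases --jobserver-fds=.
    match = re.search(r"--jobserver-(?:auth|fds)=(\S+)", makeflags)
    if match is None:
        return None
    value = match.group(1)
    if value.startswith("fifo:"):
        # Named pipe variant available in GNU Make 4.4 and later.
        return {"kind": "fifo", "path": value[len("fifo:"):]}
    if "," not in value:
        # Other platform-specific forms (for example a semaphore name).
        return {"kind": "other", "value": value}
    read_fd, write_fd = value.split(",", 1)
    return {"kind": "pipe", "read_fd": int(read_fd), "write_fd": int(write_fd)}
```

Calling parse_jobserver(os.environ.get("MAKEFLAGS", "")) from a recipe would show whether jobserver information reached the child. Note that seeing the flag is not sufficient for the pipe variant: the descriptors are only usable if make treated the recipe line as a recursive invocation, which is what $(MAKE) achieves.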

Even worse, nobody guarantees that we are using make. I prefer Ninja to build my stuff because it handles the output of multiple processes better. So I sometimes pass -G Ninja to cmake. This should be passed along as well.

I might dig into that when I want to do some productive procrastination.

martin-ueding commented 7 years ago

There is another caveat here: the generator is not passed down properly either. I like to use Ninja to build my stuff because it handles the output much better than Make when using multiple processes. The top-level CMake file calls ${CMAKE_MAKE_PROGRAM} to compile the code generator. That is nice, but one has to pass the -G flag down to the inner call to cmake as well.

martin-ueding commented 7 years ago

Now the generator is passed down properly, and the -j flag is only passed down if it is actually given. The Ninja build system does not need it; by default it spawns N+2 processes, where N is the number of available cores.
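The behavior described above can be sketched as follows. This is not the actual QPhiX CMake code; it is a hypothetical Python helper (inner_build_commands, with made-up parameter names) that assembles the two command lines for an inner build, forwarding the generator with -G and appending -jN only when a job count was given. It relies on cmake --build passing everything after -- to the native build tool.

```python
def inner_build_commands(source_dir, generator=None, jobs=None):
    """Return (configure, build) argument lists for an inner CMake build.

    Hypothetical sketch: the generator is forwarded so that the inner
    build uses the same tool as the outer one, and -jN is only added
    when the caller explicitly asked for a job count.
    """
    configure = ["cmake"]
    if generator:
        # Same generator as the top-level build, e.g. "Ninja".
        configure += ["-G", generator]
    configure.append(source_dir)

    # Generator-independent way of invoking the native build tool.
    build = ["cmake", "--build", "."]
    if jobs is not None:
        # Arguments after "--" go to the native tool (make or ninja).
        build += ["--", f"-j{jobs}"]
    return configure, build
```

With generator="Ninja" and no job count, the build command carries no -j at all and Ninja picks its own default; with the default generator and jobs=4, the native make is called with -j4, which corresponds to the recursive_jN style of use.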

kostrzewa commented 7 years ago

@martin-ueding @bjoo I just wanted to request feedback from you, @bjoo, regarding changes to the devel branch. I'm guessing you still have some modifications to push (at some point) for your MG implementation. Similarly, I'm pushing mods and additions to get non-degenerate tm clover quarks fully supported. At this stage, I would prefer if devel remained somewhat stable and that any improvements be made in feature branches which can then be pulled in as required. This may mean some merge conflicts down the line but it might also prevent the situation of having to constantly rebase onto or merge the devel branch into a WIP branch. What do you think @bjoo?

bjoo commented 7 years ago

Hi all, I am working on a branch called mg_mods, which I was planning to merge into devel later. I would also like to keep things stable. Best, B





martin-ueding commented 6 years ago

What is the problem with merging the devel branch into the feature branches? That will ensure that the merge goes through cleanly.

I have the impression that we now have a couple of features in feature branches that might run for the next couple of months. The first feature to be merged into an unchanged devel branch will be a fast-forward; the next one might be as painful as the strong-scaling stuff that I just fiddled in. And those have not even landed in devel. Who will be responsible for resolving the merge conflicts in the end?

kostrzewa commented 6 years ago

What is the problem with merging the devel branch into the feature branches? That will ensure that the merge goes through cleanly.

This generally works and should be encouraged, of course. But keep in mind that QPhiX here is likely just a tool; the largest chunk of work is taking place in some external application, Chroma or tmLQCD, say. Now, every time you pull stuff in from the QPhiX devel branch (let's say daily, if it changes at the pace it has been changing over the last few months), you have to test and retest for regressions or interface changes. You likely have to do this on several architectures, and this becomes very hard to automate for code which is itself changing substantially over time. As a result, I believe that it is somewhat important to have a stable-ish devel branch from time to time, even if this means having to do some octopus-merge magic when the dust has settled.

Who will be responsible to resolve the merge conflicts in the end?

The feature branch author has to merge or rebase onto the current devel. Depending on the size of the changes in their feature branch, this person can be given priority to "be the first", such that they have the least amount of work. We're not talking about hundreds of branches, after all.