
HPCE 2017 CW5

Errata

Specification

You have been given the included code with the goal of making things faster. For our purposes, faster means the wall-clock execution time of puzzler::Puzzle::Execute, across a broad spectrum of scale factors. There is also an emphasis on good scaling - how large a scale parameter can be executed?
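
For concreteness, the quantity being measured is just the wall-clock time wrapped around that call. Below is a minimal sketch of measuring it with std::chrono; how the puzzle, input, output and log objects are actually constructed is dictated by the provided puzzler headers and drivers, so the usage shown in the comment is an assumption rather than real driver code.

```c++
#include <chrono>
#include <utility>

// Minimal wall-clock timer. The callable is expected to wrap the Execute call;
// the real argument list of Execute is defined by the provided puzzler headers.
template <class ExecuteCall>
double TimeWallClockSeconds(ExecuteCall &&call)
{
    auto start = std::chrono::steady_clock::now();
    std::forward<ExecuteCall>(call)();   // e.g. [&]{ puzzle->Execute(log, input, output); }
    auto finish = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(finish - start).count();
}
```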

The driver programs included are fairly basic, and have no particular importance beyond making it easy to call Execute (testing will use a different driver program). You can infer how they work from the source, look at the serenity_now makefile target, or read the brief guidance here.

The target platform is an AWS GPU (g2.2xlarge) instance. I would prefer to use a bigger instance, but this is the most economical, and I don't want people running out of money here. I will probably do some short runs on the submitted versions with a bigger instance (e.g. a p3.16xlarge, at $27 an hour...), just out of interest, but they will not be assessed.

The target AMI will be the public HPCE-2017-GPU-Image AMI. The AMI has OpenCL GPU and software providers installed, alongside TBB. You can determine the location of headers and libraries by starting up the AMI.

People working in triples must provide a solution to all six puzzles.

People working in pairs must choose four. The chosen puzzles will be indicated in the documentation.md file, and the choice must be made by the pair. If a pair has not made a clear choice of four for the final submission, then four will be picked at random.

Triples have the advantage of knowing they need to do all puzzles, so there is no paralysis of choice, but they have to do everything. Pairs have the opportunity to attack more than four and try to pick the best performing or "easiest", but may then not be able to probe as deeply. To the best of my estimation there is no inherent advantage one way or the other - the only thing that matters is the ability of the people in the team, and not the choice of 2 or 3.

Meta-specification

You've now got some experience in different methods for acceleration, and a decent working knowledge of how to transform code in reasonably reliable ways. This coursework represents a fairly common situation - you haven't got much time, either to analyse the problem or to do very low-level optimisation, and the problem is actually a large number of sub-problems. So the goal here is to identify and capture as much of the low-hanging performance fruit as possible while not breaking anything. If time allows you can dig deeper, but you first want to be focussing on easy initial wins, and looking for any clear asymptotic improvements (if they exist).
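
As a purely generic illustration of the kind of easy win meant here (not code from any of the given puzzles): a loop whose iterations are independent can often be handed straight to TBB, which is already installed on the target AMI.

```c++
#include <cmath>
#include <cstddef>
#include <vector>
#include "tbb/parallel_for.h"

// Hypothetical example of a low-effort transformation: a sequential
// per-element loop becomes a tbb::parallel_for. Whether the iterations of a
// real puzzle loop are actually independent has to be checked first.
void transform_all(std::vector<double> &values)
{
    tbb::parallel_for(std::size_t(0), values.size(), [&](std::size_t i) {
        values[i] = std::sqrt(values[i]) * 2.0;   // stand-in for real per-element work
    });
}
```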

The code-base I've given you is somewhat baroque, and despite having some rather iffy OOP practices, actually has things quite reasonably isolated. You will probably encounter the problem that the reference solution sometimes starts to take a very long time at large scales, but the persistence framework gives you a way of dealing with that.

Beyond that, there isn't a lot more guidance, either in terms of what you should focus on or how exactly it will be measured. Part of the assessment is in seeing whether you can work out what can be accelerated (using parallelisation, restructuring, and optimisation), and also seeing whether you can focus your efforts on the right parts.

The allocation of marks I'm using is:

Deliverable format

The reason for all this strange indirection is that I want to give maximum freedom for you to do strange things within your implementation (example definitions of "strange" include CMake) while still having a clean abstraction layer between your code and the client code.
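
To make the intent of that boundary concrete, here is a purely illustrative sketch of the idea; none of these class or function names are the real puzzler API (the actual classes to derive from are in the provided headers). It only shows how client code that talks to a base-class interface stays unchanged no matter what your implementation does behind it.

```c++
#include <memory>

// Illustrative only - a stand-in for the framework's abstract puzzle interface.
struct PuzzleInterface {
    virtual ~PuzzleInterface() {}
    virtual void Execute() = 0;   // the real Execute takes log/input/output arguments
};

// Your accelerated implementation lives entirely behind the interface, so it is
// free to use TBB, OpenCL, CMake-built helpers, or anything else "strange".
struct MyFasterPuzzle : PuzzleInterface {
    void Execute() override {
        // ... accelerated implementation ...
    }
};

// The client code only ever sees PuzzleInterface, which is what keeps the
// boundary between your code and the client code clean.
std::unique_ptr<PuzzleInterface> CreatePuzzle()
{
    return std::unique_ptr<PuzzleInterface>(new MyFasterPuzzle());
}
```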

Intermediate Testing

In the second week I'll be occasionally pulling and running tests on all the repositories, and pushing the results back. These tests do not check for correctness; they only check that the implementations build and run (and are also for my own interest in seeing how performance evolves over time). I will push the results into the dt10_runs directory.

If you are interested in seeing comparative performance results, you can opt in by changing the line count-us-in from no to yes in documentation.md. This will result in graphs with lines for others who also opted in. There is no specific relationship between the median and any assessment metric. The scripts will attempt to run all puzzles, even for those working in pairs.

I will pull from the "master" branch, as this reflects good working practice - if there is a testing branch, then that is where the unstable code should be. The master branch should ideally always be compilable and correct, with branches only merged into master once they are stable.

Finally, to re-iterate: the automatic tests I am running do no checking at all for correctness; they don't even look at the output of the tests.

Submission

The code in github forms your submission, though you must submit your final hash via blackboard for time-keeping and non-repudiation purposes. Pushes to github after the deadline will not be treated as submissions, unless the new hash is also submitted via blackboard after the deadline.