HPCE / hpce-2016-cw5


HPCE 2016 CW5

Updates to Julia

As noted in my email, I failed to push the reference specification to master, which left you in limbo if you didn't know floating-point well and were trying to get bit-accurate results in OpenCL.

So for Julia it will be up to me (David) to manually check whether your implementation matches a possible valid refinement of the original spec into IEEE floating-point. Make sure you include some notes about testing in your readme document (though you probably want to include something on that anyway).

Sorry!

Specification

You have been given the included code with the goal of making things faster. For our purposes, faster means reducing the wall-clock execution time of puzzler::Puzzle::Execute across a broad spectrum of scale factors. The target platform is an AWS GPU (g2.2xlarge) instance, and the target AMI will be the public HPCE-2015-v2 AMI. The AMI has OpenCL GPU and software providers installed, alongside TBB. You can determine the location of headers and libraries by starting up the AMI.

Note that there are AWS credits available to you for free from the Amazon Educate programme - you shouldn't be paying anything to use them. You may want to look back over the AWS notes from CW2.

Pair-work

This coursework is intended to be performed in pairs - based on past student discussions, these pairs are organised by the students themselves. Usually everyone has already paired up, but if anyone is having difficulty then let me know and I may be able to help. Singletons are sometimes required (e.g. an odd-sized class), in which case I make some allowance for the reduced amount of time available during marking. Trios are also possible, but things go in the other direction (and trios are generally much less efficient).

To indicate who you are working with, each member of the pair should give the other person write access to their hpce-2016-cw5-login repository. You should have admin control over your own account, so in the GitHub page for the repo go to Settings->Collaborators and Teams, or go directly to:

https://github.com/HPCE/hpce-2016-cw5-[LOGIN]/settings/collaboration

You can then add your partner as a collaborator.

During development, try to keep the two repositories in sync if possible. For the submission I will take the last commit in either repository that is earlier than 22:00.
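Since each partner's repository is a full copy, one way to stay in sync is to add your partner's repository as a second remote and fetch/merge from it. Below is a minimal runnable sketch; the `partner` remote name is an arbitrary choice, and two throwaway local repos stand in for the real GitHub URLs.

```shell
set -e
# Local simulation only: "mine" and "theirs" stand in for the two GitHub
# repositories; the paths and the "partner" remote name are placeholders.
work=$(mktemp -d); cd "$work"
git init -q mine
git init -q theirs
(cd theirs \
  && git config user.email partner@example.com \
  && git config user.name partner \
  && echo "partner's changes" > notes.txt \
  && git add notes.txt \
  && git commit -qm "partner work")

cd mine
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "my work"

# The actual sync: add the partner repo as a second remote, fetch, merge.
git remote add partner ../theirs
git fetch -q partner HEAD
git merge -q --allow-unrelated-histories -m "sync with partner" FETCH_HEAD
# ...then push the merged result back: git push origin master
cat notes.txt
```

In practice the remote URL would be `https://github.com/HPCE/hpce-2016-cw5-[LOGIN].git` for your partner's login, and you would fetch and merge their master branch rather than FETCH_HEAD.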

Meta-specification

You've now got some experience with different methods for acceleration, and a decent working knowledge of how to transform code in reasonably reliable ways. This coursework represents a fairly common situation - you haven't got much time, either to analyse the problem or to do low-level optimisation, and the problem is actually a large number of sub-problems. So the goal here is to identify and capture as much of the low-hanging performance fruit as possible while not breaking anything.

The code-base I've given you is somewhat baroque and, despite some rather iffy OOP practices, actually has things quite reasonably isolated. You will probably encounter the problem that the reference solution sometimes starts to take a very long time at large scales, but the persistence framework gives you a way of dealing with that.

Beyond that, there isn't a lot more guidance, either in terms of what you should focus on, or how exactly it will be measured. Part of the assessment is seeing whether you can work out what can be accelerated, and where you should spend your time.

The allocation of marks I'm using is:

Deliverable format

The reason for all this strange indirection is that I want to give you maximum freedom to do strange things within your implementation (example definitions of "strange" include CMake), while still having a clean abstraction layer between your code and the client code.

Floating-point Implementation

Some notes on floating-point, mainly applying to Julia:

However:

For the case of Julia, I added some reference inputs, and modified the makefile so that you can do:

make check_julia

to check your implementation against them. I also tightened up the reference implementation so that it loses any possible ambiguity (it becomes platform-portable, rather than dependent on the C++ implementation).

Intermediate Testing

I'll occasionally be pulling and running tests on all the repositories, and pushing the results back. These tests do not check for correctness; they only check that the implementations build and run (they are also for my own interest in seeing how performance evolves over time). I will push the results into the dt10_runs directory.

If you are interested in seeing comparative performance results, you can opt in by committing a file called dt10_runs/count_me_in. This will result in graphs with lines for your implementation versus others who also opted in, but you will only be able to identify your own line on the graph.
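Creating and committing the opt-in marker can be sketched as follows. The throwaway repo setup at the top is only so the commands run anywhere; in practice you would run the last four (non-comment) commands in your own working copy.

```shell
set -e
# Demo setup only: a throwaway repo so this sketch is self-contained.
cd "$(mktemp -d)"
git init -q .
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "initial"

# The actual opt-in: commit an (empty) marker file at dt10_runs/count_me_in.
mkdir -p dt10_runs
touch dt10_runs/count_me_in
git add dt10_runs/count_me_in
git commit -qm "Opt in to comparative performance graphs"
# ...then push so the marker is visible when the repository is pulled:
# git push origin master
git log --oneline
```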

I will pull from the "master" branch, as this reflects better working practice - if there is a testing branch, then that is where the unstable code should be. The master branch should ideally always be compilable and correct, with other branches only merged into master once they are stable.
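A minimal sketch of that branch workflow (branch and file names here are illustrative; the throwaway repo at the top is demo setup only):

```shell
set -e
# Demo setup only: throwaway repo with a "master" default branch.
cd "$(mktemp -d)"
git init -q .
git symbolic-ref HEAD refs/heads/master   # force the unborn branch to be master
git config user.email you@example.com
git config user.name you
git commit -q --allow-empty -m "initial"

# Unstable work lives on a testing branch...
git checkout -q -b testing
echo "experimental kernel" > kernel_notes.txt
git add kernel_notes.txt
git commit -qm "WIP: experimental OpenCL kernel"

# ...and is merged into master only once it builds and passes your checks.
git checkout -q master
git merge -q testing
git log --oneline master
```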

Finally, to re-iterate: the tests I am running do no checking at all for correctness - they don't even look at the output of the tests.