HIPERFIT / finpar

Parallelisation of financial benchmarks

Redesigning the finpar infrastructure #1

Open athas opened 9 years ago

athas commented 9 years ago

(To eliminate ambiguity, here is the nomenclature: we have a number of benchmarks (currently CalibGA, CalibVolDiff, and GenericPricer), each of which has several data sets (typically Small, Medium, and Large), and several implementations (right now mostly different versions of C), each of which may have several configurations. Running a benchmark consists of selecting a data set and an implementation, and possibly specifying a specific configuration of the implementation.)
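To make the nomenclature concrete, here is a minimal Python sketch of how these concepts relate; the class and field names are illustrative only and not part of any proposed interface:

```python
from dataclasses import dataclass, field

@dataclass
class Implementation:
    name: str                       # e.g. "c-openmp"
    configurations: list = field(default_factory=lambda: ["default"])

@dataclass
class Benchmark:
    name: str                       # e.g. "CalibGA"
    datasets: list                  # e.g. ["Small", "Medium", "Large"]
    implementations: list           # list of Implementation

# Running a benchmark = picking one element from each axis:
#   (benchmark, dataset, implementation, configuration)
```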

Recently, Martin, Frederik, and I have been implementing the finpar benchmarks in more diverse programming languages: (streaming) NESL, APL, and Futhark, at least. Unfortunately, the current finpar infrastructure is not very easy to work with, so their work has not been integrated. I have identified the following problems:

I propose the following rough protocol:

The following questions have yet to be answered:

Still, I think this is a good protocol. It will allow us to build an easy-to-use controller program on top of it that can automatically generate a bunch of different instantiations with different configurations and data sets, and maybe draw graphs of the results, etc. I estimate that the above could be implemented fairly quickly, and sanity-checked against the extant benchmark implementations.
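To make the proposal easier to discuss, here is a rough, hypothetical sketch of such a controller in Python. It assumes the conventions mentioned later in this thread (an instantiate and a run step per implementation, an input file passed through a FINPAR_INPUT environment variable, runtimes printed on standard out, and the instantiation directory as working directory); the exact script names and variables are assumptions, not a fixed interface:

```python
import os
import subprocess

def run_instance(impl_dir, input_file, config=None, runs=1):
    """Instantiate and run one implementation on one data set (sketch)."""
    env = dict(os.environ, FINPAR_INPUT=os.path.abspath(input_file))
    env.update(config or {})

    # Step 1: build/instantiate the implementation for this input.
    subprocess.check_call(["./instantiate"], cwd=impl_dir, env=env)

    # Step 2: run it; the implementation prints one runtime per line.
    runtimes = []
    for _ in range(runs):
        out = subprocess.check_output(["./run"], cwd=impl_dir, env=env).decode()
        runtimes.extend(float(line) for line in out.split())
    return runtimes
```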

athas commented 9 years ago

There is another unresolved issue: sometimes, it is convenient to share code across implementations. For example, the various C implementations currently share a bunch of boilerplate code related to parsing. I propose we just handle this in an ad hoc fashion, maybe with a lib directory somewhere that is pointed to by an environment variable.

athas commented 9 years ago

FINPAR_DATASET is overspecified, as only the input file is needed for instantiation. This should be a FINPAR_INPUT variable instead.

athas commented 9 years ago

When executing run, the current directory must be the instantiation directory.

dybber commented 9 years ago

Looks good. The individual benchmark implementers should be free to use Makefiles in their instantiation files if they wish, but the main benchmark-runner should be developed in pure Python. Keep the main benchmark repo low-key: only the simple benchmark runner that outputs flat text files, and maybe have separate projects for visualising benchmark data, generating websites, etc.

Why use environment variables over command line arguments?

runtime.txt: remember that you would want to run each benchmark something like 100 times to calculate the mean and standard deviation. I think a better approach would be to output the running time on standard out, and let the benchmark-runner script collect these running times for each run and create the file. Secondly, we found that we liked to compare our timings over time, so the output should probably be named something like calibfuthark.result or calibsnesl.txt, to keep old versions and make sure we don't mix up which language we were benchmarking.

I think the file-format debate ended like this: the default file format doesn't matter, as long as there is a conversion script to JSON. I will probably make one that converts to CSV, to make it easier to use from R and APL.
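As a concrete illustration of this collection scheme, here is a small, hypothetical Python sketch of a benchmark-runner that repeats a run, gathers the runtimes printed on standard out, and writes them to a per-implementation result file together with mean and standard deviation (file names like calibfuthark.result are just the examples from above):

```python
import statistics
import subprocess

def collect_runtimes(run_cmd, cwd, repetitions=100):
    """Run the benchmark repeatedly; each run prints its runtime on stdout."""
    times = []
    for _ in range(repetitions):
        out = subprocess.check_output(run_cmd, cwd=cwd).decode()
        times.append(float(out.strip()))
    return times

def write_result(result_file, times):
    """Append one line per run plus a summary, so old results are kept."""
    with open(result_file, "a") as f:
        for t in times:
            f.write("%f\n" % t)
        f.write("# mean=%f stddev=%f\n"
                % (statistics.mean(times), statistics.stdev(times)))

# e.g. write_result("calibfuthark.result",
#                   collect_runtimes(["./run"], cwd="CalibGA/futhark"))
```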

athas commented 9 years ago

Martin Dybdal notifications@github.com writes:

Why use environment variables over command line arguments?

This was based on the idea that people might not want to parse command lines, and the environment is already a key-value store. Another solution would be a key-value JSON file.
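To illustrate why this is attractive: reading configuration from the environment or from a key-value JSON file is nearly the same amount of code, and neither requires a command-line parser. A hypothetical sketch (the variable and key names are only examples):

```python
import json
import os

# Option 1: the environment is already a key-value store.
input_file = os.environ["FINPAR_INPUT"]
config = os.environ.get("FINPAR_CONFIG", "default")

# Option 2: a key-value JSON file serves the same purpose.
with open("config.json") as f:
    cfg = json.load(f)
input_file = cfg["input"]
config = cfg.get("config", "default")
```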

runtime.txt: remember that you would want to run each benchmark something like 100 times to calculate the mean and standard deviation. I think a better approach would be to output the running time on standard out, and let the benchmark-runner script collect these running times for each run and create the file. Secondly, we found that we liked to compare our timings over time, so the output should probably be named something like calibfuthark.result or calibsnesl.txt, to keep old versions and make sure we don't mix up which language we were benchmarking.

These are really good hints, thanks!

How do you propose we deal with repeated execution? Should each benchmark just be expected to repeat the right number of times? I suppose that if it outputs a list of runtimes, we can just complain if the number of entries is not as expected.

(I do not expect people to intentionally cheat in their implementations; this is mostly to guard against bugs.)

I think the file-format debate ended like this: the default file format doesn't matter, as long as there is a conversion script to JSON. I will probably make one that converts to CSV, to make it easier to use from R and APL.

As I found out, the current file format is already a perfect subset of JSON.
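For the conversion script mentioned above, a minimal sketch might be: load the result file with a JSON parser and dump the runtimes as CSV rows. This assumes results are stored as a JSON object mapping implementation names to lists of runtimes, which is not settled in this thread:

```python
import csv
import json
import sys

def json_to_csv(json_path, csv_path):
    """Convert {"implementation": [runtimes...]} JSON into CSV rows."""
    with open(json_path) as f:
        results = json.load(f)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["implementation", "run", "runtime"])
        for impl, runtimes in results.items():
            for i, t in enumerate(runtimes):
                writer.writerow([impl, i, t])

if __name__ == "__main__":
    json_to_csv(sys.argv[1], sys.argv[2])
```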


athas commented 9 years ago

I have pushed a branch new-design which incorporates some of the ideas above. Only the CalibVolDiff benchmarks have been fully ported. I have committed to using JSON and Python for everything. Use the finpar program if you want to try it out.

vinter commented 9 years ago

Science storage can easily handle this - if the data don't change often (and I would assume they don't :)), a link in GitHub should do the trick?

/B

On 20 Feb 2015, at 18:00, Troels Henriksen notifications@github.com wrote:

I have pushed a branch new-design which incorporates some of the ideas above. Only the CalibVolDiff benchmarks have been fully ported. I have committed to using JSON and Python for everything. Use the finpar program if you want to try it out.


dybber commented 9 years ago

How do you propose we deal with repeated execution? Should each benchmark just be expected to repeat the right number of times? I suppose that if it outputs a list of runtimes, we can just complain if the number of entries is not as expected.

I think it should be the job of the benchmarking script to repeat the process and collect the reported timings in a file.

athas commented 9 years ago

What is 'the benchmarking script'?

athas commented 9 years ago

I have been thinking that instead of run/instantiate scripts, a Makefile with well-defined targets might be more familiar.
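For concreteness, a hypothetical view of this from the runner's side: the protocol stays the same, but the well-defined entry points are make targets instead of scripts. The target names here are assumptions, not a settled interface:

```python
import os
import subprocess

def run_with_make(impl_dir, input_file, runs=10):
    """Drive an implementation exposing 'instantiate' and 'run' make targets."""
    env = dict(os.environ, FINPAR_INPUT=os.path.abspath(input_file))
    subprocess.check_call(["make", "instantiate"], cwd=impl_dir, env=env)
    runtimes = []
    for _ in range(runs):
        # --quiet suppresses command echoing, so stdout is just the runtimes.
        out = subprocess.check_output(["make", "--quiet", "run"],
                                      cwd=impl_dir, env=env).decode()
        runtimes.extend(float(line) for line in out.split())
    return runtimes
```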

dybber commented 9 years ago

+1

We decided to use such a setup for our 'aplbench' a while ago. Look at this branch: https://github.com/dybber/aplbench/tree/make-setup

athas commented 9 years ago

Martin Dybdal notifications@github.com writes:

+1

We decided to use such a setup for our 'aplbench' a while ago. Look at this branch: https://github.com/dybber/aplbench/tree/make-setup

I think using a Makefile for the top-level script is bad software engineering. It makes it way too hard to do any kind of real analysis and programmatic configuration, and it makes the entire system less flexible and extensible - consider the added difficulty of adding a new implementation or benchmark.
