Standardize algorithms - Githubissues

dimroc / etl-language-comparison

Count the number of times certain words were said in a particular neighborhood. Performed as a basic MapReduce job against 25M tweets. Implemented in different programming languages as an educational exercise.

187 stars, 33 forks

I see that contributions have taken different approaches to solving the same problem, so in the end the benchmark is not comparing the languages themselves.

My suggestion would be to set a guideline for contributing which explains the standard approach, like:

It should use files.
It should take the number of workers/threads to use as a parameter.
It may buffer writes, but the buffer size must stay under a fixed limit.
It should use regular expressions, or include both versions: with and without regexps.
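
To make the rules above concrete, here is a rough Python sketch of what a "standard" entry could look like. It is only an illustration, not the repo's actual reference implementation. The input layout (tab-separated rows with the neighborhood in the second column and the tweet text in the third), the `*.tsv` extension, the 64 KiB buffer cap, and the names `count_file`, `run`, and `MAX_BUFFER_BYTES` are all assumptions made for the example. It counts tweets that mention the word, case-insensitively, per neighborhood.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical cap on the write buffer; the guideline would fix the real value.
MAX_BUFFER_BYTES = 64 * 1024


def count_file(path, word, use_regex):
    """Map step: per-neighborhood count of tweets in one file that mention `word`."""
    pattern = re.compile(re.escape(word), re.IGNORECASE) if use_regex else None
    needle = word.lower()
    counts = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue  # skip malformed rows
            # Assumed layout: id, neighborhood, message
            neighborhood, message = fields[1], fields[2]
            if use_regex:
                hit = pattern.search(message) is not None
            else:
                hit = needle in message.lower()
            if hit:
                counts[neighborhood] += 1
    return counts


def run(input_dir, output_path, word, workers=4, use_regex=True):
    """Reduce step: merge per-file counts and write them with a bounded buffer."""
    paths = sorted(Path(input_dir).glob("*.tsv"))  # assumed file extension
    total = Counter()
    # The worker count is a parameter, as the guideline proposes.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(lambda p: count_file(p, word, use_regex), paths):
            total.update(partial)
    # `buffering` sets the size of the underlying file buffer in bytes.
    with open(output_path, "w", encoding="utf-8", buffering=MAX_BUFFER_BYTES) as out:
        for neighborhood, n in sorted(total.items(), key=lambda kv: (-kv[1], kv[0])):
            out.write(f"{neighborhood}\t{n}\n")
    return total
```

Something like `run("tweets/", "output.txt", "knicks", workers=8, use_regex=False)` would then exercise the non-regexp variant with eight workers (the paths and search word here are placeholders). Note that Python threads mostly help with I/O here; a CPU-bound comparison would probably want processes instead, which is exactly the kind of detail the guideline would need to pin down.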

Maybe also allow submitting a non-standard approach that takes advantage of specific language features but keep that one marked as the special one.

So in the end there would be two sets of solutions: (1) the standard one that follows the rules and (2) the optimized or non-standard one.

Standardize algorithms #24

Rules of Reference Implementation