Create an (automatic?) process for updating the benchmark results

dimroc / etl-language-comparison

Count the number of times certain words were said in a particular neighborhood. Performed as a basic MapReduce job against 25M tweets. Implemented with different programming languages as an educational exercise.

187 stars, 33 forks

It's a great idea and I have thought of automating the calculation of each implementation's runtime. I will be manually updating the README later this week.

We have come to a point, though, where many of these implementations are no longer apples-to-apples comparisons. They vary in some subtle and not-so-subtle ways:

- Holding the file contents in memory rather than streaming in lines
- Using regex or using substring matching
- Handling ASCII only or full Unicode
- Others.
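To make the first two differences concrete, here is a rough Python sketch. It is not taken from any implementation in the repo: the function names, the search word, and the sample lines are all made up for illustration. It shows how a streaming substring counter and an in-memory regex counter can disagree on the same input.

```python
import io
import re


def count_streaming_substring(lines, word):
    """Stream line by line; case-sensitive substring test."""
    return sum(1 for line in lines if word in line)


def count_in_memory_regex(text, word):
    """Hold all text in memory; case-insensitive whole-word regex."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    return sum(1 for line in text.splitlines() if pattern.search(line))


# Hypothetical sample input, one "tweet" per line.
SAMPLE = "knicks win\nKnicks lose\nknickspace\nKNICKS!\nno match\n"

# Substring matching counts "knicks win" and "knickspace".
streamed = count_streaming_substring(io.StringIO(SAMPLE), "knicks")

# Whole-word, case-insensitive regex counts "knicks win", "Knicks lose", "KNICKS!".
in_memory = count_in_memory_regex(SAMPLE, "knicks")

print(streamed, in_memory)
```

The two functions differ in both the work they do and the answer they give, so timing one against the other says little about the languages involved.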

This repo is becoming more a source of idiomatic implementations than a fair speed comparison. I'll be including this information in the README.

Your idea is still very valid, though: we could still use an automated way of running and tracking benchmarks across languages. It could spur more community involvement.
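As a starting point, an automated runner could look something like the Python sketch below. Everything in it is an assumption for illustration: the `COMMANDS` entries are placeholder no-op commands rather than the repo's real entry points, and `time_command` and `render_table` are hypothetical helper names. It times each command with a wall clock and renders the results as a Markdown table that could be pasted into the README.

```python
import subprocess
import sys
import time


def time_command(command, runs=3):
    """Return the best wall-clock time in seconds over `runs` executions."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # check=True raises if the implementation exits with an error.
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def render_table(results):
    """Render {name: seconds} as a Markdown table, fastest first."""
    rows = ["| Implementation | Seconds |", "| --- | --- |"]
    for name, seconds in sorted(results.items(), key=lambda item: item[1]):
        rows.append(f"| {name} | {seconds:.2f} |")
    return "\n".join(rows)


# Hypothetical placeholder commands; swap in each language's real run command.
COMMANDS = {
    "noop-a": [sys.executable, "-c", "pass"],
    "noop-b": [sys.executable, "-c", "import time; time.sleep(0.2)"],
}

results = {name: time_command(cmd, runs=1) for name, cmd in COMMANDS.items()}
table = render_table(results)
print(table)
```

Taking the best of several runs reduces noise from other processes on the machine. It does not address the comparability problems listed above, so the table would still need a note on what each implementation actually does.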

Create an (automatic?) process for updating the benchmark results #18