dimroc / etl-language-comparison

Count the number of times certain words were said in a particular neighborhood. Performed as a basic MapReduce job against 25M tweets. Implemented in different programming languages as an educational exercise.
http://blog.dimroc.com/2015/11/14/etl-language-showdown-pt3/
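The computation described above can be sketched as a single-process map/reduce in Python. This is a minimal illustration under stated assumptions, not any of the repository's implementations: the tab-separated layout (neighborhood in column 1, text in the last column), TARGET_WORDS, and the sample rows are all hypothetical.

```python
# Single-process sketch of the "count mentions per neighborhood" job.
# ASSUMPTIONS (hypothetical, not the real dataset layout): one tweet per line,
# tab-separated, neighborhood in column 1, tweet text in the last column.
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator, Tuple

TARGET_WORDS = ("pizza",)  # hypothetical search terms


def map_tweet(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (neighborhood, 1) when the tweet mentions a target word."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return  # skip malformed lines
    neighborhood, text = fields[1], fields[-1].lower()
    if any(word in text for word in TARGET_WORDS):
        yield neighborhood, 1


def reduce_counts(acc: Counter, pair: Tuple[str, int]) -> Counter:
    """Reduce step: add each emitted count to its neighborhood's total."""
    neighborhood, count = pair
    acc[neighborhood] += count
    return acc


def count_by_neighborhood(lines: Iterable[str]) -> Counter:
    """Run the map step over every line, then fold all pairs with the reducer."""
    pairs = (pair for line in lines for pair in map_tweet(line))
    return reduce(reduce_counts, pairs, Counter())


if __name__ == "__main__":
    sample = [
        "1\tHarlem\tI want pizza",
        "2\tSoHo\tnice weather",
        "3\tHarlem\tPizza night",
        "4\tSoHo\tpizza again",
    ]
    print(count_by_neighborhood(sample))  # Counter({'Harlem': 2, 'SoHo': 1})
```

In a real MapReduce run the map step would be applied to input splits in parallel and the partial counts merged afterwards; here both phases run in one process for clarity.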

perl versions, plus README with some comments and thoughts #22

sitaramc closed this 8 years ago

sitaramc commented 8 years ago

Hi,

I found this on an Elixir-related site and took a look. To be honest, it doesn't really play to Elixir's strengths. In fact, it looks tailor-made for Perl, so I took a crack at writing a Perl version, looking at a few of the other versions along the way.

If you think the README has too much pontificating, please feel free to drop what you don't like, or let me know which sections you want me to remove and I will resubmit the PR.

mganss commented 8 years ago

The shell one-liner can be made faster: try LC_ALL=C grep -FUi (or, even faster, combine it with GNU Parallel; see #21). Regarding Unicode, I'm not sure you can definitively say that the Python version is correct and the others aren't. I think it depends on the locale you are using; see https://en.wikipedia.org/wiki/Dotted_and_dotless_I
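To illustrate the locale point, here is a minimal Python sketch comparing three case-folding rules on two hypothetical tweets. ascii_fold only approximates byte-oriented matching in the C locale, and turkish_fold is a hypothetical helper that handles just the I/İ/ı/i pairs; neither reproduces how grep or any version in the repository actually works.

```python
# Sketch: whether a case-insensitive match succeeds depends on the folding rules.
#   ascii_fold   - folds only A-Z (rough analogue of byte matching under LC_ALL=C)
#   unicode_fold - Python's locale-independent str.lower()
#   turkish_fold - HYPOTHETICAL helper approximating Turkish rules for I and İ


def ascii_fold(s: str) -> str:
    """Lowercase only ASCII letters; bytes.lower() leaves non-ASCII bytes alone."""
    return s.encode("utf-8").lower().decode("utf-8")


def unicode_fold(s: str) -> str:
    """Locale-independent Unicode lowercasing ('İ' becomes 'i' + U+0307)."""
    return s.lower()


def turkish_fold(s: str) -> str:
    """Turkish pairs: dotless I -> dotless ı, dotted İ -> dotted i."""
    return s.replace("I", "ı").replace("İ", "i").lower()


def matches(word: str, text: str, fold) -> bool:
    """True if `word` occurs in `text` after folding both with `fold`."""
    return fold(word) in fold(text)


if __name__ == "__main__":
    cases = [("pizza", "I LOVE PIZZA"), ("istanbul", "İSTANBUL")]
    folds = [("ascii", ascii_fold), ("unicode", unicode_fold), ("turkish", turkish_fold)]
    for name, fold in folds:
        print(name, [matches(w, t, fold) for w, t in cases])
    # ascii [True, False]
    # unicode [True, False]
    # turkish [False, True]
```

No single rule gives the "expected" answer for both tweets, which is why the correctness of a case-insensitive count depends on the locale assumed.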

dimroc commented 8 years ago

Wow, I only just noticed the README, @sitaramc. This is great; I'll try to incorporate some of it into my blog post.

Thanks.

RE: https://github.com/sitaramc/etl-language-comparison/blob/master/perl/README.md

sitaramc commented 8 years ago

On 15/11/15 06:52, Dimitri Roche wrote:


:-)