dendibakh / perf-challenge6


Problem definition #2

Open chadbrewbaker opened 2 years ago

chadbrewbaker commented 2 years ago

Is the output of this sort correct? I just want to make sure I understand the problem definition.

tr -s ' \t' '\n' < data/small.data | sort | uniq -c | sort -ns
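
A rough Python model of what the pipeline above produces may help with this question. It assumes byte-order collation (as with `LC_ALL=C`; other locales can order words differently), and it treats any whitespace as a separator, which simplifies the `tr` step. Because `sort -ns` is a stable ascending numeric sort, the rarest words come first and ties keep the alphabetical order left by the first `sort`. For contrast, the sketch also shows most-frequent-first order with an alphabetical tie-break. Both function names are hypothetical.

```python
from collections import Counter


def pipeline_order(text: str) -> list[tuple[int, str]]:
    """Approximate `tr -s ' \\t' '\\n' | sort | uniq -c | sort -ns` (C locale)."""
    # sort | uniq -c: count each word, listed alphabetically
    counted = sorted(Counter(text.split()).items())
    # sort -ns: stable numeric sort on the count, ascending
    return [(n, w) for w, n in sorted(counted, key=lambda wc: wc[1])]


def descending_order(text: str) -> list[tuple[int, str]]:
    """Most frequent first, alphabetical among equal counts."""
    counted = Counter(text.split()).items()
    return [(n, w) for w, n in sorted(counted, key=lambda wc: (-wc[1], wc[0]))]


sample = "x y z y x q"
print(pipeline_order(sample))    # [(1, 'q'), (1, 'z'), (2, 'x'), (2, 'y')]
print(descending_order(sample))  # [(2, 'x'), (2, 'y'), (1, 'q'), (1, 'z')]
```

In shell terms, replacing `sort -ns` with `sort -k1,1nr -k2,2` should give the second ordering, under the same C-locale assumption.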

As a hilarious aside, I think I found some issues in /usr/bin/sort on OSX: https://opensource.apple.com/tarballs/text_cmds/ 😂

Update: this is close, but it still needs a secondary alphabetical sort.

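import sys
from collections import Counter

# Path to the input file, e.g. data/small.data
fpath = sys.argv[1]

with open(fpath, 'r') as f:
    data = f.read()

# Count whitespace-separated words
freq = Counter(data.split())

# Sorted by count (descending); words with equal counts stay in
# first-seen order, so there is no alphabetical tie-break yet
result = freq.most_common()
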
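
The secondary sort is needed because `Counter.most_common()` orders by count only. The Python documentation states that elements with equal counts come out in the order they were first encountered, not alphabetically. A small sketch with made-up words illustrates the difference:

```python
from collections import Counter

# Made-up input: "b" is seen before "a", and both occur twice.
freq = Counter("b a b a c".split())

# most_common() sorts by count only; ties keep first-encountered order.
by_count = freq.most_common()
print(by_count)  # [('b', 2), ('a', 2), ('c', 1)]

# Adding the word as a secondary key gives an alphabetical tie-break.
by_count_then_word = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
print(by_count_then_word)  # [('a', 2), ('b', 2), ('c', 1)]
```
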
dendibakh commented 2 years ago

For Bash, I think you can use this one as a reference: https://github.com/juditacs/wordcount/blob/master/bash/wordcount.sh

I checked that the baseline output matches this solution: https://github.com/juditacs/wordcount/blob/master/python/wordcount_py3.py

chadbrewbaker commented 2 years ago

Thanks. I think this is a correct Python version using the Counter class. It runs about 2x as slow as the original C++ on my M1.

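import sys
from collections import Counter

# Path to the input file, e.g. data/small.data
fpath = sys.argv[1]
with open(fpath, 'r') as f:
    data = f.read()
freq = Counter(data.split())
# Descending count, then alphabetical for equal counts; sorting
# freq.items() directly skips the redundant most_common() pass
result = sorted(freq.items(), key=lambda x: (-x[1], x[0]))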
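
To reproduce a comparison like the 2x figure above, the Python side could be timed with `time.perf_counter`. The sketch below is one way to do it, not the challenge's benchmark setup. `count_words` and `time_it` are hypothetical helpers that wrap the same Counter-plus-sort approach, and the synthetic input is only there so the sketch runs without a data file.

```python
import sys
import time
from collections import Counter


def count_words(text: str) -> list[tuple[str, int]]:
    """Hypothetical helper: Counter plus a (-count, word) sort, as above."""
    freq = Counter(text.split())
    return sorted(freq.items(), key=lambda x: (-x[1], x[0]))


def time_it(text: str, repeats: int = 5) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        count_words(text)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], "r") as f:
            text = f.read()
    else:
        # Synthetic input so the sketch runs without a data file.
        text = " ".join(f"w{i % 1000}" for i in range(200_000))
    print(f"best of 5: {time_it(text):.4f} s")
```

This measures only counting and sorting. It excludes interpreter startup and the file read, which timing the whole script from the shell would include, so the two numbers are not directly comparable.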