brazilofmux / tinymux

TinyMUX
www.tinymux.org

Munge is O(n*m) #250

Closed by brazilofmux 9 years ago

brazilofmux commented 9 years ago

Original issue 247 created by brazilofmux on 2007-01-03T00:39:54.000Z:

The underlying algorithm used in munge() is:

```c
for (i = 0; i < nresults; i++)
    for (j = 0; j < nptrs1; j++)
        if (!strcmp(results[i], ptrs1[j]))
        {
            safe_str(ptrs2[j], buff, bufc);
            ptrs1[j][0] = '\0';
        }
```

This is obviously expected O(nresults*nptrs1) time. It could be reduced to expected linear time instead of quadratic.

brazilofmux commented 9 years ago

Comment #1 originally posted by brazilofmux on 2007-01-03T08:10:51.000Z:

I've put together a hash table version in dev_alierak; see revision 628.

Someone should actually do some profiling, though.

brazilofmux commented 9 years ago

Comment #2 originally posted by brazilofmux on 2007-01-05T00:04:45.000Z:

```
&sort_alpha me=%0
say [setq(0,secs(,7))] [null(iter(lnum(1,100),munge(sort_alpha,lnum(0,6000),lnum(0,6000))))] [sub(secs(,7),%q0)]
```

Before:

```
You say, "1.723398208618164"
You say, "1.6362740993499756"
You say, "1.661150932312012"
You say, "1.6323068141937256"
You say, "1.812102794647217"
```

After:

```
You say, "0.443601131439209"
You say, "0.3750908374786377"
You say, "0.3990318775177002"
You say, "0.375830888748169"
You say, "0.3886880874633789"
You say, "0.4116618633270263"
```

So, about 4 times faster.

brazilofmux commented 9 years ago

Comment #3 originally posted by brazilofmux on 2007-01-05T01:10:57.000Z:

Thanks, that shows about 9 times faster on my Mac. I'll also point out that for very short lists (6 items here, and 1000 iterations) it is somewhat slower to build a hash table:

```
say [setq(0,secs(,7))][setq(1,lnum(0,6))][null(iter(repeat(x%b,1000),munge(sort_alpha,%q1,%q1)))][sub(secs(,7),%q0)]
```

Before:

```
You say, "0.1278090476989746"
You say, "0.1279170513153076"
You say, "0.1328129768371582"
You say, "0.1320178508758545"
```

After:

```
You say, "0.1686298847198486"
You say, "0.1661438941955566"
You say, "0.17458295822143555"
You say, "0.1656041145324707"
```

If you take out the munge() call, the above command consistently yields around 0.00052, so the munge() time and its increase are still significant.

I wonder if it's worth finding a tradeoff point and keeping both algorithms.

brazilofmux commented 9 years ago

Comment #4 originally posted by brazilofmux on 2007-01-05T01:32:07.000Z:

Typically, for optimization, it's better to find a real-world case (a real game, frequently-used idioms, widely-used globals, etc), measure real data, make decisions based on where the game as a whole is spending most of its time, and work back from there to specific functions. In this case, someone looked at the algorithm and decided based on the O().

So, yes, if we are trying to optimize just munge(), we should find a decision point and keep both algorithms. I like the new algorithm, and I'd like a combined algorithm even better, but there is still the reality that this isn't totally grounded in real-world data.

brazilofmux commented 9 years ago

Comment #5 originally posted by brazilofmux on 2007-01-05T01:46:06.000Z:

Marking as Started again because work on this continues.

brazilofmux commented 9 years ago

Comment #6 originally posted by brazilofmux on 2007-01-05T01:54:49.000Z:

Storing list indices (ints) in the hash table is a tiny bit faster since we avoid hashdeleteLEN. See revision 646.

brazilofmux commented 9 years ago

Comment #7 originally posted by brazilofmux on 2007-01-05T06:11:41.000Z:

Ian gives this snippet to reveal a compatibility issue:

```
&m me=%qo
th setq(o,1 3 3 3 3)[setq(i,1 3 3 3 3)][munge(m,%qi,a b c d e)]
```

The current version outputs `a b c d e`. The new munge outputs `a b`.

brazilofmux commented 9 years ago

Comment #8 originally posted by brazilofmux on 2007-01-05T11:49:20.000Z:

I coded my own version which avoids CHashTable, as CHashTable seems to be designed for long-term tables and its construction/destruction time is costly. It also handles duplicates properly, AFAIK. Here are the results vs. the CHashTable version (on a 233 MHz Pentium II)...

The tests are:

```
&m me=%qo
th setq(i,lnum(1000,1))[setq(o,sort(%qi,n))][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge(m,%qi,%qo)))][sub(secs(,7),%qt)]
th setq(i,lnum(1000,1))[setq(o,sort(%qi,n))][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge2(m,%qi,%qo)))][sub(secs(,7),%qt)]
th setq(i,repeat(-%b,3999))[setq(o,%qi)][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge(m,%qi,%qo)))][sub(secs(,7),%qt)]
th setq(i,repeat(-%b,3999))[setq(o,%qi)][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge2(m,%qi,%qo)))][sub(secs(,7),%qt)]
```

The first two test all unique elements and the last two are a torture test of 3999 -'s.

Unique Current: 5.33944010734558 Torture Test: 5.739089012145996

Unique Ian's Munge: 1.0622920989990234 Torture Test: 3.505417108535766

Keep in mind the current version of munge() doesn't handle the torture test case correctly due to the above comment.

For comparison, I used CRC32_ProcessBuffer() instead of my own hackish munge_hash() with the results: Unique CRC32 Ian's: 1.220283031463623 Torture Test: 4.05753207206726

I suspect CRC32_ProcessBuffer() does "too good" a job for a quick-and-dirty application like this... though for longer strings it's likely to be at least as quick as, if not faster than, munge_hash().

For reference, munge_hash():

```c
UINT32 munge_hash(unsigned char *x)
{
    UINT32 h = 0;
    while (*x)
    {
        h ^= (h << 5) + (h >> 2) + CRC32_Table[*x++];
    }
    return h;
}
```

brazilofmux commented 9 years ago

Comment #9 originally posted by brazilofmux on 2007-01-05T12:12:29.000Z:

More benchmarks... Same as before, but with only 10 elements.

```
th setq(i,lnum(10,1))[setq(o,sort(%qi,n))][setq(l,repeat(-%b,666))][setq(t,secs(,7))][iter(%ql,null(munge(m,%qi,%qo)),,@@)][sub(secs(,7),%qt)]
th setq(i,lnum(10,1))[setq(o,sort(%qi,n))][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge2(m,%qi,%qo)),,@@)][sub(secs(,7),%qt)]
th setq(i,repeat(-%b,10))[setq(o,%qi)][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge(m,%qi,%qo)),,@@)][sub(secs(,7),%qt)]
th setq(i,repeat(-%b,10))[setq(o,%qi)][setq(l,repeat(-%b,333))][setq(t,secs(,7))][iter(%ql,null(munge2(m,%qi,%qo)),,@@)][sub(secs(,7),%qt)]
```

Unique Current (Small): 0.1282308101654053 Wussy Test: 0.1144881248474121

Unique Ian's (Small): 0.05349993705749512 Wussy Test: 0.0533590316772461

Comparing this to MUX2.6: Unique MUX2.6 (Small): 0.04597306251525879 Wussy Test: 0.04543614387512207

And just for the humor of it, the benchmarks of MUX2.6 on the last post's big 1000-unique and 3999-torture tests: Unique MUX2.6: 20.93640899658203 Torture Test: Beyond MUX2.6's default 60 second lag_limit.

Overall, a 20-fold improvement for lists of 1000 elements and very slightly worse for small lists.

brazilofmux commented 9 years ago

Comment #10 originally posted by brazilofmux on 2007-01-05T16:25:27.000Z:

Oops, yeah, somehow we never tested with duplicates. Would you care to provide a patch?

brazilofmux commented 9 years ago

Comment #11 originally posted by brazilofmux on 2007-01-05T22:49:14.000Z:

> Would you care to provide a patch?

Er... sure. How? I just installed SVN yesterday. Do I need to be considered a developer by Google Code to submit a patch via SVN?

brazilofmux commented 9 years ago

Comment #12 originally posted by brazilofmux on 2007-01-06T16:29:40.000Z:

Yes, you would, but you're welcome to attach source files or diffs here or email me at the obvious gmail address. I'm mainly interested in not reinventing the wheel if you've already got a better hash table for munge.

brazilofmux commented 9 years ago

Comment #13 originally posted by brazilofmux on 2007-01-06T16:37:06.000Z:

kaganar (Ian) has Google Code access. darkwave (Jake) is going to use dev_jake, and this leaves Ronan and Ian to share dev_ronan for now.

brazilofmux commented 9 years ago

Comment #14 originally posted by brazilofmux on 2007-01-07T03:43:14.000Z:

For what it's worth, CHashTable does not perform measurably faster with Ian's hash function, because the typical string in our tests is so short. Blame the data structures then, I guess.

brazilofmux commented 9 years ago

Comment #15 originally posted by brazilofmux on 2007-01-08T12:57:37.000Z:

Ian sent me his code, too, and I've borrowed some of the ideas in it for revision 688 on dev_alierak. I think it performs reasonably well without the oldMethod tradeoff. Please take a look and run more tests.

brazilofmux commented 9 years ago

Comment #16 originally posted by brazilofmux on 2007-01-09T04:24:24.000Z:

I've yet to see it in action, but the rewrite is brilliant to read. It solves several problems (such as using one structure for everything and having to mask some values during copies when modifying the implied linked lists) in an elegant manner and is also much easier to understand. Though the choice to still run the first list through list2arr() is a bit vexing to me.

brazilofmux commented 9 years ago

Comment #17 originally posted by brazilofmux on 2007-01-09T04:33:33.000Z:

After benchmarking it on the same machine the figures quoted above are from, it turns out it's slightly faster than my implementation. That isn't too surprising, since my implementation uses a hash table half the size. OTOH, Alierak's rewrite of my rewrite of his rewrite (heh) uses smaller structures for the actual hash (nice job on that, BTW... it even gets around compilers padding structs to word boundaries), but it calls an extra list2arr(), which takes up another 16KB (I think?) instead of using pointers directly in the hash structure.

brazilofmux commented 9 years ago

Comment #18 originally posted by brazilofmux on 2007-01-09T04:44:57.000Z:

Is it ready to be promoted, yet?

brazilofmux commented 9 years ago

Comment #19 originally posted by brazilofmux on 2007-01-09T21:12:56.000Z:

I think it is ready, as of revision 708.

Ian's right, it is faster not to call list2arr at all, so long as that doesn't require enlarging the hash record struct. I found I could do this by storing short offsets relative to fargs[1] and fargs[2], instead of storing pointers to the split words. I've got it down to zero invocations of list2arr, and even having to call countwords() to determine the hash table size doesn't appear to slow it down.

Runtime should be about the same as revision 688 for large input, and about 60% less for small input. Wow.

brazilofmux commented 9 years ago

Comment #20 originally posted by brazilofmux on 2007-01-09T22:06:18.000Z:

Though you could avoid countwords() by indexing list2 and noting that if list1 has more or fewer words, the munge() will fail. Assuming the munge() is to succeed (which is the only case worth optimizing for), indexing list2, noting how many words are in it, and allocating a hash table for that list size is reasonable.

OTOH, at this point, it's probably more than good enough. :)

brazilofmux commented 9 years ago

Comment #21 originally posted by brazilofmux on 2007-01-09T23:33:58.000Z:

Yeah, indexing list2 (as opposed to calling list2arr) does seem a little bit faster. I'm not sure it's worth the added code, but see revision 710 if you want to try it.

brazilofmux commented 9 years ago

Comment #22 originally posted by brazilofmux on 2007-01-10T03:51:44.000Z:

Feel free to tweak further, but this one is well fixed.