speed up integration with hashing

ribrdb commented 5 years ago

I've observed times of over 4 seconds to evaluate a definite integral when the expression being integrated appears near the end of the integral table. I discovered a description of a clever hashing algorithm to use in an integral table which could potentially speed this up considerably. See Appendix A (starting on page 156): https://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/19680004891.pdf

davidedc commented 5 years ago

Hi Ryan, thanks for the link, yes indeed that would be real nice to have.

I'd probably prioritise work on the matcher routine though... if I remember well it only matches 3 parts at most, via 3 nested loops, which is rather arbitrary. And ideally it should quickly spot atoms from the pattern missing from the input - hence quickly failing the match. In fact I'd bet that that change alone would be probably equivalent to the solution you pointed at (there aren't that many patterns, if they are discarded quickly by a simple visit of the pattern and of the input, plus each pattern can be scanned once on creation, and the input can be scanned only once per integral invocation).

Also, more generally on the theme of performance: Algebrite is a port of a CAS made within a compact LISP interpreter (done in C). Now, while implementing a compact LISP in C is a very clever solution if one wants a small scripting language running on the C runtime (for small calculators, etc.), it doesn't really make sense is one uses the JS runtime, which already provides dynamic memory, GC. So all those stack manipulations and registers that you can still see in the code are just overhead and can go away.

An example of such cleanup is this commit: https://github.com/davidedc/Algebrite/commit/38c1f9ae303508b6b6b27b1fe96a9f5b5aac0605 . A lot more could be done of it - there is no need to use the "artificial" stack when one can just normal parameter passing in function calls (hence using the JS runtime stack).

ribrdb commented 5 years ago

Improving the matcher makes sense (and that paper also has a fascinating section on matching in chapter 3), but I think the hash code would be even easier to implement than changing the matcher. I tried looking at the matcher code initally to see if I could speed it up. The code seems very difficult to understand. Maybe if you could clean up all the lisp stuff it would be a more manageable task, but it still seems pretty large to me.

However the hashing algorithm is pretty simple, and using it is also pretty simple: Compute the hash of the expression you're trying to integrate to look up a bucket with just a few integral patterns. Then run the mater on that smaller set. That's just a couple small additions to the code, no refactoring of the existing matcher is necessary.

I don't understand all the lisp stuff enough to make the changes directly to algebrite. However I implemented the hashing algorithm from the paper in mathjs and ran it against the algebrite integral table: https://repl.it/@RyanBrown5/integral-table-hashing With the unmodified algorithm I got 72 collisions, with a maximum of 7 items in a single bucket. So you'd only need to run the matcher at most 7 times for any integral, instead of up to 157 right now. I also made the small tweak they suggested in the paper and that reduced the collisions to 36, with a maximum of 4 expressions in the same bucket.

davidedc commented 5 years ago

nice! If you feel like making a PR for this I'd gladly put it in.

ribrdb commented 5 years ago

Do you have any documentation on the algebrite AST format? I'd be happy to make the change, but I couldn't figure out how to traverse an expression.

davidedc commented 5 years ago

Sure there is one main place, and a couple of examples.

This is how a general expression is structured (uses the LISP cons/car/cdr nomenclature: https://en.wikipedia.org/wiki/CAR_and_CDR ): https://github.com/davidedc/Algebrite/blob/6df074cdabac1a963b2ebdd09f0ff22a7f203d90/runtime/defs.coffee#L30

(first/rest or head/tail nomenclature would be nicer than car/cdr, however navigation of treesusing variants e.g. caadr, caddr etc etc is useful )

a few lines below you can see the elements of the tree: https://github.com/davidedc/Algebrite/blob/6df074cdabac1a963b2ebdd09f0ff22a7f203d90/runtime/defs.coffee#L55

This is how to find an expression inside another: https://github.com/davidedc/Algebrite/blob/6df074cdabac1a963b2ebdd09f0ff22a7f203d90/runtime/find.coffee#L5

This is how to substitute one expression (tree) with another (tree) https://github.com/davidedc/Algebrite/blob/6df074cdabac1a963b2ebdd09f0ff22a7f203d90/sources/subst.coffee

I hope that's of some help!

davidedc / Algebrite

speed up integration with hashing #83