chatziko / lci

A lambda calculus interpeter
https://www.chatzi.org/lci/
GNU General Public License v2.0
64 stars 7 forks source link

Draft version of string interning #4

Closed baziotis closed 5 years ago

baziotis commented 5 years ago

This is a draft version of string interning done roughly only for the aliases. (String interning lets us insert strings only once in a data structure which gives us pointers that for equal strings are equal and so we can do cheap comparisons between strings).

chatziko commented 3 years ago

I felt like playing with lci after a long time and modernize a bit the codebase, so I took the time to merge this and extend it to all other uses of strings.

The speedup of interning variables was quite substantial (at least a 2x speedup on the whole execution), since they're constantly being copied/compared during execution. Thanks for the idea!

baziotis commented 3 years ago

Hey, glad it had that speedup :) I don't remember what I had picked to intern in this patch, probably it was random (?).

You might like the idea of using a hash table for the comparison of strings instead of a vector. I'm not sure what is the data coming into lci. If the strings are few and the variance of their lengths is big, it probably won't make a big difference. But in any case, here's a good and quite simple hash table I've used.

Lines 100-111 are the hash function, which is what you probably care about. The rest of the implementation has a dependency on a custom memory allocator (memory allocation was another thing I remember looking at lci...). That table gave ridiculous speedup to this compiler. It was copied from lcc (which I think was the first to introduce string interning (?)).

chatziko commented 3 years ago

You had interned term aliases, but it was straightforward to extend it to everything else.

The core speedup is interning variables, because at runtime we only compare/copy existing variables, never generate new ones. So all interning is done at "compile" time and we can then just use the pointers. I already use a hash table btw (a user friendly one from the k08 course), but it makes no noticeable difference.

On unrelated news: I'm rediscovering the joy of playing with interpreters :D https://github.com/chatziko/ipli-fast

baziotis commented 3 years ago

You had interned term aliases, but it was straightforward to extend it to everything else. The core speedup is interning variables, because at runtime we only compare/copy existing variables, never generate new ones. So all interning is done at "compile" time and we can then just use the pointers. I already use a hash table btw (a user friendly one from the k08 course), but it makes no noticeable difference.

Makes sense.

On unrelated news: I'm rediscovering the joy of playing with interpreters :D https://github.com/chatziko/ipli-fast I think this is pretty cool! My knowledge on interpreters is not that great and I was thinking of contributing to this project to try a few things (that is, if I find some time :D ).

chatziko commented 3 years ago

think this is pretty cool! My knowledge on interpreters is not that great and I was thinking of contributing to this project to try a few things (that is, if I find some time :D ).

Go for it, for sure I had lots of fun!

Also you might want to try this year's buffer overflow. A small change, but it makes it trickier:

https://github.com/chatziko/pico/commit/026601843ce82f96da7083fab80d93f7b84f9e21

baziotis commented 3 years ago

Also you might want to try this year's buffer overflow. A small change, but it makes it trickier:

chatziko/pico@0266018

Haha, thanks for the memories!