SimplifyType is too expensive

attic-labs / noms

The versioned, forkable, syncable database

Apache License 2.0

7.45k stars 267 forks source link

SimplifyType is too expensive #3747

Open ghost opened 6 years ago

ghost commented 6 years ago

Now that the dust is starting to settle after changing values to be backed by bytes, SimplifyType is rising the top of alot of profiles.

The algorithm is unavoidably costly. I think a good kind of approach here is to try to avoid recomputing a simplified type that we've already computed a bunch (e.g. when creating large collections)

ghost commented 6 years ago

Thinking about this more, I think I can imagine three approaches to mitigating the cost of simplify type:

1) Find some way to implement which is less costly in terms of number of memory allocations 2) Memoize the work so that we don't keep recomputing the same simplified type for the same (or very similar) input types 3) Avoid doing it at all if the type doesn't need to be simplified

ghost commented 6 years ago

I think maybe a good first place to start is (3) above. I think there's probably a 60-80% base case here which is that SimplifyType gets called everytime the sequence chunker produces a new node in a prolly tree. What gets simplified is all of the elements of the sequence. It's probably more often than not that

a) Each element is exactly the same type b) Any given element is already Simplified

I think simply detecting this case and avoiding the call to SimplifyType is a good place to start and will help with the majority of cases.