attic-labs / noms

The versioned, forkable, syncable database
Apache License 2.0

Optimize typeOf for sequences and simplifyType #3772

Closed arv closed 6 years ago

arv commented 6 years ago

When computing the type of a sequence, we used to compute the type of every value in the sequence and then simplify the resulting union. Now we check whether the type of a value is the same as the type of the previous element, in which case we do not need to include it in the union. This is the common case. It lets us skip allocating a lot of Types, and it makes the type to simplify smaller, which makes simplifying it faster.

Also, handle the simple cases where the type is already simplified.

Optimize *Type Equals to reduce the number of times we have to encode it.

Introduce makeUnionType, which removes the Union<T> wrapper earlier so that there is less work to do in simplifyType.

This halves the amount of time spent in TypeOf for csv import, and import throughput goes from 18 MB/s to 27 MB/s.

Towards #3710, #3747