blaze / datashape

Language defining a data description protocol
BSD 2-Clause "Simplified" License

Hashing Performance #116

Closed (mrocklin closed 9 years ago)

mrocklin commented 9 years ago

The new Blaze computation pipeline stresses the expression system much more. The first performance issue to pop up is in datashape: hashing a datashape calls this property on Mono quite a bit:

@property
def parameters(self):
    if hasattr(self, '__slots__'):
        return tuple(getattr(self, slot) for slot in self.__slots__)
    else:
        return self._parameters

Perhaps the hasattr and getattr calls are slow? This could be resolved in either datashape or blaze. One option is to cache the hash or the parameters locally.
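A minimal sketch of what caching the hash locally might look like, assuming the types are treated as immutable once constructed. The names MonoSketch, Pair, and the _hash slot are hypothetical and only illustrate the idea; this is not datashape's actual implementation.

```python
class MonoSketch:
    """Hypothetical stand-in for a slotted datashape type (not the real Mono)."""

    # Extra slot reserved for the cached hash; subclasses declare their own
    # slots, so instance.__slots__ resolves to the subclass tuple only.
    __slots__ = ('_hash',)

    @property
    def parameters(self):
        # Same idea as the property quoted above: rebuild the tuple each call.
        return tuple(getattr(self, slot) for slot in self.__slots__)

    def __eq__(self, other):
        return type(self) is type(other) and self.parameters == other.parameters

    def __hash__(self):
        try:
            # Fast path: an unset slot raises AttributeError, a set one returns.
            return self._hash
        except AttributeError:
            # Slow path runs once per instance; assumes parameters never change.
            h = self._hash = hash((type(self).__name__, self.parameters))
            return h


class Pair(MonoSketch):
    """Hypothetical subclass with two parameters."""

    __slots__ = ('first', 'second')

    def __init__(self, first, second):
        self.first = first
        self.second = second


a = Pair('int32', 10)
b = Pair('int32', 10)
lookup = {a: 'found'}  # hashing `a` here fills the cache
```

After the first call, hash(a) no longer touches parameters at all, so the hasattr and getattr cost is paid once per instance. The trade-off is that mutating an instance after it has been hashed would leave a stale cached value.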

mrocklin commented 9 years ago

On closer inspection, this cost is fairly spread out across a few sections. Caching hashes in Blaze expressions is likely the way to go for now.
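As a rough illustration of caching hashes at the expression level, the sketch below uses a hypothetical ExprSketch node class (not Blaze's actual expression class) and counts how often a hash is really computed. It assumes expression nodes are not mutated after construction.

```python
class ExprSketch:
    """Hypothetical expression node; not Blaze's actual expression class."""

    computed = 0  # how many times a hash was actually computed (for illustration)

    def __init__(self, op, *args):
        self.op = op
        self.args = args
        self._hash = None  # filled in lazily on first hash

    def __eq__(self, other):
        return (type(self) is type(other)
                and self.op == other.op
                and self.args == other.args)

    def __hash__(self):
        if self._hash is None:
            ExprSketch.computed += 1
            # Hashing the args tuple recurses into child nodes, each of
            # which caches its own hash as well.
            self._hash = hash((type(self), self.op, self.args))
        return self._hash


# A small three-node tree: sum(field(symbol, 'x'))
leaf = ExprSketch('symbol', 't', 'var * {x: int32}')
expr = ExprSketch('sum', ExprSketch('field', leaf, 'x'))

for _ in range(1000):
    hash(expr)  # only the first iteration walks the tree
```

With the cache in place, the 1000 calls compute a hash once per node rather than once per node per call, which sidesteps the repeated datashape hashing without changing datashape itself.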