EMS-TU-Ilmenau / fastmat

A library to build up lazily evaluated expressions of linear transforms for efficient scientific computing.
https://fastmat.readthedocs.io
Apache License 2.0

Hard coded transform batch size in norm calculation #86

Closed SebastianSemper closed 3 years ago

SebastianSemper commented 4 years ago

The size of the batches in the general norm calculation is currently set to a fixed value, which might not be desirable if the vectors before or after the transform are large. This might result in memory allocation errors.

# number of elements we consider at once during normalization
cdef intsize numStrideSize = 256
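To illustrate why the fixed stride matters, here is a hedged Python sketch of a batched column-norm computation (names like `column_norms_batched` are hypothetical, not fastmat's actual implementation): unit vectors are pushed through the transform `stride` at a time, so the working memory per batch scales with both the vector length and the stride.

```python
import numpy as np

def column_norms_batched(apply_transform, num_cols, stride=256,
                         dtype=np.float64):
    """Hypothetical sketch: compute the column norms of a linear
    transform by applying it to batches of identity columns,
    `stride` columns at a time."""
    norms = np.empty(num_cols, dtype=np.float64)
    for start in range(0, num_cols, stride):
        stop = min(start + stride, num_cols)
        # batch of unit vectors; input memory scales with
        # num_cols * stride, output memory with num_rows * stride
        batch = np.zeros((num_cols, stop - start), dtype=dtype)
        batch[np.arange(start, stop), np.arange(stop - start)] = 1
        out = apply_transform(batch)
        norms[start:stop] = np.linalg.norm(out, axis=0)
    return norms
```

With a fixed `stride=256`, a transform with very long input or output vectors allocates large temporaries per batch, which is exactly the failure mode described above.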
ChristophWWagner commented 4 years ago

I suggest the following to solve the problem:

We could take a peek at the memory size of one vector and determine a better-suited chunk size from that figure. This would align the chunks with the cache sizes commonly found in modern processor architectures, resulting in more consistent performance.

A defensive approach would be to simply assume that 1M of cache memory is available on virtually all machines around nowadays. Then we determine the stride size from that:

numStrideSize = max(1, 1048576 // np.empty((self.numCols, ), dtype=self.dtype).nbytes)

Since np.empty does not initialize the data section of that vector, the overhead should be negligible, making it preferable over hard-to-read, low-level architecture bean-counting in this context.

For better adaptability, the magic number 1048576 should go into the flags object, such that this can be controlled by the user, perhaps even initialized from reading out the actual cache sizes.

Opinions? Objections?