Closed asmirnov82 closed 9 months ago
It's also possible to avoid cloning the left side and create a new empty column for the results instead. However, investigation shows that avoiding cloning does not bring any dramatic performance improvement. On the other hand, it requires quite a lot of code changes in both PrimitiveDataFrameColumn.BinaryOperations.tt and PrimitiveDataFrameColumn.BinaryOperationImplementations.Exploded.tt (the current implementation of DataFrame provides two different implementations of arithmetic calculations: one for PrimitiveDataFrameColumn
Here are the results of my experiments (the first column is the speed with just the enhanced nullable handling, the second column is enhanced nullable handling plus avoiding cloning):
Final results with the PR implemented:
During arithmetic operations, DataFrame clones the left-side column into the result so that the result has a validity bitmap, and then checks the right-side validity bitmap for NULL values.
For example, Multiply clones the column when the inPlace parameter is set to false (the default behavior):
and inside the container we check validity for each value:
The per-value validity check is a very slow operation. It's possible to calculate the raw values unconditionally and then use binary logic (AND) to calculate the validity bitmap a whole byte (eight values) at a time.
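To illustrate the idea, here is a minimal language-neutral sketch in Python. It is not the DataFrame code: the function names (`is_valid`, `multiply_checked`, `multiply_bytewise`) are hypothetical, and it assumes a bitmap layout in which bit `i % 8` of byte `i // 8` is set when value `i` is valid. The first function checks validity for every value; the second computes all raw products and then ANDs the two bitmaps byte by byte.

```python
def is_valid(bitmap, index):
    """Return True if the bit for `index` is set (assumed LSB-first layout)."""
    return (bitmap[index >> 3] >> (index & 7)) & 1 == 1


def multiply_checked(left, left_valid, right, right_valid):
    """Per-value approach: test both validity bits for every element."""
    n = len(left)
    result = [0] * n
    validity = bytearray(len(left_valid))
    for i in range(n):
        if is_valid(left_valid, i) and is_valid(right_valid, i):
            result[i] = left[i] * right[i]
            validity[i >> 3] |= 1 << (i & 7)
    return result, validity


def multiply_bytewise(left, left_valid, right, right_valid):
    """Compute raw values unconditionally, then AND the bitmaps per byte."""
    # Raw products are computed even for NULL slots; those slots are
    # masked out by the validity bitmap, so their values are never read.
    result = [l * r for l, r in zip(left, right)]
    # One AND covers the validity of eight values at once.
    validity = bytearray(a & b for a, b in zip(left_valid, right_valid))
    return result, validity
```

Both functions should produce the same validity bitmap and the same values at every valid position. They differ only in what is stored in NULL slots, which callers are not expected to read.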