I implemented a limited version of `pandas.DataFrame.apply` that supports `axis=1` and requires all elements of the output dataframe to be `types.float64`. It speeds up `apply` by 5-10x on a single core and by 20-30x on eight cores.
The implementation currently has some limitations (some from sdc, some from numba, some of my own):
1. `args[-1]` must be a `tuple`, and its length must equal the number of output dataframe columns. I would also like to use the values in `args[-1]` as the output dataframe column names, but I haven't figured out how to do that; instead I have to generate a list named `col_names` and pass it to the `DataFrameType` init.
2. Right now, the `kwargs` of the original `pandas.DataFrame.apply` are not supported.
3. Right now I assume `axis=1`, `raw=False`, `result_type=None`; this could be extended later.
4. Related to limitation 1: I want to move the `DataFrameType` init code into the `impl` body so that it can use the `args[-1]` value, but `get_structure_maps` cannot be moved into `impl` because it is not jitted. Any ideas?
5. Right now I assume `func` returns a `Series`; this could be extended later too, e.g., to allow a `list` or `np.ndarray`.
6. Last but not least, I want users of the compiled `apply` to be able to provide the type of each output dataframe column, e.g., all `types.float64`, all `types.string`, or even mixed: the first column `types.float64` and the other columns `types.string`. How could this be implemented? Any suggestions? For now, I have only implemented the version where all output types are `types.float64`.
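For reference, here is a minimal sketch of the call shape described above, written in plain pandas rather than the compiled version. The names `row_stats`, `scale`, and `out_cols` and the sample data are made up for illustration and are not part of this PR; the sketch only shows the assumed pattern: `axis=1`, `func` returning a `Series`, all outputs `float64`, and `args[-1]` being a tuple with one entry per output column.

```python
import pandas as pd


def row_stats(row, scale, col_names):
    # Hypothetical row function: `col_names` arrives as args[-1] and
    # has one entry per output column.
    total = float(row.sum()) * scale
    mean = float(row.mean()) * scale
    # Return a Series so that pandas expands it into output columns.
    return pd.Series([total, mean], index=list(col_names))


df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})
out_cols = ("total", "mean")  # args[-1]: tuple, len == number of output columns

# Plain pandas call with axis=1; raw and result_type keep their defaults.
result = df.apply(row_stats, axis=1, args=(2.0, out_cols))
print(result)
```

In plain pandas the column names come from the index of the returned `Series`; in this sketch they are taken from `args[-1]`, which is the behavior limitation 1 is aiming for.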
Any comments are welcome! Thanks! @kozlov-alexey @shssf