MLBazaar / MLBlocks

A library for composing end-to-end tunable machine learning pipelines.
https://mlbazaar.github.io/MLBlocks
MIT License

Get execution time for each block #127

Closed: sarahmish closed this issue 4 years ago

sarahmish commented 4 years ago

Record the time it takes to fit each primitive. This feature will be handy for debugging pipelines and understanding where the overhead lies.

```python
# mlpipeline
def get_time(self):
    """Get the execution time of each block.

    If called before fitting the pipeline, it will return an empty dictionary.

    Returns:
        dict:
            A dictionary containing the block names as keys and
            the execution time in seconds as values.
    """
```
csala commented 4 years ago

Hi @sarahmish

Capturing debug information sounds good, but I would make this an optional feature rather than the default behavior.

More precisely, for debugging I think it would make sense to add a debug mode to the pipeline (fit(..., debug=True) and predict(..., debug=True)) which, when enabled, makes the pipeline return a dict describing what happened in each step, including the elapsed time as well as the input and output arguments. This would also let us add other profiling information later on, such as CPU time vs. IO time or memory consumption.
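One way to sketch the suggested debug mode, assuming a hypothetical pipeline where `fit(..., debug=True)` returns a per-step dict and the default call behaves as before. `DebugPipeline`, `AddOne` and the dict keys (`elapsed`, `cpu_time`, `peak_memory`, `input`, `output`) are illustrative assumptions, not the MLBlocks API. Wall-clock time comes from `time.perf_counter`, process CPU time from `time.process_time`, and peak memory from the standard-library `tracemalloc` module; a large gap between elapsed and CPU time hints at time spent waiting, for example on IO.

```python
import time
import tracemalloc


class AddOne:
    """Hypothetical block that adds 1 to every value."""

    def fit(self, X):
        pass  # nothing to learn

    def produce(self, X):
        return [x + 1 for x in X]


class DebugPipeline:
    """Hypothetical pipeline whose fit() can report what each step did."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)  # name -> block, in execution order

    def fit(self, X, debug=False):
        steps = {}
        for name, block in self.blocks.items():
            if debug:
                tracemalloc.start()  # trace allocations only in debug mode
                wall_start = time.perf_counter()
                cpu_start = time.process_time()

            block.fit(X)
            output = block.produce(X)

            if debug:
                elapsed = time.perf_counter() - wall_start
                cpu_time = time.process_time() - cpu_start
                _, peak = tracemalloc.get_traced_memory()
                tracemalloc.stop()
                steps[name] = {
                    "elapsed": elapsed,     # wall-clock seconds
                    "cpu_time": cpu_time,   # CPU seconds used by the process
                    "peak_memory": peak,    # peak traced bytes during the step
                    "input": X,
                    "output": output,
                }
            X = output  # feed this step's output to the next step

        # The debug dict is only returned on request; otherwise return None.
        return steps if debug else None


pipeline = DebugPipeline([("first", AddOne()), ("second", AddOne())])
print(pipeline.fit([1, 2]))  # None
info = pipeline.fit([1, 2], debug=True)
print(info["second"]["input"], info["second"]["output"])  # [2, 3] [3, 4]
```

Keeping the profiling behind a flag means the default `fit` path pays no tracing overhead, which matches the suggestion to make this optional rather than the default.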