functime-org / functime

Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data.
https://docs.functime.ai
Apache License 2.0

Pandas is a required dependency for plotting with plotly #72

Closed: baggiponte closed this issue 11 months ago

baggiponte commented 11 months ago

I did a fresh install of functime in a clean env and tried to run the following:

import polars as pl
from functime import plotting

def main():
    y = pl.scan_parquet(
        "https://github.com/descendant-ai/functime/raw/main/data/commodities.parquet"
    )

    plotting.plot_panel(y)

if __name__ == "__main__":
    main()

The following error is raised:

ImportError: Plotly express requires pandas to be installed.

Proposed solution

We should add pandas to the list of dependencies (duh). However, I would go a step further and suggest breaking up the dependencies into optional groups. For example, one could install functime[plotting].
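To illustrate how an optional group could look from the code side, here is a minimal sketch of an import guard that plotting functions could call before touching plotly. The helper names (`missing_modules`, `require_extra`) and the `plotting` extra are hypothetical and not part of functime today:

```python
import importlib.util


def missing_modules(modules):
    """Return the names in `modules` that cannot be imported in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def require_extra(extra, modules, package="functime"):
    """Raise an ImportError naming the extra to install if any module is missing.

    `extra` is a hypothetical optional dependency group, e.g. "plotting".
    """
    missing = missing_modules(modules)
    if missing:
        raise ImportError(
            f"{', '.join(missing)} required for this feature. "
            f"Install with: pip install '{package}[{extra}]'"
        )
```

A plotting function could then start with `require_extra("plotting", ["plotly", "pandas"])`, so users without the extra get an actionable message instead of the bare plotly error.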

I think this could be a pretty big issue for deployability (container size, etc.). For example, in a production/inference setting, one might not be interested in plotting capabilities.

A fresh functime install (including dependencies) takes up 992MB in my venv. Pandas alone is 135MB (though numpy is a common dependency), which would bring the plotting functionality to as much as 293MB (just for plotly and pandas, though we would have to subtract numpy, which we might still be using under the hood for other things).

By comparison, a clean install of statsforecast (with dependencies) is just 637MB, statsforecast + mlforecast is 680MB, while scikit-learn is 218MB (with dependencies) and polars alone is 93MB.
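For anyone who wants to check per-package numbers in their own environment, here is a rough sketch using only the standard library. It is not necessarily how the figures above were obtained: it counts the files recorded for a single installed distribution and excludes its dependencies. The function names are made up for illustration:

```python
from importlib import metadata
from pathlib import Path


def total_size_mb(paths):
    """Sum the sizes of the files in `paths` that exist, in megabytes (10**6 bytes)."""
    total = sum(Path(p).stat().st_size for p in paths if Path(p).is_file())
    return total / 1e6


def installed_size_mb(dist_name):
    """Approximate on-disk size of one installed distribution, excluding dependencies.

    Raises importlib.metadata.PackageNotFoundError if `dist_name` is not installed.
    """
    files = metadata.files(dist_name) or []  # None when no file record is available
    return total_size_mb(f.locate() for f in files)
```

For example, `installed_size_mb("pandas")` gives the size of pandas on its own; summing over a package and its dependencies would be needed to reproduce whole-install figures.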

Let me know what you think :)

baggiponte commented 11 months ago

@topher-lo any thoughts about this?

topher-lo commented 11 months ago

Top priority