Time series easier, faster, more fun. Pytimetk.
Please ⭐ us on GitHub (it takes 2 seconds and means a lot).
Time series analysis is fundamental in many fields, from business forecasting to scientific research. While the Python ecosystem offers tools like pandas, these can be verbose and are not optimized for all operations, especially complex time-based aggregations and visualizations.
Enter pytimetk. Crafted with a blend of ease-of-use and computational efficiency, pytimetk significantly simplifies the process of time series manipulation and visualization. By leveraging the polars backend, you can experience speed improvements ranging from 3X to a whopping 3500X. Let's dive into a comparative analysis.
Features/Properties | pytimetk | pandas (+matplotlib) |
---|---|---|
Speed | 3X to 3500X faster | Standard |
Code Simplicity | Concise, readable syntax | Often verbose |
plot_timeseries() | 2 lines, no customization needed | 16 lines, customization needed |
summarize_by_time() | 2 lines, 13.4X faster | 6 lines, 2 for-loops |
pad_by_time() | 2 lines, fills gaps in time series | No equivalent |
anomalize() | 2 lines, detects and corrects anomalies | No equivalent |
augment_timeseries_signature() | 1 line, all calendar features | 29 lines of dt extractors |
augment_rolling() | 10X to 3500X faster | Slow rolling operations |
As evident from the table, pytimetk is not just about speed; it also simplifies your codebase. For example, summarize_by_time() converts a 6-line, double for-loop routine in pandas into a concise 2-line operation. And with the polars engine, you get results 13.4X faster than pandas!
Similarly, plot_timeseries() dramatically streamlines the plotting process, encapsulating what would typically require 16 lines of matplotlib code into a mere 2-line command in pytimetk, without sacrificing customization or quality. And with plotly and plotnine engines, you can create interactive plots and beautiful static visualizations with just a few lines of code.
For calendar features, pytimetk offers augment_timeseries_signature(), which replaces about 30 lines of pandas dt extractions. For rolling features, augment_rolling() is 10X to 3500X faster than pandas. pytimetk also offers pad_by_time() to fill gaps in your time series data and anomalize() to detect and correct anomalies.
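To make the manual work concrete, here is a rough pandas-only sketch of gap filling, a few calendar features, and a rolling mean. The toy data, column names, and window size are made up for illustration, and this is not how pytimetk implements these helpers internally.

```python
import pandas as pd

# Toy daily series with a missing day (2023-01-03); values are made up.
df = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-04"]),
    "value": [10.0, 12.0, 11.0],
})

# Gap filling: reindex onto a complete daily range; missing days become NaN.
full_range = pd.date_range(df["date"].min(), df["date"].max(), freq="D")
padded = (
    df.set_index("date")
    .reindex(full_range)
    .rename_axis("date")
    .reset_index()
)

# Calendar features: a few of the many possible `.dt` extractions.
padded["year"] = padded["date"].dt.year
padded["month"] = padded["date"].dt.month
padded["weekday"] = padded["date"].dt.day_name()

# Rolling feature: 2-day rolling mean (NaN where the window is incomplete).
padded["value_roll_2"] = padded["value"].rolling(window=2).mean()

print(padded)
```

Each extra calendar column or rolling window adds another line like these, which is the boilerplate the pytimetk helpers aim to remove.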
Join the revolution in time series analysis. Reduce your code complexity, increase your productivity, and harness the speed that pytimetk brings to your workflows.
Explore more at our pytimetk homepage.
Install the latest stable version of pytimetk using pip:
pip install pytimetk
Alternatively, you can install the development version:
pip install git+https://github.com/business-science/pytimetk.git
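If you want to confirm the install worked, one option is a quick check with the standard library. This is only a sketch; the helper name installed_version is hypothetical and not part of pytimetk.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if pytimetk is installed, otherwise None.
print(installed_version("pytimetk"))
```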
Here is a simple example to test the summarize_by_time function:
import pytimetk as tk
import pandas as pd

# Load the sample dataset and parse the date column
df = tk.datasets.load_dataset('bike_sales_sample')
df['order_date'] = pd.to_datetime(df['order_date'])

# Monthly mean and sum of total_price per category, using the polars engine
df \
    .groupby("category_2") \
    .summarize_by_time(
        date_column = 'order_date',
        value_column = 'total_price',
        freq = "MS",
        agg_func = ['mean', 'sum'],
        engine = "polars"
    )
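For comparison, here is a rough pandas-only sketch of the same kind of monthly mean and sum per category. It runs on a small made-up frame that stands in for bike_sales_sample (the rows and prices are illustrative), and it is one possible pandas approach rather than the exact routine the benchmarks above were measured against.

```python
import pandas as pd

# Toy data standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "order_date": pd.to_datetime(
        ["2021-01-05", "2021-01-20", "2021-02-03", "2021-01-10", "2021-02-15"]
    ),
    "category_2": ["Road", "Road", "Road", "Mountain", "Mountain"],
    "total_price": [100.0, 300.0, 50.0, 80.0, 120.0],
})

# Group by category and month start ("MS"), then aggregate mean and sum.
monthly = (
    df.groupby(["category_2", pd.Grouper(key="order_date", freq="MS")])["total_price"]
    .agg(["mean", "sum"])
    .reset_index()
)

print(monthly)
```

The result has one row per category and month, labeled with the first day of the month.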
Get started with the pytimetk documentation
To install pytimetk using Poetry, follow these steps:
Make sure you have Python 3.9 or later installed on your system.
To install Poetry, you can use the official installer provided by Poetry. Do not use pip.
Clone the pytimetk repository from GitHub:
git clone https://github.com/business-science/pytimetk
From inside the cloned pytimetk directory, use Poetry to install the package and its dependencies:
poetry install
Or you can create a virtualenv with Poetry and install the dependencies (on Poetry 2.0 and later, poetry shell requires the poetry-plugin-shell plugin):
poetry shell
poetry install
We are in the early stages of development, but the potential for pytimetk in Python is already obvious.