RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
GNU Affero General Public License v3.0

Add Chicago crimes with pandas Jupyter notebook example #28

RandomFractals closed this issue 1 year ago

RandomFractals commented 1 year ago

Add a pandas notebook to compare CSV data loading time with the Polars (#7), PyArrow (#17) and DuckDB (#4) notebooks.

Pandas read CSV docs: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
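
A minimal sketch of how the load time could be measured, assuming a plain `pandas.read_csv` call timed with `time.perf_counter`. The `timed_read_csv` helper, the sample rows, and the column names are illustrative assumptions, not code from the notebook; the in-memory sample stands in for the real CSV export so the snippet runs anywhere.

```python
import io
import time

import pandas as pd

# Tiny synthetic sample standing in for the real Chicago crimes CSV export.
# Column names are assumed for illustration; the rows are made up.
SAMPLE_CSV = """ID,Date,Primary Type,Arrest
1,01/01/2022 12:00:00 AM,THEFT,false
2,01/01/2022 01:30:00 AM,BATTERY,true
3,01/02/2022 09:15:00 PM,THEFT,false
"""


def timed_read_csv(source, **read_csv_kwargs):
    """Load a CSV with pandas.read_csv and return (DataFrame, elapsed seconds)."""
    start = time.perf_counter()
    df = pd.read_csv(source, **read_csv_kwargs)
    elapsed = time.perf_counter() - start
    return df, elapsed


# Hypothetical helper applied to the in-memory sample; a file path works the same way.
df, seconds = timed_read_csv(io.StringIO(SAMPLE_CSV))
print(f"Loaded {len(df):,} rows x {df.shape[1]} columns in {seconds:.4f} s")
```

Passing the path of the downloaded CSV file instead of the `io.StringIO` sample would time the full load, and the same wrapper pattern can be reused around the Polars, PyArrow and DuckDB readers for a like-for-like comparison.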

Also, add some plots from the old Chicago crimes Jupyter notebook:

https://github.com/RandomFractals/ChicagoCrimes/blob/master/notebooks/all-chicago-crime-charts.ipynb
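
As a rough sketch of the kind of chart that could be ported, the snippet below counts crimes per type with pandas and draws a horizontal bar chart through the pandas Matplotlib plotting backend. It is not taken from the old notebook: the `count_by_type` and `plot_counts` helpers, the `Primary Type` column name, and the sample rows are assumptions for illustration.

```python
import io

import pandas as pd

# Synthetic rows for illustration only; the column name is an assumption.
SAMPLE_CSV = """ID,Primary Type
1,THEFT
2,BATTERY
3,THEFT
4,NARCOTICS
5,THEFT
6,BATTERY
"""


def count_by_type(df, column="Primary Type"):
    """Return the number of rows per crime type, most frequent first."""
    return df[column].value_counts()


def plot_counts(counts, title="Crimes by type (sample data)"):
    """Draw a horizontal bar chart of the counts; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported lazily so counting works without it

    ax = counts.sort_values().plot.barh(figsize=(8, 4), title=title)
    ax.set_xlabel("Number of reported crimes")
    plt.tight_layout()
    return ax


crimes = pd.read_csv(io.StringIO(SAMPLE_CSV))
counts = count_by_type(crimes)
print(counts)
```

Calling `plot_counts(counts)` in a notebook cell renders the chart inline; sorting before plotting puts the most frequent type at the top of the horizontal bars.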

RandomFractals commented 1 year ago

Loading the Chicago crimes data with pandas is much slower than with PyArrow, DuckDB, or Polars. It takes about 26 seconds:

https://github.com/RandomFractals/chicago-crimes/blob/main/notebooks/chicago-crimes-pandas.ipynb

[Screenshot: chicago-crimes-with-pandas]

RandomFractals commented 1 year ago

See the new "With Pandas" and "With Matplotlib" sections in the docs:

https://github.com/RandomFractals/chicago-crimes#with-pandas