RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
GNU Affero General Public License v3.0

Create Chicago Crimes Julia CSVFiles with DataFrames script #11

Closed: RandomFractals closed this issue 1 year ago

RandomFractals commented 1 year ago

Goal: compare the data loading performance of the native Julia CSV and DataFrames packages against the DuckDB CSV read added in #10.

Julia DataFrames doc: https://dataframes.juliadata.org/stable/

RandomFractals commented 1 year ago

This was the slowest raw CSV data loading run I've seen yet.

[Screenshot: chicago-crimes-csv-julia-dataframe]

In addition, reading the CSV data file with Julia and loading it into a DataFrame consumed over 20 GB of RAM, versus the typical ~2 GB or so I've seen in other DuckDB CSV read tests.

[Screenshot: julia-csv-files-dataframe-read-mem-usage]
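For anyone who wants to repeat this kind of load time and memory comparison, here is a minimal sketch of a measurement harness in Python, using only the standard library. It is not the script used for the runs above. The names `measure_load`, `load_rows`, and `write_sample_csv` are hypothetical, and the sample CSV columns are made up rather than taken from the real dataset. Note that `tracemalloc` only tracks allocations made through Python's allocator, so it would likely under-report memory used by native engines such as DuckDB; a process-level monitor is the better tool for those.

```python
import csv
import os
import tempfile
import time
import tracemalloc


def measure_load(loader, path):
    """Run loader(path); return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = loader(path)
        elapsed = time.perf_counter() - start
        # Peak size of Python-level allocations since tracemalloc.start()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


def load_rows(path):
    """Hypothetical loader: read the whole CSV into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def write_sample_csv(n_rows=1000):
    """Write a small synthetic CSV (made-up columns, not the real dataset)."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "primary_type"])
        for i in range(n_rows):
            writer.writerow([i, "THEFT"])
    return path


if __name__ == "__main__":
    path = write_sample_csv()
    try:
        rows, seconds, peak = measure_load(load_rows, path)
        print(f"{len(rows)} rows in {seconds:.3f}s, peak {peak / 1e6:.1f} MB")
    finally:
        os.remove(path)
```

Any other loader that takes a file path can be passed to `measure_load` in place of `load_rows`, which keeps the timing and memory bookkeeping identical across the tools being compared.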

RandomFractals commented 1 year ago

See the new section in the docs: https://github.com/RandomFractals/chicago-crimes#with-julia-csvfiles-and-dataframe