RandomFractals / chicago-crimes

Exploring Chicago crimes dataset with Jupyter notebooks, DuckDB, Malloy and new Panel/PyScript data and dashboard tools.
GNU Affero General Public License v3.0

Add .NET Interactive .ipynb CSV data loading example with C# #23

Open RandomFractals opened 2 years ago

RandomFractals commented 2 years ago

Use the .NET Interactive Notebooks extension: https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode

and the Microsoft.Data.Analysis API: https://learn.microsoft.com/en-us/dotnet/api/microsoft.data.analysis.dataframe?view=ml-dotnet-preview
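
A minimal sketch of what such a notebook cell might look like, assuming the `Microsoft.Data.Analysis` package resolves from nuget.org and using the static `DataFrame.LoadCsv` method. The file path is hypothetical and should be replaced with the actual location of the crimes CSV in this repo; the `guessRows` value is an arbitrary choice for type inference, not a recommended setting.

```csharp
#r "nuget:Microsoft.Data.Analysis"

using System;
using Microsoft.Data.Analysis;

// Hypothetical path: point this at the crimes CSV file you want to load.
var csvPath = "data/crimes-2022.csv";

// LoadCsv infers column types from the first `guessRows` rows.
// 100 is an assumption for this sketch; the library default is smaller.
DataFrame crimes = DataFrame.LoadCsv(
    csvPath,
    separator: ',',
    header: true,
    guessRows: 100);

// Print the shape of the loaded frame as a quick sanity check.
Console.WriteLine($"{crimes.Rows.Count} rows, {crimes.Columns.Count} columns");
```

In a Polyglot/.NET Interactive notebook, ending the cell with `crimes` on its own line should render the frame as a table instead of printing only the shape.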

RandomFractals commented 2 years ago

Something is off when loading the smaller 2022 crimes CSV data file with the Microsoft DataFrame:

[Screenshot: chicago-crimes-dotnet-csv-read]

RandomFractals commented 2 years ago

@colombod from the .NET Interactive team suggested trying the latest preview version of the ML.NET libraries using:

#i "nuget:https://pkgs.dev.azure.com/dnceng/public/_packaging/MachineLearning/nuget/v3/index.json"
#r "nuget:Microsoft.Data.Analysis,0.20.0-preview.22514.1"

This uses a daily build of the DataFrame NuGet package that will be released soon.

Sample ML project notebook:

https://github.com/microsoft/dotnetconf-studentzone/blob/main/Using%20ML.NET%20for%20Machine%20Learning/WaterConsumptionMLproject.ipynb

RandomFractals commented 2 years ago

Updated the .NET Interactive notebooks setup to use the new Polyglot Notebooks extension:

https://marketplace.visualstudio.com/items?itemName=ms-dotnettools.dotnet-interactive-vscode

Changed imports to the ML.NET preview NuGet packages listed above.

Still getting a CSV load error, even for the smaller 33 MB file:

[Screenshot: crimes-dotnet-load-csv-error]

RandomFractals commented 2 years ago

The ML.NET NuGet package is still very much in beta and can't yet parse CSV files with missing data fields.

The devs suggested trying a third-party Parquet library instead:

https://github.com/G-Research/ParquetSharp.DataFrame
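
A rough sketch of how that route might look in a notebook cell, based on the `ToDataFrame` extension method described in the ParquetSharp.DataFrame README. It assumes the crimes data has already been exported to Parquet; the file path below is hypothetical, and this has not been verified against this dataset.

```csharp
#r "nuget:ParquetSharp.DataFrame"

using System;
using Microsoft.Data.Analysis;
using ParquetSharp;

// Hypothetical path: assumes a Parquet export of the crimes data exists here.
var parquetPath = "data/crimes-2022.parquet";

using (var reader = new ParquetFileReader(parquetPath))
{
    // ToDataFrame is an extension method provided by ParquetSharp.DataFrame.
    DataFrame crimes = reader.ToDataFrame();
    reader.Close();

    // Print the shape of the loaded frame as a quick sanity check.
    Console.WriteLine($"{crimes.Rows.Count} rows, {crimes.Columns.Count} columns");
}
```

Since Parquet stores column types and nulls explicitly, this path would not depend on the CSV type inference that fails on missing fields.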