dotnet / interactive

.NET Interactive combines the power of .NET with many other languages to create notebooks, REPLs, and embedded coding experiences. Share code, explore data, write, and learn across your apps in ways you couldn't before.
MIT License
2.8k stars 374 forks source link

Enhance data sharing from SQL Kernels to C# #3491

Open IntegerMan opened 3 months ago

IntegerMan commented 3 months ago

Is your feature request related to a problem? Please describe. One of the selling points of Polyglot Notebooks is the ability to code in whatever language I want and share data between kernels when t makes sense to use another language for the next step of a processing task. A key example of this would be querying data from a SQL kernel and then sharing the data to a C# kernel.

The SQL query and storing results are straightforward:

#!sql-dad-jokes --name JokesTable
SELECT 
    j.Id, 
    j.Joke, 
    j.Punchline 
FROM 
    Jokes j

Right now the stored results of a SQL query are in an array of TabularDataResource objects registered to the SQL kernel. Given that resource, I want to share it with the .NET kernel, so I try a #!set with value and name to share the data to the C# kernel.

#!set --value @sql-dad-jokes:JokesTable --name JokesTable

Now C# has that result, but it's suddenly a JsonDocument with an unwiieldy schema and the data I want hiding deeper in the structure.

I can get this data using the following C# code:

using System.Text.Json;

JsonElement data = JokesTable.RootElement[0].GetProperty("data");

If I add a using statement I can then use the ToTabularDataResource extension method you offer:

using Microsoft.DotNet.Interactive.Formatting;
TabularDataResource res = data.ToTabularDataResource();

That's an hour or so of troubleshooting and a few lines of code to get back to a TabularDataResource, but wait, we're not done because what I actually wanted was a Microsoft.Data.Analysis.DataFrame to work effectively with the data.

The best way I've found to get this is to install the Extension lab NuGet package:

#r "nuget:Microsoft.DotNet.Interactive.ExtensionLab,1.0.0-beta.24164.1"

This then gives me my DataFrame through the ToDataFrame extension method:

using Microsoft.DotNet.Interactive.ExtensionLab;
using Microsoft.Data.Analysis;

DataFrame dfJokes = res.ToDataFrame();

That's a lot of work just to get a SQL query in one kernel into a DataFrame in another. Also, best I can tell, nobody has done this and documented it, so this issue stands as documentation that this is even possible.

Describe the solution you'd like Give me a better way of getting SQL data into the C# kernel. The JsonDocument format isn't enough. If you start with a TabularDataResource and that's a .NET type, then that's great; I'd like to keep that and work with it. This may be a different magic command or different syntax when executing #!set

Alternatively, give me some extension methods that make it easier to get a TabularDataResource back out of a JsonDocument.

I also really like the ToDataFrame extension method and hope it doesn't go away. I get nervous over the permanence of anything in a "Lab" proejct.

Describe alternatives you've considered See description. This is something I think people will want to do, but it didn't seem terribly documented that it was even possible anywhere. I'm including it in my book as a supported workflow, but I wish the experience was better on the developer.