Closed JamesAlexander42 closed 4 years ago
Hello! Which API are you talking about? I still see this: https://github.com/dotnet/corefxlab/blob/f44a099c2a3bb8b0feedc92cdf9f66aba793a82c/src/Microsoft.Data.Analysis/DataFrame.cs#L72
I referenced the change I'm referring to above.
Ah I see. It still exists! We just moved that to the DataFrameColumnCollection class. See this for an example: https://github.com/dotnet/corefxlab/blob/f44a099c2a3bb8b0feedc92cdf9f66aba793a82c/tests/Microsoft.Data.Analysis.Tests/DataFrameTests.cs#L648
I think @Hamaze was specifically asking why the API was changed that way. Maybe you can point to the API review?
I know that I'd rather write:
df["Int3"] = df["Int1"] * 2 + df["Int2"];
As opposed to:
df.Columns["Int3"] = df.Columns["Int1"] * 2 + df.Columns["Int2"];
Yeah, the former is more pandas-esque and comfortable IMO.
Agreed.
I suspect the reason is for the row filter to work.
But you are more often interested in column selection than row selection so it's better to have the penalty there instead.
df.Rows[2 .. 12]
df["Int3"] = df["Int1"] * 2 + df["Int2"];
Just out of curiosity, are you using DataFrame in a notebook? Reason I ask is that we've worked on a cool extension for DataFrame in notebooks that'll let you write df.Int1 * 2 + df.Int2. To be specific, with the new extension you can now refer to a column as a field of a DataFrame object. With intellisense enabled in notebooks, this will be very discoverable.
I'm not using it in a notebook context for this exercise. Using it in an asp.net app.
Haven't tested the new notebook support yet but will do.
I would say that even though notebooks are very useful, I much prefer the experience to be the same when doing normal software and when doing notebooks.
Usually I prototype in notebooks and then structure and copy stuff to some kind of software that is more production-like. So I would avoid using any extensions in notebook except for ones that are interactive such as plotting.
I concur with @MikaelUmaN.
For instance, I would expect code that runs in an F# kernel notebook to run directly in FSI under Visual Studio. I would also expect it to compile as part of a bigger production system that mixes C# and F#. That's how I explore my problems in IfSharp notebooks and put them in production all the time.
However, if syntax involving dataframe relies on an extra notebook extension that only works in notebook, the beauty of production-ready scripts is no longer feasible.
cc @cartermp @dsyme to chime in.
Just tagging @eerhardt for visibility here. This is great feedback! We're busy helping out with .NET 5 stuff this week, but I'll revisit this next week. There's enough support here to consider bringing back the column name indexer on DataFrame.
There's enough support here to consider bringing back the column name indexer on DataFrame.
I agree. Personally I like the ease of use of df["Int1"] as well, so I'm glad I'm not alone.
It should be pretty easy to add the API back as a wrapper over the .Columns[string] indexer, and a test or two. Anyone want to make a PR for that?
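A minimal sketch of what such a wrapper could look like. This is not the Microsoft.Data.Analysis source: SketchDataFrame and SketchColumnCollection are hypothetical stand-ins, and double[] stands in for the real column type so the example stays self-contained.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for DataFrameColumnCollection (assumption, not the real type).
public class SketchColumnCollection
{
    private readonly Dictionary<string, double[]> _columns = new Dictionary<string, double[]>();

    // Look up or replace a column by name.
    public double[] this[string name]
    {
        get => _columns[name];
        set => _columns[name] = value;
    }
}

// Hypothetical stand-in for DataFrame (assumption, not the real type).
public class SketchDataFrame
{
    public SketchColumnCollection Columns { get; } = new SketchColumnCollection();

    // Convenience indexer: holds no state and only forwards to Columns[name].
    public double[] this[string name]
    {
        get => Columns[name];
        set => Columns[name] = value;
    }
}
```

With this shape, df["Int1"] and df.Columns["Int1"] return the same object, so both spellings can coexist and a test only needs to check that the two indexers agree.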
Just out of curiosity, are you using DataFrame in a notebook? Reason I ask is that we've worked on a cool extension for DataFrame in notebooks that'll let you write df.Int1 * 2 + df.Int2. To be specific, with the new extension you can now refer to a column as a field of a DataFrame object. With intellisense enabled in notebooks, this will be very discoverable.
However, if syntax involving dataframe relies on an extra notebook extension that only works in notebook, the beauty of production-ready scripts is no longer feasible.
Yes, we need to be very careful about promoting non-standard extensions to the programming model for C# or F# which are deployed only through select channels. Notebook programming should ideally not be using variations of these programming languages, though these things are subtle.
This is a tricky area because there is a notable tendency to use the incremental dynamicity of notebook programming.
@pgovind What APIs are you using to craft this language variation? Please discuss this with @MadsTorgersen, @jaredpar and myself. We can't have random variations on C# and F# floating around that fragment the overall programming experience.
So, just to be clear, the extension I'm talking about here is only a prototype to explore the dotnet-interactive extension APIs. There are no immediate plans to productize it, and we definitely don't want to create fragmentation. It lives here: https://github.com/dotnet/interactive/blob/main/src/Microsoft.DotNet.Interactive.ExtensionLab/DataFrameTypeGeneratorExtension.cs
What APIs are you using to craft this language variation?
It's not really a variation. It's a prototype right now (and not part of the type itself). Basically, given a DataFrame object, it looks at the types of the columns and spits out code to create a new SomeNameDataFrame type with the column names as properties. This code is then compiled on demand and the dotnet-interactive shell then exposes this type for use in the notebook.
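To make the description concrete, here is a rough sketch of the "spit out code" step only. It is not the code in DataFrameTypeGeneratorExtension.cs: TypedFrameSourceSketch, its Generate method, and the wrapper shape of the emitted class are all assumptions for illustration, and it assumes every column name is already a valid C# identifier. Compiling the emitted text on demand is left out.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper (assumption): emits C# source text for a type that
// exposes each column of a DataFrame as a typed property.
public static class TypedFrameSourceSketch
{
    public static string Generate(
        string typeName,
        IReadOnlyList<(string Name, string ColumnType)> columns)
    {
        var sb = new StringBuilder();
        sb.AppendLine($"public class {typeName}");
        sb.AppendLine("{");
        sb.AppendLine("    private readonly DataFrame _frame;");
        sb.AppendLine($"    public {typeName}(DataFrame frame) {{ _frame = frame; }}");

        // One property per column, cast to the column type observed at runtime.
        foreach (var (name, columnType) in columns)
        {
            sb.AppendLine($"    public {columnType} {name} => ({columnType})_frame.Columns[\"{name}\"];");
        }

        sb.AppendLine("}");
        return sb.ToString();
    }
}
```

Because the property list comes from whatever columns the DataFrame happens to hold at runtime, the emitted type only exists after the data has been loaded.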
@pgovind The problem is that this sort of "generating API from dynamic data" is a completely new thing in the .NET universe (the closest thing is F# type providers, and then source generators, though those are normally part of the static toolchain).
It doesn't really fit any part of the existing C#/F#/.NET programming model and can never really be incorporated into project-based programming, for example. It can only be done in notebook-like environments that assume a complete compiler toolchain at each stage of execution, even in production scripts.
It's a powerful thing to be sure but we have to be aware of the direction this is going. I understand why you're thinking of doing this but yes, fragmentation of the programming experience is an intrinsic part of this direction, as tempting as it is.
An approach that does fit within existing norms is to drive the code generation off some kind of static schema (declared or acquired).
Why was indexing into a DataFrame removed in latest? Looking at the commit history too, I can see the block was deleted. This makes for a very awkward user experience now.