fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/

`frame.ColumnTypes` incorrectly changes to System.Object when filtering frame to rows where the column's values are null #516

Open ppatino opened 3 years ago

ppatino commented 3 years ago

Issue description

I am running into an issue where a Frame "loses" a column's original data type when the frame is filtered to only the rows where that column's values are missing. Pardon the C#-isms in advance; I am using Deedle from C# code, but hopefully this is all clear.

Steps to reproduce the issue

  1. Start with a Frame where one of the columns can be null. In this example, the columns are of type string (the "Name" field) and int? (the "Age" field).
  2. Inspect frame.ColumnTypes directly after the frame is created with Frame.FromRecords (see below). It returns the expected types, string and int?.
  3. Create a new frame that keeps only the rows where the nullable column is missing, so that no remaining row has a value in that column. In this simple case with 2 Person records, I filter to index 0, i.e. the "Alice" record, where Age is null.
  4. Inspect filtered.ColumnTypes. It unexpectedly reports the "Age" column as having type System.Object.

using System.Linq;
using Deedle;

public class Person
{
    public int? Age;
    public string Name;
}

Person[] records = new Person[]
{
    new Person() { Name = "Alice", Age = null },
    new Person() { Name = "Bob", Age = 45 }
};

Frame<int, string> frame = Frame.FromRecords(records);
// `frame.ColumnTypes` returns the expected `string` and `int?`.

// The two ways I am aware of for filtering rows, here to row 0 (the "Alice" record, where Age is null):
Frame<int, string> filteredViaWhere = frame.Where(c => c.Key == 0);
Frame<int, string> filteredViaRows = Frame.FromRows(frame.Rows.Where(c => c.Key == 0));
// With either approach, `ColumnTypes` on the filtered frame returns `string` and `Object`,
// when the second type should still be `int?`.

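What's the expected result?

After filtering, ColumnTypes should still report string and int?.

What's the actual result?

After filtering, ColumnTypes reports string and System.Object.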

ppatino commented 3 years ago

I realized I can also do row filtering by using:

frame.RealignRows(frame.RowKeys.Where(rk => rk == 0));

In this case the rows are filtered correctly (here, simply to index 0) and ColumnTypes on the result appears to be correct (i.e. the types are string and int?).
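
To compare the three filtering approaches side by side, something like the following console program could be used. It is only a sketch built from the calls already shown in this issue; the `Program` class and the `PrintTypes` helper are hypothetical names added for illustration, and the output is not shown because it depends on the Deedle version in use.

```csharp
using System;
using System.Linq;
using Deedle;

public class Person
{
    public int? Age;
    public string Name;
}

public static class Program
{
    // Hypothetical helper (not part of Deedle): prints a label and the frame's column types.
    static void PrintTypes(string label, Frame<int, string> f) =>
        Console.WriteLine($"{label}: {string.Join(", ", f.ColumnTypes)}");

    public static void Main()
    {
        var records = new[]
        {
            new Person { Name = "Alice", Age = null },
            new Person { Name = "Bob", Age = 45 }
        };
        Frame<int, string> frame = Frame.FromRecords(records);

        // Baseline, before any filtering.
        PrintTypes("original", frame);

        // The two approaches reported above as losing the int? type.
        PrintTypes("Where", frame.Where(c => c.Key == 0));
        PrintTypes("FromRows", Frame.FromRows(frame.Rows.Where(c => c.Key == 0)));

        // The approach reported above as keeping the int? type.
        PrintTypes("RealignRows", frame.RealignRows(frame.RowKeys.Where(rk => rk == 0)));
    }
}
```

If the behaviour described in this issue reproduces, the "Where" and "FromRows" lines would show an Object column where the "original" and "RealignRows" lines show a nullable int.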