Open eerhardt opened 5 years ago
2730 implements AppendRow
on the DataFrame.
I'm not super convinced that we should have an Append
on BaseColumn
. This is mostly because DataFrameTable
holds a list of BaseColumn
and does book keeping of RowCount
. DataFrameTable
also enforces that each BaseColumn
has the same RowCount
. If BaseColumn
could append data to itself, nothing will break at the moment but it may not behave the way folks expect. Consider the following for ex:
BaseColumn intColumn = df["IntColumn"];
intColumn.Append(5);
df.AppendRow({100, 2.0}); // appends 100 to intColumn and 2.0 to floatColumn
When someone asks for the row containing [100, 2.0] by rowIndex, they might get [5, 2.0] instead now. An easy way to fix this would be to append nulls to the remaining columns whenever a BaseColumn.Append
is called. That's exactly what DataFrame.AppendRow
does, but the intent/behavior is more obvious with AppendRow
. So, do we still want an Append
on BaseColumn
?
Note: What I described above can happen already. PrimitiveColumn
and StringColumn
have an Append
on them. They exist so a user can initially populate a column with data or because they are being used without a DataFrame. Starting from a DataFrame though, a user would have to explicitly cast a BaseColumn
to one of those to call Append
. At that point, maybe we can assume that users won't be surprised by this behavior? Thoughts?
Another Note: Pandas avoid this by not appending in place. All appends to a column/Series return a new column/Series.
Note: What I described above can happen already. PrimitiveColumn and StringColumn have an Append on them. ... a user would have to explicitly cast a BaseColumn to one of those to call Append. At that point, maybe we can assume that users won't be surprised by this behavior?
I don't understand why casting from a BaseColumn
down to PrimitiveColumn
would change the expectation on behavior.
Another Note: Pandas avoid this by not appending in place. All appends to a column/Series return a new column/Series.
I think that behavior may make sense. As long as there is an AppendRange
type of method so I can append many values without creating intermediate column/Series objects.
We should have some APIs that allow a developer to easily append new values to DataFrame and to BaseColumn.
BaseColumn today has
Resize(Length + 1)
and thenSetValue(Length, value)
, but this isn't intuitive to use.DataFrame should have an InsertRow or AppendRow method that takes a collection of objects to insert.
cc @pgovind