dotnet / runtime


Add the method to add column-data in DataTable without filling row-by-row #31467

Closed rstm-sf closed 4 years ago

rstm-sf commented 5 years ago

Hello!

If I understand correctly, the data in a DataTable is stored by column. But I can't find a method to add a column's data to a DataTable without filling it row by row. For applications that operate on columns, such a method would be more efficient.
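For illustration, here is a minimal sketch of what filling a single column looks like with the public API today. FillColumn is a hypothetical helper written for this example, not part of System.Data, and it still has to walk the table row by row:

```csharp
using System;
using System.Data;

public static class ColumnFillSketch
{
    // Hypothetical helper: copies `values` into one column,
    // creating the column and any missing rows along the way.
    public static void FillColumn<T>(DataTable table, string columnName, T[] values)
    {
        if (!table.Columns.Contains(columnName))
            table.Columns.Add(columnName, typeof(T));

        for (int i = 0; i < values.Length; i++)
        {
            // Rows that do not exist yet must be created one at a time.
            if (i >= table.Rows.Count)
                table.Rows.Add(table.NewRow());

            // The row indexer takes object, so value types are boxed here.
            table.Rows[i][columnName] = values[i];
        }
    }

    public static void Main()
    {
        var table = new DataTable("Samples");
        FillColumn(table, "Id", new[] { 1, 2, 3 });
        FillColumn(table, "Price", new[] { 9.5, 12.0, 7.25 });

        Console.WriteLine(table.Rows.Count); // 3
    }
}
```

The request in this issue amounts to replacing that loop with a single call that hands the whole array to the column.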

roji commented 5 years ago

Thanks for your interest in DataTable. As far as I know (and I'm not an expert), DataTable doesn't use columnar storage under the hood - the basic data structure it holds is rows, which it holds in a red-black tree. Can you share where you've seen that DataTable is stored by columns?

In any case, at this point we're not looking to evolve DataTable/DataSet with new features, so there's little chance we'd look at a major change.

rstm-sf commented 5 years ago

The idea originally came from an article I read on the Internet :) It claimed that the data is stored by column, so that the data type is remembered once per column.

Then I decided to verify this by looking at the source code. But I found it difficult to follow, so I settled on the following guesses:

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/Common/DataStorage.cs#L283-L348

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataRow.cs#L177-L195

rstm-sf commented 5 years ago

> As far as I know (and I'm not an expert), DataTable doesn't use columnar storage under the hood - the basic data structure it holds is rows, which it holds in a red-black tree.

Ok, I’ll try to look in more detail.

roji commented 5 years ago

Before you spend too much time on it, it's very unlikely we'd actually introduce a new API on DataTable for this kind of thing. This wouldn't be trivial to do (e.g. need to take care of other columns of newly added rows via the API), and DataTable has various other design issues that limit its usefulness in a high-perf scenario (e.g. it boxes). Finally, in general there hasn't been any active development for quite a while on DataTable or related types.
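To make the two caveats concrete, here is a small sketch using only the public DataTable API (the table and column names are made up for the example). It shows that values pass through object, so an int is boxed on write and unboxed on read, and that a row added with only one column set leaves the other columns as DBNull, which a column-wise add API would also have to deal with:

```csharp
using System;
using System.Data;

public static class BoxingAndNullsSketch
{
    public static DataTable BuildTable()
    {
        var table = new DataTable("Example");
        table.Columns.Add("Id", typeof(int));
        table.Columns.Add("Name", typeof(string));

        DataRow row = table.NewRow();
        row["Id"] = 42;          // the indexer takes object: the int is boxed
        table.Rows.Add(row);     // "Name" was never set for this row
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildTable();
        DataRow row = table.Rows[0];

        object raw = row["Id"];  // reads come back as object
        int id = (int)raw;       // and must be unboxed by the caller

        Console.WriteLine(id);                  // 42
        Console.WriteLine(row.IsNull("Name"));  // True
    }
}
```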

rstm-sf commented 5 years ago

> This wouldn't be trivial to do (e.g. need to take care of other columns of newly added rows via the API)

Oh, that helped me narrow down the task of confirming that everything is stored in columns.

See

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumnCollection.cs#L219-L229

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumnCollection.cs#L130-L133

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumnCollection.cs#L135-L150

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumnCollection.cs#L309-L316

So in private void BaseAdd(DataColumn column) there is a scenario in which

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumnCollection.cs#L348-L358

for (int record = 0; record < _table.RecordCapacity; record++)

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/DataColumn.cs#L1170-L1174

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/Common/DataStorage.cs#L251-L252

and for example

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/Common/BigIntegerStorage.cs#L11-L13

https://github.com/dotnet/corefx/blob/e99ec129cfd594d53f4390bf97d1d736cff6f860/src/System.Data.Common/src/System/Data/Common/BigIntegerStorage.cs#L127-L140
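The pattern these links appear to point at can be modelled roughly as follows. This is a simplified sketch under my own assumptions, not the actual System.Data code: all type names ending in Sketch are hypothetical, and the real storage classes do considerably more (null tracking, comparison, aggregation, and so on). Each column owns a typed array indexed by record number, while the outward-facing accessors work with object:

```csharp
using System;

// Hypothetical stand-in for a per-column storage base class.
public abstract class ColumnStorageSketch
{
    public abstract object Get(int record);
    public abstract void Set(int record, object value);
    public abstract void SetCapacity(int capacity);
}

// One typed array per column; records are positions in that array.
public sealed class Int32ColumnSketch : ColumnStorageSketch
{
    private int[] _values = Array.Empty<int>();

    // Reading boxes the int into an object.
    public override object Get(int record) => _values[record];

    // Writing unboxes the incoming object back to int.
    public override void Set(int record, object value) => _values[record] = (int)value;

    // Growing the table means resizing every column's array.
    public override void SetCapacity(int capacity) => Array.Resize(ref _values, capacity);
}

public static class ColumnStorageDemo
{
    public static void Main()
    {
        ColumnStorageSketch column = new Int32ColumnSketch();
        column.SetCapacity(4);

        for (int record = 0; record < 4; record++)
            column.Set(record, record * 10);

        Console.WriteLine(column.Get(3)); // 30
    }
}
```

In this model the data itself is columnar, but every value still crosses an object-typed boundary, which would be consistent with both observations made in this thread.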

rstm-sf commented 5 years ago

Then in that case

> DataTable has various other design issues that limit its usefulness in a high-perf scenario (e.g. it boxes)

It is based on a column-storage collection

rstm-sf commented 5 years ago

In general, yes, this would not be a trivial task, and few people need it (though I wonder why) ;)

ajcvickers commented 4 years ago

Closing as this is not something we plan to implement.