dotnet / machinelearning

ML.NET is an open source and cross-platform machine learning framework for .NET.
https://dot.net/ml
MIT License

Best way to convert Datatable to Dataframe #7036

Closed: chuongmep closed this issue 6 months ago

chuongmep commented 7 months ago

Is your feature request related to a problem? Please describe.

Hi, is there a fast way to convert a System.Data.DataTable to a Microsoft.Data.Analysis.DataFrame?

I tried the solution below, but it is still too slow.

Describe the solution you'd like

using System.Data;
using System.Linq;
using Microsoft.Data.Analysis;

public static class DataTableDataFrameExtensions
{
    public static DataFrame ToDataFrame(this DataTable dataTable)
    {
        var dataFrame = new DataFrame();

        foreach (DataColumn column in dataTable.Columns)
        {
            // Read every value of this column as a string (DBNull becomes null)
            string[] values = dataTable.AsEnumerable()
                .Select(r => r.Field<object>(column)?.ToString())
                .ToArray();
            dataFrame.Columns.Add(DataFrameColumn.Create(column.ColumnName, values));
        }
        return dataFrame;
    }
}
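A possible variant, not proposed in this thread, is to keep each DataTable column's CLR type instead of converting every value to a string. This is a minimal sketch, assuming only int, long, double, and bool columns need typed handling; every other type falls back to the string conversion used in the extension method above. The class and method names (TypedDataTableConversion, ToTypedDataFrame, CreateColumn) are hypothetical. It relies on the Int32DataFrameColumn, Int64DataFrameColumn, DoubleDataFrameColumn, BooleanDataFrameColumn, and StringDataFrameColumn constructors that take a column name and a sequence of values.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Microsoft.Data.Analysis;

public static class TypedDataTableConversion
{
    // Hypothetical helper: builds one typed DataFrameColumn per DataTable column
    public static DataFrame ToTypedDataFrame(DataTable table)
    {
        var dataFrame = new DataFrame();
        foreach (DataColumn column in table.Columns)
        {
            dataFrame.Columns.Add(CreateColumn(table, column));
        }
        return dataFrame;
    }

    private static DataFrameColumn CreateColumn(DataTable table, DataColumn column)
    {
        IEnumerable<DataRow> rows = table.AsEnumerable();
        string name = column.ColumnName;

        // Field<T?> maps DBNull to null, so missing values become nulls in the DataFrame
        if (column.DataType == typeof(int))
            return new Int32DataFrameColumn(name, rows.Select(r => r.Field<int?>(column)));
        if (column.DataType == typeof(long))
            return new Int64DataFrameColumn(name, rows.Select(r => r.Field<long?>(column)));
        if (column.DataType == typeof(double))
            return new DoubleDataFrameColumn(name, rows.Select(r => r.Field<double?>(column)));
        if (column.DataType == typeof(bool))
            return new BooleanDataFrameColumn(name, rows.Select(r => r.Field<bool?>(column)));

        // Assumption: any other type is stored as text, as in the original extension method
        return new StringDataFrameColumn(name, rows.Select(r => r.Field<object>(column)?.ToString()));
    }
}
```

The design choice here is to build each column in one pass from the table, so numeric columns stay numeric and skip the per-value ToString() call; how much faster this is in practice will depend on the data.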



asmirnov82 commented 7 months ago

Hi @chuongmep, if you are reading your data from a database, you can use the DataFrame.LoadFrom(DbDataAdapter adapter) method; otherwise, you may try DataFrame.LoadFrom(IEnumerable<IList<object>> vals, IList<(string, Type)> columnInfos). These methods are also fairly slow because they add data row by row, but they may still be faster than your approach, since they don't convert any values to strings.
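For data that is already in a DataTable, the second LoadFrom overload could be fed directly from the table's schema and rows. This is a minimal sketch, assuming every DataTable column type is one that DataFrame.LoadFrom can create a column for; the class and method names (DataTableLoadFromSketch, ToDataFrameViaLoadFrom) are hypothetical, and DBNull values are mapped to null on the assumption that the DataFrame should treat them as missing.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Linq;
using Microsoft.Data.Analysis;

public static class DataTableLoadFromSketch
{
    // Hypothetical helper: passes DataTable rows to DataFrame.LoadFrom(vals, columnInfos)
    public static DataFrame ToDataFrameViaLoadFrom(DataTable table)
    {
        // Column names and CLR types taken from the DataTable schema
        IList<(string, Type)> columnInfos = table.Columns
            .Cast<DataColumn>()
            .Select(c => (c.ColumnName, c.DataType))
            .ToList();

        // Each row becomes a List<object>; DBNull is replaced with null
        IEnumerable<IList<object>> rows = table.AsEnumerable()
            .Select(r => (IList<object>)r.ItemArray
                .Select(v => v is DBNull ? null : v)
                .ToList());

        return DataFrame.LoadFrom(rows, columnInfos);
    }
}
```

Because LoadFrom still appends rows one at a time, this keeps the per-row cost described in the comment above; what it avoids is the string conversion.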

chuongmep commented 6 months ago

Thank you for your help!