dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.96k stars 4.65k forks source link

Performance delay when filtering a DataTable using the DefaultView.RowFilter property #106061

Closed SreemonPremkumarMuthukrishnan closed 2 weeks ago

SreemonPremkumarMuthukrishnan commented 1 month ago

Description

Description

When filtering the DataTable using the DefaultView.RowFilter property, it takes a long time (more than 2 minutes) to apply the filter. The number of items being filtered is 5,800. Please find the code snippet below.

Is there any way to improving the performance of filtering.

Note: The filter string was provided along with the sample.

C# code:

private void FilteringButton_Click(object sender, RoutedEventArgs e)
{
    DataTable dataTable = getDataReturn();
    var filterString = File.ReadAllText(@"..\..\Filterstring.txt");
    dataTable.DefaultView.RowFilter = filterString;
}

Output reference:

https://github.com/user-attachments/assets/d4e746c9-d665-4c53-b3c0-76f6167296f9

Sample:

DataTable_Filtering_Demo.zip

Reproduction Steps

  1. Run the sample
  2. Click the "Click to filter" button to filter

Expected behavior

The filter should not take too much time.

Actual behavior

The filtering takes too much time.

Regression?

No response

Known Workarounds

No response

Configuration

No response

Other information

No response

dotnet-policy-service[bot] commented 1 month ago

Tagging subscribers to this area: @roji, @ajcvickers See info in area-owners.md if you want to be subscribed.

SreemonPremkumarMuthukrishnan commented 1 month ago

Hi, Is there any way to improve or optimize the performance of filtering in a DataTable? Specifically, I am looking for methods or best practices that can enhance efficiency and reduce the processing time for large datasets.

roji commented 2 weeks ago

@SreemonPremkumarMuthukrishnan you are setting the DataTable's RowFilter to a filter that is 308k big; this is very much not the way that DataTable was meant to be used, and it's indeed expected for this to be very slow. Note that no real database will perform well with such a huge SQL query (and most will probably refuse to run it altogether).

In general, DataTable is an in-memory storage layer that isn't meant for use with huge datasets, and will certainly not work very efficiently (regardless of your specific huge RowFilter). I'd suggest using a real relational database which can properly handle such scenarios. For simplicity and for getting started, you may want to consider SQLite.

Minimal repro for future reference ```c# var dataTable = new DataTable(); dataTable.Columns.Add("LastName", typeof(string)); dataTable.Columns.Add("FirstName", typeof(string)); for (int i = 0; i < 40000; i++) { DataRow row = dataTable.NewRow(); row["LastName"] = i + RandomNameGenerator.GetRandomName(); row["FirstName"] = 1000 + i + RandomNameGenerator2.GetRandomName2(); dataTable.Rows.Add(row); } var filterString = File.ReadAllText("/tmp/Filterstring.txt"); Console.WriteLine("Setting RowFilter"); dataTable.DefaultView.RowFilter = filterString; Console.WriteLine("Done"); public class RandomNameGenerator { public static int NumberOfCar = 0; private static readonly string[] Names = { "dog", "cat", "car", "naver", "google", "outlook", "c#" }; private static readonly Random random = new Random(); public static string GetRandomName() { int index = random.Next(Names.Length); if (Names[index].Equals("car")) { NumberOfCar++; } return Names[index]; } } public class RandomNameGenerator2 { private static readonly string[] Names2 = { "새우깡", "감자깡", "양파깡", "꼬북칩", "비타민", "미니티슈", "크리넥스" }; private static readonly Random random2 = new Random(); public static string GetRandomName2() { int index = random2.Next(Names2.Length); return Names2[index]; } } ```