ghiscoding / Angular-Slickgrid

Angular-Slickgrid is a wrapper of the lightning fast & customizable SlickGrid datagrid, it also includes multiple Styling Themes
https://ghiscoding.github.io/Angular-Slickgrid
Other
397 stars 120 forks source link

Slow Sorting on datetime Column with dataset around 50k Records in SlickGrid #1469

Closed shree20 closed 2 months ago

shree20 commented 2 months ago

Describe the bug

When sorting a column containing datetime values in a SlickGrid dataset around 50k records, the operation takes an excessively long time to complete. This performance issue severely impacts user experience when working with larger datasets, sometimes leading to screen freeze.

Reproduction

  1. Create a SlickGrid dataset with a datetime column.
  2. Populate the grid with over 50000 rows of data.
  3. Attempt to sort the datetime column by clicking the column header.

Below are my configs: { id: 'execution', name: 'Execution Timestamp', field: 'execution', filterable: true, sortable: true, formatter: Formatters.dateUtc, type: FieldType.dateUtc, filter: { model: Filters.compoundDate } }

Expectation

The sorting operation should complete in a reasonable time frame, similar to other data types without using pagination.

Environment Info

Angular - 14.1.1
Angular-Slickgrid - 5.6.4
Typescript - 5.4.5
Broswer - Chrome 128.0.6613.120
OS - Windows 11

Validations

ghiscoding commented 2 months ago

The Date Sort has already been optimized the best I could, you can take a look at the Slickgrid-Universal sortComparers/dateUtilities.ts and see for yourself if there's anything that could be improved. If I open Slickgrid-Universal Example 3 and click on the 500K button and then sort by date, it takes ~2sec which seems reasonable for 500k rows.

The problem on your side might be that your dates are maybe not pure UTC and that it might be taking extra time to parse them and convert to UTC dates. So you should make sure that the type is the closest to your input dates. The other problem is that you're using an old version of Angular-Slickgrid which I no longer support (I only support latest major) and on the last major, I switched from MomentJS to Tempo. MomentJS wasn't really the fastest, I'm pretty sure that Tempo is quicker

demo below is showing 500k date sort, the dates are in fact JS Date, so there's nothing to parse which might be why it's quicker.

msedge_c7GiQ8LkjR

Closing this issue, since like I said, it has already been optimized the best I could and it's on your responsibility to provide best possible date type for parsing that will improve perf. I'm pretty sure that in your case the time lost is in parsing from a string to a Date in MomentJS. Lastly, I just don't support older versions of the project.

Another side note, you could also use your own custom sortComparer function to sort a particular column. This could improve the speed if you know exactly the date format for parsing

ghiscoding commented 2 months ago

@shree20 I was able to replicate this by having dates that requires parsing, basically any dates that are string and not Date types. Most of the time spend is parsing dates from a string to a Date type but that is not the only time lost, the default browser .sort() method which is what we use to sort the data, will revisit the same item dataContext over and over. Because the same item is being revisited many times, it will try to parse the same items over and over which is extremely bad for performance...

So what can you do to make this faster with a more reasonable time, by reasonable time I mean that we can get the same perf as sorting a number column (which is currently a lot quicker than sorting a date column). I am working on something that will be doing just that.

You won't be getting any of these future perf if you're staying with an older version like you are. But you can also achieve this by looping through all item rows and parse every date column(s), then either overwrite the date with the parsed Date or keep the parse Date in a different column (like __start).

For example, let say that we have this dataset

data = [
  { id: 0, start: '02/28/24', finish: '03/02/24' },
  { id: 1, start: '01/14/24', finish: '02/13/24' },
]

You could loop through the dataset and parse the date strings and since you're using an old version of Angular-Slickgrid, you will need to use MomentJS to parse the dates. So you could do something like this, we'll parse the start and finish string dates and copy them into __start and __finish columns (or you could simply overwrite the properties directly)

this.columnDefinitions.forEach(col => {
  const fieldType = col.type || FieldType.string;
  const dateFormat = mapMomentDateFormatWithFieldType(fieldType);
  const strictParsing = dateFormat !== undefined;
  if (isColumnDateType(fieldType)) {
    const queryFieldName = `__${col.id}`;
    data.forEach((d: any) => {
      const date = moment(d[col.id], dateFormat, strictParsing);
      if (date) {
        d[queryFieldName] = date;
      }
    });
  }
});

So executing this will mutate the dataset and add these 2 extra properties, just a side note, below I wrote start: Date but I mean that Date is in fact a real Date object.

data = [
  { id: 0, start: '02/28/24', finish: '03/02/24', start: Date, finish: Date },
  { id: 1, start: '01/14/24', finish: '02/13/24', start: Date, finish: Date },
]

then make sure that you reference these new properties via the queryFieldSorter in your column definitions.

this.columnDefinitions = [
  { id: 'start', field: 'start', name: 'Start', queryFieldSorter: '__start' },
  { id: 'finish', field: 'finish', name: 'Finish', queryFieldSorter: '__finish' }
]

OR you could simply overwrite each data items (meaning parse the string dates and overwrite start item directly).

this.columnDefinitions.forEach(col => {
  const fieldType = col.type || FieldType.string;
  const dateFormat = mapMomentDateFormatWithFieldType(fieldType);
  const strictParsing = dateFormat !== undefined;
  if (isColumnDateType(fieldType)) {
    data.forEach((d: any) => {
      const date = moment(d[col.id], dateFormat, strictParsing);
      if (date) {
        d[col.id] = date;
      }
    });
  }
});

which will transform the array dataset

data = [
-  { id: 0, start: '02/28/24', finish: '03/02/24' },
-  { id: 1, start: '01/14/24', finish: '02/13/24' },
+  { id: 0, start: Date, finish: Date },
+  { id: 1, start: Date, finish: Date },
]

NOTE, you should do this parsing before assigning it to Angular-Slickgrid and also note that this will take some time (possibly couple of seconds) to parse the entire dataset. To improve performance, you could use Web Workers and do that in the background or simply wait for the parsing to be finished.

So what perf do we get out of this? With a dataset of 50k rows with 2 columns start/finish, we get these perf changes

# before
ASC sort: 15011.368896484375 ms
DESC sort: 4121.112060546875 ms

# after code change 
# 1. parse all dataset items
mutate: 1396.18994140625 ms

# 2. sort
ASC sort: 188.916015625 ms
DESC sort: 87.218994140625 ms

So we drop from 15sec to about 1.5sec which is quite a drop and 10x better perf

Why does this help? Mainly because the native browser .sort() will revisit the same items many times until they are all sorted (for example, when sorting 100 items, I saw it revisiting some items like 10 times, I could image this being a lot more when our dataset is 50k or even more) and with current Slickgrid implement, it will parse them over and over, parsing the same items multiple times, and that is especially bad for performance. However, if we parse each of these items only 1 time (with the code above), then it becomes a lot more performant. This becomes O(n2), however what we currently have is probably closer to an O(log n2).


Long story short, you can implement the code shown above to get quite a perf boost, but it's on your end to implement it (especially since you are on an unsupported version). I'll work on implementing something similar in Slickgrid-Universal in a future version