daisycrego / jbg-admin

Admin app for Jill Biggs Group follow-up boss event management

Improve loading speed of `Events` and `Leads` pages #32

Closed daisycrego closed 3 years ago

daisycrego commented 3 years ago

Improve the loading speed of EventsTable and LeadsTable by performing paginated queries against the APIs.

Refactor EventsTable, LeadsTable, and the events/leads APIs to send less data to the client, reducing the latency of loading these now-massive datasets.

daisycrego commented 3 years ago

I still haven't finished the Leads page, but before I continue with that page or the LeadsTable, let's fix the underlying issues on the Events side, because it's going to be the same problem.

Latency Problem

In the jbgadmin MongoDB Atlas namespace, the events collection currently has 6158 documents and growing. In the current design of the site, whenever a user goes to the home route, the client-side EventsTable component makes an API call to retrieve the events documents from the backend. This retrieval is becoming very expensive and time-consuming, so we need to make paginated queries from the Events page to the API.

Eventually we might have to lift this state up higher to handle some of the other EventsTable routing issues (https://github.com/daisycrego/jbg-admin/issues/28), but for now the Events page can handle fetching the required data on demand (moving to a different page, changing the rowsPerPage, changing one of the status filters). We could also start by putting together a better representation of the EventsTable state, send all of that state to the backend, and get back only the page of data we need based on those parameters. This does mean that each time we change the page, we're making a new API request, but at least we're not loading thousands of documents at once; I think that's worse.

Plan

  1. Refactor the /events API to accept POST parameters to specify which page of data to return, the number of rows per page, and what other filters are applied (activeStatuses, activeSources, date range).
  2. Refactor the Events page and EventsTable to make paginated/filtered queries from the new /events API (make POSTs with details for all the filters to set and the page being requested, activeStatuses, activeSources, and rowsPerPage).
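Step 1 of the plan could start with something like the following — a minimal sketch, assuming the POST body carries page, rowsPerPage, order, activeSources, and activeStatuses (parameter names are taken from the plan above; the exact request shape and the buildEventsQuery helper name are assumptions):

```javascript
// Sketch: translate the POSTed table state into a filter/sort/skip/limit spec
// that a Mongoose chain like
//   Event.find(filter).sort(sort).skip(skip).limit(limit)
// could consume. Defaults here are assumptions, not values from the codebase.
function buildEventsQuery({
  activeSources = [],
  activeStatuses = [],
  page = 0,
  rowsPerPage = 25,
  order = "desc",
} = {}) {
  const filter = {};
  if (activeSources.length) filter.source = { $in: activeSources };
  if (activeStatuses.length) filter.status = { $in: activeStatuses };
  return {
    filter,
    sort: { created: order === "desc" ? -1 : 1 },
    skip: page * rowsPerPage, // offset of the requested page
    limit: rowsPerPage,
  };
}
```

Keeping this as a pure function means the same spec-building logic can later be reused by the /leads API without touching any Mongoose code.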
daisycrego commented 3 years ago

Refactor /events API

daisycrego commented 3 years ago

Issues with refactor, speed not improving

daisycrego commented 3 years ago

I did a small investigation - it's not the sorting. It's the actual querying of all of that events data:

let events = await Event.find().select(
  "id updated created source property status processed processedAt eventId isNewLead isPossibleZillowExemption isZillowEvent message person personId propertyId type"
);

Optimizing MongoDB Compound Indexes - The "Equality - Sort - Range" (ESR) Rule

In short, the ESR rule says a compound index should list fields matched by Equality first, then Sort fields, then Range fields. For a query like find({ source: ... }).sort({ created: -1 }), that suggests an index of { source: 1, created: -1 }.

daisycrego commented 3 years ago

I created a search index for the MongoDB Atlas cloud instance (screenshot: Screen Shot 2021-07-05 at 7:04:05 PM):

{
  "mappings": {
    "dynamic": true,
    "fields": {
      "created": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "date"
        }
      ],
      "source": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}

You could also run createIndex somewhere to do this. Following the ESR rule, the equality-matched field (source) should come before the sort field (created), and the index belongs on the events collection (jbgadmin is the database, not the collection):

db.events.createIndex({ source: 1, created: -1 });

Pretty sure that helped. And then I changed the code to fetch from the db to:

let events = await Event.find({ source: activeSources })
      .sort({ created: -1 })
      .limit(pageSize);
daisycrego commented 3 years ago

There's more to be done, but I think I'm on the right track to speeding this up. Useful to know how to do this, though. So the issue right now is just that I need to make this line more reusable and replace the previous implementations:

let events = await Event.find({ source: activeSources })
    .sort({ created: -1 })
    .limit(pageSize);
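One way to make that line reusable is to wrap the find/sort/limit chain in a helper and pass the model in — a sketch, assuming the query only ever filters by source and sorts by created (fetchEvents and its parameter names are hypothetical, not from the codebase):

```javascript
// Sketch: one helper for the repeated find/sort/limit chain. The model is a
// parameter so the same helper can serve Event now and Lead later.
function fetchEvents(model, { activeSources, order = "desc", pageSize = 25 }) {
  return model
    .find({ source: activeSources })            // equality filter on source
    .sort({ created: order === "desc" ? -1 : 1 }) // newest first by default
    .limit(pageSize);                           // cap rows returned per query
}
```

Call sites then shrink to fetchEvents(Event, { activeSources, order, pageSize }), so the sort/limit details live in exactly one place.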
daisycrego commented 3 years ago

What are all the fields we need to care about in Events documents, and which of those columns do we need to be able to sort on? Take a look at the EventsTable (screenshot: Screen Shot 2021-07-05 at 7:15:58 PM). We should be able to sort by created, either ascending or descending; the default is desc, and that's actually it! For now, let's just make that possible, since it simplifies a lot of the code.

let events = await Event.find({ source: activeSources })
      .sort({ created: order === "desc" ? -1 : 1 })
      .limit(pageSize);
daisycrego commented 3 years ago

Ended up removing that limit(pageSize) because I actually do that later.

 let events = await Event.find({ source: activeSources }).sort({
      created: order === "desc" ? -1 : 1,
    });
daisycrego commented 3 years ago

This time my refactor actually helps improve the loading speed of the EventsTable. There are still edge cases where the loading time is going to be long, and this needs to be addressed directly. One option is to silently load ALL events data in the background for use once it's available, while first displaying the subset that arrives faster: e.g. only the Zillow Flex events arrive first at the EventsTable (which is perfect, because that's the only default source), while at the same time we retrieve events for all sources/statuses, so that later sorting and filtering has direct access to all the data. At the bare minimum, we need some kind of loading icon so the user knows their request is being processed.

daisycrego commented 3 years ago

Done

Events retrieves all of the data for the default search (a limited subset of the ~7000 rows, with source='Zillow Flex', which is relatively rare) and passes it down to the EventsTable. The EventsTable uses internal state, activeRows and currentPageRows, to keep track of the original rows prop sent down from Events and the slice of those rows for the current page. This way, when it comes time to change the sorting, we already have access to all of the data for the current query and don't have to re-query. We only re-query the database when the filters (source or status) are changed. This means there will still sometimes be long queries, because a query that returns thousands of rows is just going to be slow on the server side; it's not yet clear how to avoid that.
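The activeRows/currentPageRows bookkeeping described above boils down to a slice over the in-memory rows — a minimal sketch (getCurrentPageRows is a hypothetical helper name, not from the codebase):

```javascript
// Sketch: derive the visible page from the full result set held in component
// state, so changing pages never triggers a new API request.
function getCurrentPageRows(activeRows, page, rowsPerPage) {
  const start = page * rowsPerPage;
  return activeRows.slice(start, start + rowsPerPage);
}
```

Because slice never reads past the end of the array, the final partial page falls out for free: a last page with fewer than rowsPerPage rows just returns the remainder.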

Todo