daisycrego closed this issue 3 years ago
I still haven't finished the `Leads` page, but before I continue with that page or the `LeadsTable`, let's fix the underlying issues on the `Events` side, because it's going to be the same problem.
In the `jbgadmin` MongoDB Atlas namespace, the `events` collection currently has 6158 documents and is growing. In the current design of the site, whenever a user goes to the home route, the client-side `EventsTable` component makes an API call to retrieve the events documents from the backend. This retrieval is becoming very expensive and time-consuming. We need to make paginated queries from the `Events` page to the API. Eventually we might have to lift this state up higher to handle some of the other `EventsTable` routing issues (https://github.com/daisycrego/jbg-admin/issues/28), but for now we can have the `Events` page handle fetching the required data, and fetch data as needed (moving to a different page, changing the `rowsPerPage`, changing one of the status filters). Maybe we can start by putting together a better representation of the `EventsTable` state, and then send all of that state to the backend so we get back only the page of data we need based on all our parameters. This does mean that each time we change the page we're making a new API request, but at least we're not loading thousands of documents; that's worse.
- Update the `/events` API to accept POST parameters specifying which page of data to return, the number of rows per page, and which other filters are applied (`activeStatuses`, `activeSources`, date range).
- Update the `Events` page and `EventsTable` to make paginated/filtered queries to the new `/events` API (make POSTs with details of all the filters to set and the page being requested: `activeStatuses`, `activeSources`, and `rowsPerPage`).
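As a sketch of what the new `/events` handler could do with such a POST body (all names here, `toQueryOptions` included, are assumptions rather than code from the repo), the posted filter state can be translated into plain query options before touching Mongoose:

```javascript
// Hypothetical helper for the new /events API: translate the POSTed
// filter state into query options (filter, sort, skip, limit).
// Field names (activeSources, activeStatuses, order, page, rowsPerPage)
// mirror the client state described above, but are assumptions.
function toQueryOptions(body) {
  const page = body.page ?? 0;
  const rowsPerPage = body.rowsPerPage ?? 10;
  const filter = {};
  // Only constrain a field when the client actually selected values,
  // so an empty filter list means "all sources/statuses".
  if (body.activeSources?.length) filter.source = { $in: body.activeSources };
  if (body.activeStatuses?.length) filter.status = { $in: body.activeStatuses };
  return {
    filter,
    sort: { created: body.order === "asc" ? 1 : -1 }, // default desc
    skip: page * rowsPerPage, // documents to skip before this page
    limit: rowsPerPage,       // page size
  };
}
```

The route handler could then run something like `Event.find(opts.filter).sort(opts.sort).skip(opts.skip).limit(opts.limit)`.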
`/events` API

- The `Events` page is making a request to the backend using the `list` event API method in a `useEffect` hook which runs only once, when the component is initially rendered.
- The `list` API method takes in an abort signal so that if the `Events` component is re-rendered, the ongoing request to the API is aborted. This prevents an explosion of concurrent requests from going out if for some reason there are a lot of re-renders.
- `Events` has an `events` state which it sets in the `useEffect` hook and then passes down to the `EventsTable` as its `rows` prop.

Eventually this fetching could be lifted above the `Events` page, but for now we can keep it in the `Events` page and add some more handlers for the other requests to the new API:
- `handleUpdatePage` --> updates the page state and makes a request with all of the current filters to the new `/events` API
- `handleUpdateRowsPerPage`
- `handleUpdateActiveSources`
- `handleUpdateActiveStatuses`
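Each of these handlers does the same two things: patch one piece of `Events` state, then POST the full current state to `/events`. A pure sketch of the shared state update (the function name, and the choice to reset to the first page whenever a non-page filter changes, are assumptions, not something the issue specifies):

```javascript
// Hypothetical shared update used by the handleUpdateX handlers:
// merge the changed field into the current Events page state.
// Assumption: any change other than the page itself resets to page 0,
// since the old page offset is meaningless under new filters.
function nextState(state, patch) {
  return { ...state, ...patch, page: "page" in patch ? patch.page : 0 };
}
```

Each handler would then call `setState(nextState(state, { rowsPerPage: 25 }))` (or similar) and re-fetch with the resulting state.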
Move state up from the `EventsTable` to the `Events` page, because we've decided to make our fetches to the new API from the `Events` page for now, and later on from higher up. The `EventsTable` will need to change in that it will now always be told from higher up which rows to display. We need to remove `currentPageRows` and `activeRows` from the `EventsTable` and move the single concept, `currentPageRows`, up to the `Events` page; `activeRows` will be removed completely. We will move `sources`, `activeSources`, `statuses`, and `activeStatuses` up from `EventsTable` to `Events`, keeping in mind that we will need to refactor how we retrieve `statuses` and `sources` (previously we had access to all the documents, so we calculated `statuses` and `sources` from them using lodash operations, but now we're only ever pulling in a subset of the data, so we need an extra API method to access all statuses and sources, rather than just the ones in the current page returned from the new `/events` API). We also need to move `page`, `rowsPerPage`, `order`, and `orderBy` up to the `Events` page, because all of those will be used in the fetch to the new `/events` API to get only the page of data actually needed.

The `Events` component now takes more of the state of the `EventsTable` and performs a paginated query to the events API, with the hope that this would speed up loading. It didn't. There's something in the retrieval process itself that's slow, regardless of how much data we return. And I guess this makes sense in retrospect: the speed issues have to do with some slow part of the API query to the MongoDB cloud instance, or with sorting all of the data before we actually slice out the part for this page. I did a small investigation; it's not the sorting. It's the actual querying of all of that events data:
```javascript
let events = await Event.find().select(
  "id updated created source property status processed processedAt eventId isNewLead isPossibleZillowExemption isZillowEvent message person personId propertyId type"
);
```

```javascript
let events = await Event.find().limit(pageSize);
```
The catch with a plain `limit`: since the default source filter is `Zillow Flex`, I won't get 10 Zillow Flex leads by default. I get 10 leads. If I'm lucky, some of them are Zillow Flex. Currently, I only end up getting 1 row back. I really need both more control over the query and faster querying.

MongoDB's guidance for ordering the keys of a compound index (the ESR rule):

(1) Equality predicates come first. Equality predicates test for an exact match on a single value. For example:

```javascript
find({ x: 123 })
find({ x: { $eq: 123 } })
aggregate([ { $match: { "x.y": 123 } } ])
```
(2) Sort predicates follow Equality predicates. Sort predicates represent the entire requested sort for the operation and determine the ordering of results. For example:

```javascript
find().sort({ a: 1 })
find().sort({ b: -1, a: 1 })
aggregate([ { $sort: { b: 1 } } ])
```
(3) Range predicates follow Equality and Sort predicates. Range predicates are filters that may scan multiple keys, as they are not testing for an exact match. For example:

```javascript
find({ z: { $gte: 5 } })
find({ z: { $lt: 10 } })
find({ z: { $ne: null } })
```
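Applying that Equality-Sort-Range ordering to this issue's query shape (equality on `source`, sort on `created`, no range predicate yet), the equality key would come first in a compound index. The key order below follows the ESR rule; it isn't taken from the repo:

```javascript
// Equality key (source) first, then the sort key (created, descending to
// match the default sort). A range key (e.g. a created-date range filter)
// would come last if one is added later.
const eventsIndexSpec = { source: 1, created: -1 };
// e.g. db.events.createIndex(eventsIndexSpec)
```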
I created a search index for the MongoDB cloud instance:

```json
{
  "mappings": {
    "dynamic": true,
    "fields": {
      "created": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "date"
        }
      ],
      "source": [
        {
          "dynamic": true,
          "type": "document"
        },
        {
          "type": "string"
        }
      ]
    }
  }
}
```
You could also run `createIndex` somewhere to do this, something like:

```javascript
db.events.createIndex({ created: 1, source: 1 });
```
Pretty sure that helped. And then I changed the code to fetch from the db to:

```javascript
let events = await Event.find({ source: activeSources })
  .sort({ created: -1 })
  .limit(pageSize);
```
There's more to be done, but I think I'm on the right track to speeding this up. Useful to know how to do this, though. So the issue right now is just that I need to make this line more reusable and replace the previous implementations:

```javascript
let events = await Event.find({ source: activeSources })
  .sort({ created: -1 })
  .limit(pageSize);
```
What are all the fields that we need to care about with regard to `Events` documents? Which of those columns do we need to be able to sort on? Take a look at the `EventsTable`:

We should be able to sort by `created`, either ascending or descending. The default is `desc`, and actually that's it! For now, let's just make that possible; it simplifies a lot of the code.
```javascript
let events = await Event.find({ source: activeSources })
  .sort({ created: order === "desc" ? -1 : 1 })
  .limit(pageSize);
```
Ended up removing that `limit(pageSize)` because I actually do that later:

```javascript
let events = await Event.find({ source: activeSources }).sort({
  created: order === "desc" ? -1 : 1,
});
```
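With the `limit` out of the query, the page is presumably cut out of the already-sorted array afterwards. A minimal sketch of that slicing step (the function name is hypothetical):

```javascript
// Slice one page out of the full, already-sorted result set.
function pageSlice(rows, page, rowsPerPage) {
  const start = page * rowsPerPage;
  return rows.slice(start, start + rowsPerPage);
}
```

`Array.prototype.slice` clamps out-of-range indices, so a page past the end simply yields an empty array rather than throwing.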
This time my refactor actually helps to improve the loading speed for the `EventsTable`. There are still edge cases where the loading time is going to be long, and this needs to be directly addressed. One option is loading ALL events data silently in the background to use once it's available, while first using the subset of the data that arrives faster: e.g. only the Zillow Flex events arrive first at the `EventsTable` (which is perfect because this is the only default `source`), while at the same time we retrieve all events data for all sources/statuses, so that when we perform later sorting and filtering we're doing it with direct access to all the data. Or, at the bare minimum, we need some kind of loading icon so the user knows their request is being processed.
The current design: `Events` retrieves all of the data for the default search (a limited subset of the ~7000 rows, with `source='Zillow Flex'`, which is relatively rare) and passes all of that data down to the `EventsTable`. The `EventsTable` uses internal state, `activeRows` and `currentPageRows`, to keep track of the original `rows` prop sent down from `Events` and of the slice of those rows for the current page. This way, when it comes time to change the sorting, we already have access to all of the data for the current query and don't have to re-query. We only actually re-query the database when the filters (`source` or `status`) are changed. This means there will still sometimes be long queries, because a query that returns thousands of rows is just going to be slow on the server side; it's not yet clear how to avoid that.
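Re-sorting without re-querying then reduces to a comparator over the rows already on the client. A sketch, assuming `created` is a date string or `Date` (the helper name is an assumption):

```javascript
// Sort rows by the created field without hitting the API again.
// Returns a new array; the original rows prop is left untouched.
function sortByCreated(rows, order = "desc") {
  const sign = order === "desc" ? -1 : 1;
  return [...rows].sort(
    (a, b) => sign * (new Date(a.created) - new Date(b.created))
  );
}
```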
- Add some kind of loading indicator so users know the `EventsTable` search is updating (so they don't refresh the page, especially for the long queries).
- Retrieve the full dataset in the background and make it available to the `EventsTable` once it's ready, but on the initial load only retrieve the first query's worth of data. The idea is that while the user looks at the first query's results, the full dataset hopefully becomes available to the `EventsTable`, so that when they update their filters the updates are much faster, since the `EventsTable` already has all of the data. Requires a bit more coordination...
- Improve speed of loading `EventsTable` and `LeadsTable` by performing paginated queries to the APIs
- Refactor `EventsTable` and `LeadsTable` && events/leads APIs to send less data to the client in order to reduce all the latency loading these now massive datasets