Closed mountiny closed 3 months ago
Triggered auto assignment to @anmurali (Bug
), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.
Note for myself, @marcaaron investigated and posted here that the slowest part of the command is actually the query, so looking into ways to reduce how much data the query returns will probably be the most productive.
Some other ways to speed up the command that were mentioned in that thread:
Took some time today breaking down the searchByKeywords query, trying to understand each piece, and theorizing a few ways to reduce what the query is doing. I also ran the chat report generator and generated 1000 reports in my local DB to use for testing.
Something interesting that I found was that this portion of the query contributes pretty significantly to the total query time, something like 30%.
(SELECT GROUP_CONCAT(displayName) FROM
(
SELECT JSON_EXTRACT(detailsNVP.value, '$.firstName') || JSON_EXTRACT(detailsNVP.value, '$.lastName') AS displayName FROM sharedReports s
INNER JOIN nameValuePairs detailsNVP ON detailsNVP.name = 'expensify_personalDetails' AND detailsNVP.accountID = s.accountID
WHERE s.reportID = r.reportID
UNION ALL
SELECT JSON_EXTRACT(detailsNVP.value, '$.firstName') || JSON_EXTRACT(detailsNVP.value, '$.lastName') AS displayName FROM nameValuePairs detailsNVP
WHERE detailsNVP.name = 'expensify_personalDetails'
AND detailsNVP.accountID = r.accountID
AND r.accountID != 0
)
) AS participantDisplayNames
I also think something like a getCriticalReportData()
that is used in some places instead of getChats()
might be helpful?
We can time how long the Report::getChats()
part of this command takes by using timestamps and logging the result. I was chatting with @neil-marcellini about this earlier today - but in the context of OpenApp
.
this portion of the query contributes pretty significantly to the total query time
Yes, some reports can have a lot of participants in sharedReports
so getting their details could be expensive.
looking into ways to reduce how much data the query returns
The query itself returns no data - only reportIDs
. Report::getChats()
is run after to get the data so that's what this ticket is about reducing (not trying to discourage you from optimizing the query though - anything we can learn there would be great).
Only returning the actual reports we are going to show the user
I'm not sure about this. Haven't observed it myself. If this is happening it means that the frontend is filtering out results that the backend assumes it should be showing. No idea where the difference is with that.
We could conceivably have the client pass the reportID at the “bottom” of the current search result to SearchForReports. The order of the reports would need to be stable between the client and server first (not certain if this is the case today). We should be able to re-design it to work this way.
This would be a larger project I think. It is basically suggesting we implement pagination for LHN searching.
Got it, I'll look into ways to optimize the getChats query specifically for SearchForReports next, thanks for your insight @marcaaron!
The query itself returns no data - only reportIDs. Report::getChats() is run after to get the data so that's what this ticket is about reducing (not trying to discourage you from optimizing the query though - anything we can learn there would be great).
Hmm based on some investigating the logs of some slow requests I feel like there are some cases where the query to search by keywords is way too slow.
At most, the query in getChats
will be passed 100 reportIDs, so I feel like it won't get that slow.
Here's a request where the searchForKeywords query took 103 seconds: https://www.expensify.com/_devportal/tools/logSearch/?#query=request_id:(%22890147f41fedbc5d-ZRH%22)+AND+timestamp:[2024-06-07T13:22:05.076Z+TO+2024-06-07T15:22:05.076Z]&index=logs_expensify-028822
Anyways I'll redirect my attention at reducing the data sent back first, seems that's the lower hanging fruit here, although the slow queries are definitely alarming.
Not overdue, still investigating a solution to this.
Have a draft PR up. Does some things that help to drastically reduce the number of lines returned in SearchForReports
for each report:
'lastReadSequenceNumber' => $lastReadSequenceNumber,
'lastVisibleActionLastModified' => $reportSummary['lastVisibleActionLastModified'] ?? $lastVisibleActionCreated,
'oldPolicyName' => (string) ($reportSummary['reportNameValuePairs'][Report::NVP_OLD_POLICY_NAME] ?? ''),
'visibility' => $visibility ?: null,
'hasOutstandingChildRequest' => $reportSummary['hasOutstandingChildRequest'] ?? false,
'hasParentAccess' => $reportSummary['hasParentAccess'] ?? null,
'writeCapability' => $reportSummary['reportNameValuePairs'][Report::NVP_WRITE_CAPABILITY] ?? ChatReport::CAPABILITY_ALL,
'description' => $description,
'isDeletedParentAction' => $reportSummary['isDeletedParentAction'] ?? null,
'fieldList' => count($reportFields) ? Policy_Utils::formatReportFieldsForOnyx($reportFields) : null,
'total' => $reportSummary['total'] ?? 0,
'unheldTotal' => $reportSummary['unheldTotal'] ?? 0,
'currency' => $reportSummary['currency'],
'nonReimbursableTotal' => $reportSummary['nonReimbursableTotal'] ?? 0,
'isCancelledIOU' => (bool) ($reportSummary['isCancelledIOU'] ?? false),
This cuts down the number of keys we need to send per report summary by over half.
^ After some more investigation I found that my approach was wrong.
The real issue with what we're doing in Onyx is we're treating all report types the same. There are many fields (like the "total" or "currency" fields, for example) that don't apply to chat reports at all, but we still send over a field like total: 0
, even though it has nothing to do with the report.
I'm going to go through each report type, find the minimal amount of data needed to display it as a search option, and ensure we only send over that data for that specific report type.
Posted some discussion here
After analyzing this response:
So my next steps are to:
Had some great discussion here and we've landed on yet another new approach:
This is not too far off to what I've done currently but I will need to make further modifications to my PRs, specifically modifying the getChats query. I've stuck both PRs that are under review back onto draft mode while I iron this out.
More progress with the PRs today! Should be close, just need another review
Just merged the Auth PR
Auth PR was deployed, this is done!
Problem
Coming from here
The
SearchForReports
command is returning lots of data that we probably do not need to show on the Finder page; the OpenReport command would load anything else we need. This is a big payload that is unnecessarily large for the Finder page.Coming from here we don't want chat rooms to show up in search if they have participant login/display names matching the search term. This has been called an "anti-feature" + it drastically slows down
SearchForReports
in a lot of cases when rooms with many participants are returned in the results.Solution
Let's make sure we send the reports in order to display them on the finder page, and with only as much data, we really need to display them. cc @quinthar
Let's also stop querying for and returning room participant data + personal details.