Anthony-Nolan / Atlas

A free & open-source Donor Search Algorithm Service
GNU General Public License v3.0

Chunk search results to prevent OutOfMemory exception when large result sets are serialized #897

Closed (seanmobrien closed this 1 year ago)

seanmobrien commented 1 year ago

Describe the bug
When a search request that generates a large result set is run, an OutOfMemory exception is thrown while the search result is being serialized for storage in the search-results container. The behaviour is more visible when multiple search requests are processed at the same time, but with a large enough result set the error occurs on a single search. The OutOfMemory error can be reproduced with a single search request even when the Elastic plan is scaled to the maximum allowed (E3).
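As an illustration of the failure mode, here is a minimal sketch (not the actual Atlas code; the type names and the streaming alternative are assumptions): serializing the whole result set into one in-memory string allocates the entire payload on the heap before any upload starts, whereas streaming the serializer's output straight into the blob keeps peak memory roughly independent of result-set size.

```csharp
// Hedged sketch: contrast between the allocation pattern that can exhaust
// memory and a streaming upload. Not taken from the Atlas codebase.
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;
using Azure.Storage.Blobs.Specialized;
using Newtonsoft.Json;

public static class ResultUploadSketch
{
    // Problematic pattern: for 500,000+ results this string (plus the
    // serializer's intermediate buffers) can reach hundreds of MB.
    public static string SerializeInMemory(IEnumerable<object> results) =>
        JsonConvert.SerializeObject(results);

    // Streaming alternative: results are written to the blob as they are
    // serialized, so no full-payload buffer is ever materialized.
    public static async Task SerializeToBlobAsync(
        BlockBlobClient blob, IEnumerable<object> results)
    {
        await using Stream stream = await blob.OpenWriteAsync(overwrite: true);
        using var writer = new StreamWriter(stream);
        using var json = new JsonTextWriter(writer);
        JsonSerializer.CreateDefault().Serialize(json, results);
    }
}
```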

To Reproduce
Steps to reproduce the behavior:

  1. Send a search request that generates a large resultset (example input for WMDA dev environment provided in Inputs section)
  2. In the matching-request topic, observe that the message is redelivered 10 times and ultimately dead-lettered
  3. Using the Diagnose and Troubleshoot tool associated with the matching algorithm function, observe that OutOfMemory exceptions have occurred

Expected behaviour

  1. Search results should be successfully uploaded to the storage container even when the result set is large.
  2. If a critical out-of-memory error occurs during serialization, an appropriate search-failed message (with retry information) should be added to the search-results-ready topic (see the sketch below).
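As a minimal sketch of that failure notification, assuming Azure.Messaging.ServiceBus and a hypothetical SearchFailedNotification shape (none of these names are confirmed by this thread):

```csharp
// Hedged sketch: publishing a search-failed message, with retry information,
// to the search-results-ready topic when serialization faults.
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;
using Newtonsoft.Json;

// Hypothetical message shape; the field names are illustrative only.
public record SearchFailedNotification(
    string SearchRequestId, string Reason, bool WillRetry, int AttemptNumber);

public static class FailureNotifier
{
    public static async Task PublishAsync(
        ServiceBusClient client, SearchFailedNotification notification)
    {
        await using ServiceBusSender sender =
            client.CreateSender("search-results-ready");
        var message = new ServiceBusMessage(
            JsonConvert.SerializeObject(notification));
        await sender.SendMessageAsync(message);
    }
}
```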

Screenshots
(two screenshot attachments: Snag_1ab3ff61, Snag_1ab42846)

Inputs/Outputs

{"searchDonorType":"Adult","matchCriteria":{"donorMismatchCount":2,"locusMismatchCriteria":{"a":2,"b":2,"c":2,"dpb1":null,"dqb1":2,"drb1":2},"includeBetterMatches":true},"scoringCriteria":{"lociToScore":["Dpb1"],"lociToExcludeFromAggregateScore":[]},"searchHlaData":{"a":{"position1":"01:01","position2":"01:01"},"b":{"position1":"08:01","position2":"07:02"},"c":{"position1":"07:01","position2":"07:02"},"dpb1":{"position1":"02:01","position2":"87:01"},"dqb1":{"position1":"02:01","position2":"06:02"},"drb1":{"position1":"03:01","position2":"15:01"}},"patientEthnicityCode":null,"patientRegistryCode":null,"runMatchPrediction":true,"donorRegistryCodes":null}

Atlas Build & Runtime Info (please complete the following information):

Additional context
This issue is reliably reproducible in WMDA's dev environment using the input above.

zabeen commented 1 year ago

HLD (high-level design)

TBC: Details about which components need to be amended

zabeen commented 1 year ago

@WMDAJesse @mmelchers do you have statistics from HAP-E about largest number of search results observed from a single search?

IgorKupreychik commented 1 year ago

We have decided to go with the first approach (batch results into multiple files). Once it's implemented, we can check whether it avoids the OutOfMemory exception; if not, we will implement the second approach (write to individual result files using the AppendBlob method).
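A minimal sketch of the first approach, assuming Azure.Storage.Blobs, a virtual folder per search, and an illustrative batch size (none of these specifics are fixed by this thread):

```csharp
// Hedged sketch: batching results into multiple files, one blob per batch,
// under a virtual folder named after the search id: "<searchId>/<n>.json".
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Newtonsoft.Json;

public static class BatchedResultWriter
{
    public static async Task WriteBatchesAsync(
        BlobContainerClient container,
        string searchId,
        IEnumerable<object> results,
        int resultsPerBatch = 10_000) // assumed batch size, not from the thread
    {
        var batchNumber = 0;
        foreach (var batch in results.Chunk(resultsPerBatch)) // .NET 6+
        {
            // Only one batch at a time is held in memory and serialized.
            var payload = BinaryData.FromString(JsonConvert.SerializeObject(batch));
            await container
                .GetBlobClient($"{searchId}/{batchNumber++}.json")
                .UploadAsync(payload, overwrite: true);
        }
    }
}
```

One blob per batch means each serialization call only ever touches a bounded number of results, which is the property that should relieve the memory pressure.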

We propose the following file structure: each search's results are stored under a virtual "folder" named after the search_id, with one file per batch of results.

With this structure we won't have to send result file names within the service bus message; all the results can be read with just the search_id (we will need to run 'ListBlobs' on the "folder" to get the names of all the result files).
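A sketch of that consumer-side read, under the same assumed layout ("<searchId>/<n>.json"); the method and type names are illustrative:

```csharp
// Hedged sketch: enumerating every batched result file knowing only the
// search id, via a prefix listing ('ListBlobs' on the virtual folder).
using System;
using System.Collections.Generic;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

public static class BatchedResultReader
{
    public static async IAsyncEnumerable<BinaryData> ReadBatchesAsync(
        BlobContainerClient container, string searchId)
    {
        await foreach (BlobItem item in
            container.GetBlobsAsync(prefix: $"{searchId}/"))
        {
            var download = await container
                .GetBlobClient(item.Name)
                .DownloadContentAsync();
            yield return download.Value.Content;
        }
    }
}
```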

A new field, 'ResultBatched', indicating whether the results are batched (i.e. split into multiple files), will be added to the search results notification (for the topics search-results-ready, matching-results-ready, and repeat-search-results-ready).
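For illustration only (the surrounding property names are assumptions, not the actual Atlas notification contract), the notification might gain the flag like this:

```csharp
// Hedged sketch of the notification shape; only the 'ResultBatched' field
// itself is taken from this thread.
public record ResultsNotification(
    string SearchRequestId,
    bool WasSuccessful,
    bool ResultBatched); // true when results are split across multiple files
```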

WMDAJesse commented 1 year ago

> @WMDAJesse @mmelchers do you have statistics from HAP-E about largest number of search results observed from a single search?

I think the largest one we have seen so far was around 700,000 records, but that happens rarely. We have definitely seen result sets of 500,000+ records.

zabeen commented 1 year ago

@WMDAJesse thanks! Would it be possible to share the search request details (specifically, patient HLA and mismatch count) for the top 10 largest searches?

mmelchers commented 1 year ago

top 10 patients with most results 2023.csv

mmelchers commented 1 year ago

This is the top 10 based on 0 mismatches. All have at least 50,000 results in the 0-mismatch case (when run in Hap-E prod). The number of results goes up drastically as the number of allowed mismatches increases.

seanmobrien commented 1 year ago

Thanks @mmelchers! Do you know what number of allowed mismatches we are targeting for ATLAS? Also, is it possible to get some performance metrics, e.g. how quickly these large searches need to complete in order to deliver a minimally viable product?

mmelchers commented 1 year ago

@seanmobrien

Donors (adult): 0, 1, and 2 mismatches. There is demand for 3- and 4-mismatch donor searches, but these are not part of the MVP.

Expected time to run: Hap-E has the following medians for searches with more than 100,000 potential 0-mismatch donors:

  - 0 mismatches: median = 1016 seconds
  - 1 mismatch: median = 1343 seconds
  - 2 mismatches: median = 3788 seconds

So it would be reasonable for ATLAS to have the following:

  - 0 mismatches: median < 2000 seconds
  - 1 mismatch: median < 3000 seconds
  - 2 mismatches: median < 7200 seconds

Cords: for n/8 or n/10 searches: 0, 1, 2, 3, and 4 mismatches; for n/6 searches: 0, 1, and 2 mismatches. All cord searches in Hap-E (even with 4 mismatches and more than 50,000 records) finish within 2500 seconds, and the median is < 700 seconds.

So for ATLAS: any cord search with more than 50,000 records: median < 1400 seconds.

zabeen commented 1 year ago

@seanmobrien @mmelchers I'll copy the performance requirements to a new ticket

zabeen commented 1 year ago

Testing Notes

Search Requests

Repeat Search

zabeen commented 1 year ago

What is left on this ticket before it can be closed and moved to the final review column:

@DmitriyShcherbina to add testing notes from AN dev

@seanmobrien to write up testing notes from WMDA dev