Performance improvements for downloading match prediction results in PersistSearchResults

Anthony-Nolan / Atlas

A free & open-source Donor Search Algorithm Service

GNU General Public License v3.0

Performance improvements for downloading match prediction results in PersistSearchResults #938

Closed: daria-sorokina-da closed this issue 1 year ago

daria-sorokina-da commented 1 year ago

Currently, PersistSearchResults takes a long time because it downloads match prediction results sequentially, one blob at a time; see: https://github.com/Anthony-Nolan/Atlas/blob/master/Atlas.Common/AzureStorage/Blob/BlobDownloader.cs#L93

foreach (var location in locations)
{
    data[location.Key] = await GetBlobData<T>(containerClient, location.Value);
}

In a search with 10,000 donors, this means 10,000 sequential GetBlobData calls.

Let's investigate two options:
1) Variant 1: create a new activity for saving individual blobs within atlas-search-results.
2) Variant 2: parallelize the GetBlobData calls across multiple threads; see: https://github.com/Anthony-Nolan/Atlas/blob/master/Atlas.Common/AzureStorage/Blob/BlobDownloader.cs#L93
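Variant 2 could look roughly like the sketch below, which replaces the sequential loop with concurrent downloads capped by a SemaphoreSlim, so that a 10,000-donor search does not open 10,000 simultaneous requests. This is a minimal sketch under stated assumptions, not the Atlas implementation: it assumes `locations` is a dictionary of result key to blob name, and `fetch` is a hypothetical stand-in for `GetBlobData<T>(containerClient, blobName)`. The names `ParallelBlobDownload`, `DownloadAllAsync` and `maxConcurrency` are likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelBlobDownload
{
    // Hypothetical helper: downloads every location concurrently, with at most
    // maxConcurrency downloads in flight at any one time.
    // `fetch` stands in for GetBlobData<T>(containerClient, blobName) (assumption).
    public static async Task<Dictionary<TKey, T>> DownloadAllAsync<TKey, T>(
        IReadOnlyDictionary<TKey, string> locations,
        Func<string, Task<T>> fetch,
        int maxConcurrency)
        where TKey : notnull
    {
        using var throttle = new SemaphoreSlim(maxConcurrency);

        var tasks = locations.Select(async location =>
        {
            // Wait for a free slot before starting the download.
            await throttle.WaitAsync();
            try
            {
                var blob = await fetch(location.Value);
                return new KeyValuePair<TKey, T>(location.Key, blob);
            }
            finally
            {
                // Free the slot even if the download throws.
                throttle.Release();
            }
        });

        // Task.WhenAll starts all tasks (the semaphore limits how many actually run)
        // and completes when every download has finished; each result carries its key.
        var results = await Task.WhenAll(tasks);
        return results.ToDictionary(r => r.Key, r => r.Value);
    }
}
```

The right `maxConcurrency` value is an open question and would need to be measured against Azure Storage throttling limits. On .NET 6 or later, `Parallel.ForEachAsync` with `ParallelOptions.MaxDegreeOfParallelism` is an alternative way to express the same bounded parallelism.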

daria-sorokina-da commented 1 year ago

Performance measures for search 22015ec9-9fc7-4242-bd35-69626159f00b, performed on 03 Apr 2023:
Search step: 17 min
Match Predictions step: 1 hour
Persist Search Results step: 1 hour

daria-sorokina-da commented 1 year ago

@DmitriyShcherbina could you please prepare a search request for the AN Dev env that returns a larger number of donors (at least 1000)?

DmitriyShcherbina commented 1 year ago

Here is a search request; it should return 4963 donors:

{
    "SearchDonorType": 1,
    "MatchCriteria": {
        "DonorMismatchCount": 4,
        "LocusMismatchCriteria": {
            "A": 2,
            "B": 1,
            "C": 1,
            "Dpb1": null,
            "Dqb1": 0,
            "Drb1": 0
        },
        "includeBetterMatches": true
    },
    "ScoringCriteria": {
        "LociToScore": [],
        "LociToExcludeFromAggregateScore": []
    },
    "PatientEthnicityCode": null,
    "PatientRegistryCode": null,
    "runMatchPrediction": false,
    "SearchHlaData": {
        "A": { "Position1": "*01:DXGWF", "Position2": "*01:DXGWF" },
        "B": { "Position1": "*07:DXFTK", "Position2": "*07:DXFTK" },
        "C": { "Position1": "*05:DUVRN", "Position2": "*07:BRXNC" },
        "DPB1": { "Position1": "*03:FYKD", "Position2": "*04:BYVXE" },
        "DQB1": { "Position1": "*03:BMSUA", "Position2": "*06:BSBZX" },
        "DRB1": { "Position1": "*15:01:01", "Position2": "*12:JV" }
    }
}