Validate match prediction using the published consensus dataset

zabeen commented 2 years ago

External partners, including NMDP and WMDA, have requested that the published consensus dataset (specifically, MVS3) be run through Atlas to ensure that its match predictions are in line with expectations.

Some changes are required to the codebase to allow efficient processing of the 10 million patient-donor-pair (PDP) dataset.

zabeen commented 2 years ago

HLD

After discussion with @benbelow and @luken-an, we have decided to add a new service-bus function to the Match Prediction project that bulk-processes messages, each containing a single PDP for which match probabilities will be calculated. Results will be stored in a sub-folder of the match-prediction-results container.

Atlas.MatchPrediction.Functions

New HTTP-triggered function that accepts a single PDP request and returns a unique request ID (i.e., follows the search request pattern).
- needs patient & donor IDs, patient & donor metadata, and patient & donor HLA
- returns a unique GUID
- sends service bus message to new service bus topic, match-prediction-requests.
New function that bulk-downloads service bus messages from match-prediction-requests and iterates over each request (i.e., follows the donor update pattern).
- can either reuse the custom code currently used by the matching algorithm's donor update workflow, or try the new officially supported batch trigger.
- either way, need to ensure that:
  - if a batch fails for a transient reason, the batch is retried
  - if a single message in a batch fails for an expected reason (e.g. validation errors), the remainder of the batch is still processed successfully.
- the function should persist each match prediction result to a new subfolder of the existing match-prediction-results container.
  - the subfolder name should stress that these results were generated outside of the normal search process, e.g., match-prediction-only or match-prediction-without-search.
  - the blob file name should be <match-prediction-request-id>.json.
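
The batch-handling requirements above can be sketched as follows. This is a minimal Python illustration, not the Atlas implementation (which is C#): the exception classes, `process_batch`, the message shape, and the choice of `match-prediction-only` as the subfolder are all assumptions made for the example.

```python
import json
import uuid
from dataclasses import dataclass, field


class ExpectedRequestError(Exception):
    """Non-retryable failure tied to a single request, e.g. invalid HLA."""


class TransientError(Exception):
    """Retryable failure that affects the whole batch, e.g. a dropped connection."""


# Hypothetical choice: one of the two subfolder names suggested in the design notes.
RESULTS_SUBFOLDER = "match-prediction-only"


def new_request_id() -> str:
    """Return a unique request ID (a GUID) to hand back to the caller."""
    return str(uuid.uuid4())


def result_blob_name(request_id: str) -> str:
    """Blob name inside the match-prediction-results container."""
    return f"{RESULTS_SUBFOLDER}/{request_id}.json"


@dataclass
class BatchOutcome:
    uploaded: dict = field(default_factory=dict)  # blob name -> JSON body
    rejected: dict = field(default_factory=dict)  # request ID -> error text


def process_batch(messages, predict) -> BatchOutcome:
    """Run `predict` on each message in the batch.

    Expected per-request failures are recorded and skipped so the rest of
    the batch still completes. Any other exception (including TransientError)
    is deliberately not caught, so it propagates and the batch can be retried.
    """
    outcome = BatchOutcome()
    for message in messages:
        request_id = message["id"]  # assumed message shape
        try:
            result = predict(message)
        except ExpectedRequestError as error:
            outcome.rejected[request_id] = str(error)
            continue
        outcome.uploaded[result_blob_name(request_id)] = json.dumps(result)
    return outcome
```

In this sketch the blob name is derived only from the request ID, so re-running a retried batch would overwrite earlier results rather than duplicate them.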

Manual Testing

Create new projects in the solution folder Manual Testing to run the validation exercise (i.e., following the match prediction verification pattern).
Project name prefix: Atlas.MatchPrediction.Test.Validation
New functions app project to process PDP dataset, send match prediction requests, and process result blobs
New data project to store dataset, track requests, and store processed results
New unit test project

zabeen commented 2 years ago

Testing of Match Prediction Function changes

i.e., ability to run a match prediction request outside of search.

Notes

Tested by deploying feature branch to verify environment (build 20220923.1)
Submitted null for patient and donor metadata to force use of the global HF set that already exists on that environment
- global HF set HLA version is 3330

Feature testing ✅

Happy path ✅

HLA run for all requests, both patient and donor:

    "A": {
      "Position1": "*02:XX",
      "Position2": "*02:XX"
    },
    "B": {
      "Position1": "*40:XX",
      "Position2": "*40:XX"
    },
    "C": {
      "Position1": "*03:XX",
      "Position2": "*03:XX"
    },
    "Dpb1": {
      "Position1": "*02:XX",
      "Position2": "*01:XX"
    },
    "Dqb1": {
      "Position1": "*03:XX",
      "Position2": "*03:XX"
    },
    "Drb1": {
      "Position1": "*04:XX",
      "Position2": "*04:XX"
    }

1 match prediction request ✅
- request ID returned ✔
- no exceptions ✔
- trace logged with request ID ✔
- results uploaded to blob storage with correct file name ✔
multiple match prediction requests - submitted 500 identical requests ✅
- request IDs returned for all requests ✔
- no exceptions ✔
- trace logged with request ID ✔
- results uploaded to blob storage with correct file names ✔
Logs and diagnostics showed that requests were processed in batches, but the functions app did not scale out. A possible cause is that the service bus connection string lacked Manage permissions, so the message count on the subscription could not be assessed.
- I modified the connection string accordingly, and re-submitted another 500 requests
- this time the requests were spread over 3 app instances. ✅

Exception testing ✅

1 invalid match prediction request (missing required data) ✅
- returns an HTTP Bad Request response with validation errors ✔
1 match prediction request with invalid HLA ✅
- request ID returned ✔
- HMD exception logged & functions complete successfully ✔
multiple requests - mix of valid and invalid HLA (25 of each) ✅
- request IDs returned for all requests ✔
- logs show all request messages were processed by one instance:
- invalid HLA: HMD exceptions logged & functions complete successfully ✔
- valid requests: results uploaded to blob storage with correct file names ✔
10 valid match prediction requests with broken SQL connection string ✅
- I queued up the requests before breaking the SQL connection string:
- request runner function completed with error: Error when processing match prediction request. Login failed for user 'match_prediction'. ✔
- request messages eventually dead-lettered ✔
- fixed SQL connection string, replayed messages; completed successfully ✔
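
The "missing required data" case above can be pictured with a short sketch. It is illustrative only: the field names, the `submit` helper, and the success status code are hypothetical, chosen to mirror the inputs listed in the design notes (patient and donor IDs and HLA, with metadata allowed to be null as in these tests).

```python
import uuid

# Hypothetical field names; the real request contract is not shown in this issue.
REQUIRED_FIELDS = ("patientId", "donorId", "patientHla", "donorHla")


def validate_request(request: dict) -> list:
    """Return one validation error per required field that is missing or null."""
    return [
        f"'{name}' is required"
        for name in REQUIRED_FIELDS
        if request.get(name) is None
    ]


def submit(request: dict) -> tuple:
    """Return (status code, body) for a single match prediction request."""
    errors = validate_request(request)
    if errors:
        # Rejected up front: nothing is queued for an invalid request.
        return 400, {"errors": errors}
    # Assumed success response: the request would be queued and its ID returned.
    return 200, {"requestId": str(uuid.uuid4())}
```

Note that invalid HLA is a different case: as the tests above show, such a request still receives a request ID and only fails later, when it is processed.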

Regression testing ✅

Search

search request, valid HLA ✅
- search request ID returned ✔
- individual donor match prediction result files were uploaded to correct location ✔
- final search results uploaded to blob storage with correct name and expected values ✔
search request, invalid HLA ✅
- search request ID returned ✔
- failed search request message and HMD exception logged ✔
search request, HLA contains new allele found in matching HMD but missing from match prediction HMD ✅
- used C*01:02:01:38 in the patient, which is present in v3490 but absent from v3330:
- search request ID returned ✔
- search results uploaded to blob storage with correct name ✔
- subject with new allele should be unrepresented ✔
- logs showed failed HLA lookup at stage: Conversion of compressed phenotype to target HLA category ✔

Haplotype Frequency Set import ✅

Importing multiple files does not cause any error in the import function ✅
- uploaded 4 files to blob storage at once; 3 succeeded, 1 failed due to missing typing category in file. No unexpected errors. ✔

TESTING PASSED ✅

zabeen commented 2 years ago

Development notes

The message batch size on service-bus-triggered functions is set via the host.json property maxMessageBatchSize, which means that, at present, changing the value requires a code change and a release.

This may be a way to set the value via terraform: https://stackoverflow.com/questions/71935339/set-functiontimeout-using-terraform
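
As a rough illustration of that idea: Azure Functions allows a host.json value to be overridden by an app setting named `AzureFunctionsJobHost__` followed by the JSON path, with `__` between path segments. The sketch below only builds that setting name, which Terraform could then supply as an app setting. The JSON path used for `maxMessageBatchSize` is an assumption (it depends on the Service Bus extension version), and the helper names are hypothetical.

```python
HOST_JSON_OVERRIDE_PREFIX = "AzureFunctionsJobHost"

# Assumed location of the property within host.json; check it against the
# Service Bus extension version in use before relying on it.
BATCH_SIZE_PATH = "extensions.serviceBus.maxMessageBatchSize"


def host_json_override_name(json_path: str) -> str:
    """Convert a dotted host.json path into the overriding app setting name."""
    return "__".join([HOST_JSON_OVERRIDE_PREFIX, *json_path.split(".")])


def batch_size_app_setting(batch_size: int) -> dict:
    """Build the app setting (name -> string value) for a given batch size."""
    if batch_size < 1:
        raise ValueError("batch size must be a positive integer")
    return {host_json_override_name(BATCH_SIZE_PATH): str(batch_size)}
```

For example, `batch_size_app_setting(100)` yields a single setting named `AzureFunctionsJobHost__extensions__serviceBus__maxMessageBatchSize` with the value `"100"`.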

I don't have time to test this out now, so will raise a new issue to cover this tech debt.

Update, raised new issue: https://github.com/Anthony-Nolan/Atlas/issues/820

zabeen commented 2 years ago

Validation has been completed and passed.
