Predicted Mismatch position within locus indication is not always correctly returned

mmelchers commented 1 year ago

Describe the bug: MatchProbabilitiesPerLocus.*.PositionalMatchCategories always seems to have position1=true, even when position 2 is the mismatched one.

To Reproduce Steps to reproduce the behavior:

1. Start a mismatch search (example: search 69f97644-dd43-43bb-a3bd-1e413e6a2072 on prod).
2. Find a donor with a mismatch.
3. Check the values of position1 and position2.

Expected behaviour: position1 or position2 is set to true, depending on which allele is mismatched.

Actual behaviour: Position1 is always set to true, even when the mismatch is obviously on position2. Example: "AtlasDonorId": 33514456, "DonorType": "Adult", "DonorCode": "4420215606", in the above-mentioned search.

Atlas Build & Runtime Info: the issue is present in the version running at WMDA on 13 June 2023.


zabeen commented 1 year ago

@mmelchers looking at the code, it is clear that the two predicted match grades are being assigned arbitrarily: https://github.com/Anthony-Nolan/Atlas/blob/master/Atlas.Client.Models/Search/Results/MatchPrediction/MatchProbabilityPerLocusResponse.cs

It is because we determine the grades based on overall probability of 0, 1 or 2 mismatches per locus - we don't have info on probability of a particular position being mismatched. On that basis, I am not sure how to resolve this within the scope of the match prediction component.

One thought is to use the scoring results, because they would inform you of an obvious P group mismatch. I.e., for each locus, when one of the predicted match categories is Mismatch and the other is Potential/Exact, use the scoring result to determine the category orientation: if scoring result position 1 is a mismatch, then predicted match category position 1 should be Mismatch; else if scoring result position 2 is a mismatch, then predicted match category position 2 should be Mismatch; else do not reset the values, as it is not clear which position is the mismatch.

I think I could do this during the search combination step. BUT - it would only work if scoring has been run at the locus, so Search and Match would have to request scoring at all loci (and make sure to exclude DPB1 from aggregation) to make use of it.
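A minimal sketch of the orientation rule described above, written in Python for illustration only. It is not the Atlas implementation (Atlas is a C# codebase). The function name, the use of plain strings for categories, and the representation of "scoring was not run at this locus" as `None` are all assumptions made for the example.

```python
from typing import Optional, Tuple

MISMATCH = "Mismatch"  # assumed string form of the Mismatch category/grade


def orient_predicted_categories(
    predicted: Tuple[str, str],
    scoring: Optional[Tuple[str, str]],
) -> Tuple[str, str]:
    """Orient one locus's predicted match categories using its scoring grades.

    `predicted` holds the predicted categories for positions 1 and 2.
    `scoring` holds the scoring grades for positions 1 and 2, or None when
    scoring was not run at this locus (hypothetical representation).
    """
    # Without scoring at this locus there is nothing to orient against.
    if scoring is None:
        return predicted

    # The rule only applies when exactly one predicted category is Mismatch.
    if predicted.count(MISMATCH) != 1:
        return predicted

    # The non-mismatch category (e.g. Potential or Exact).
    other = predicted[0] if predicted[1] == MISMATCH else predicted[1]

    if scoring[0] == MISMATCH:
        return (MISMATCH, other)
    if scoring[1] == MISMATCH:
        return (other, MISMATCH)

    # Scoring shows no mismatch at either position: leave the values as they are.
    return predicted
```

For example, `orient_predicted_categories(("Mismatch", "Exact"), ("Exact", "Mismatch"))` moves the predicted Mismatch to position 2, while passing `None` for `scoring` returns the input unchanged.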

mmelchers commented 1 year ago

Hello Zabeen. Thank you for finding the cause of the unexpected behaviour. I do see some major issues with your suggestion.

A major benefit of ATLAS is that it returns the probability based match class. This means that donors can be classified according to the match class they realistically have, not what technically the match class could be.

If I understand you correctly, you are suggesting to start using the scoring based mismatches. We can do that well for many search results, except those where the scoring based match class does not correspond to the match probability based match class. In that case you could have a scoring based 10/10 but probability based 9/10 but because according to the scoring there is no known mismatch, you are not returning a mismatch position and therefore WMDA Search and Match cannot indicate which allele is mismatched even though it is in the 9/10 match class. This will be very confusing for search coordinators.

Another option would be to use the scoring based match class, but then there will be many donors that are an obvious mismatch, but are still returned under the 10/10s. This is also very counterintuitive for search coordinators.

Is there any way to return which locus is actually mismatched when a probability based match class is used?

zabeen commented 1 year ago

> you could have a scoring based 10/10 but probability based 9/10 but because according to the scoring there is no known mismatch, you are not returning a mismatch position and therefore WMDA Search and Match cannot indicate which allele is mismatched even though it is in the 9/10 match class

You are correct: in this particular case, the positional assignment of the predicted mismatch grade (e.g., should Mismatch be applied to the A_1 or A_2 position, if the single mismatch was predicted to be at locus A) will be entirely arbitrary, because the potential grade at both positions (as determined by the scoring component) will be one of the non-mismatch types.

The question that Atlas is currently answering is: "what is the probability of locus X having 0, 1 or 2 mismatches?".

But in wanting to know the orientation of predicted match grades, we are actually asking two different questions: "What is the probability of position 1 of locus X being mismatched?" and "what is the probability of position 2 of locus X being mismatched?". This is a different calculation than the one being performed at present.
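To illustrate why the two questions differ, here is a small Python sketch. It is purely hypothetical and does not reflect how Atlas calculates match probabilities: the `Outcome` representation (a probability plus a mismatch flag per position) and both function names are invented for the example. It shows two scenarios that give the same answer to "0, 1 or 2 mismatches at locus X?" but opposite answers to the positional questions.

```python
from typing import Dict, List, Tuple

# Hypothetical outcome: (probability, position 1 mismatched?, position 2 mismatched?)
Outcome = Tuple[float, bool, bool]


def mismatch_count_distribution(outcomes: List[Outcome]) -> Dict[int, float]:
    """Probability of 0, 1 or 2 mismatches at the locus (the per-locus question)."""
    distribution = {0: 0.0, 1: 0.0, 2: 0.0}
    for probability, mismatch_1, mismatch_2 in outcomes:
        distribution[int(mismatch_1) + int(mismatch_2)] += probability
    return distribution


def positional_mismatch_probabilities(outcomes: List[Outcome]) -> Tuple[float, float]:
    """Probability of a mismatch at position 1 and at position 2 (the positional questions)."""
    p1 = sum(p for p, mismatch_1, _ in outcomes if mismatch_1)
    p2 = sum(p for p, _, mismatch_2 in outcomes if mismatch_2)
    return p1, p2


# Two made-up scenarios: the likely mismatch sits at position 1 or at position 2.
mismatch_at_1 = [(0.25, False, False), (0.75, True, False)]
mismatch_at_2 = [(0.25, False, False), (0.75, False, True)]

# The per-locus answer cannot tell the scenarios apart...
same_counts = mismatch_count_distribution(mismatch_at_1) == mismatch_count_distribution(mismatch_at_2)
# ...but the positional answers can.
different_positions = (
    positional_mismatch_probabilities(mismatch_at_1)
    != positional_mismatch_probabilities(mismatch_at_2)
)
```

Once only the per-locus distribution is kept, the information needed to pick a position is gone, which is why any positional assignment derived from it alone is arbitrary.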

The solution I presented above is a workaround to avoid search coordinators being confused where it is obvious where the mismatch is, based on HLA alone, and I believe that is sufficient to close the reported bug.

However, if it turns out we need greater resolution of match predictions by asking "calculate match probability by position" and not by locus, then that is a significant algorithmic change, which we will have to plan.

This second enhancement may not be needed, though; it all depends on what end-users want to know. Do they care about which of the two alleles is mismatched, or only that there is a mismatch at all?

zabeen commented 1 year ago

AN Testing

Can only perform regression testing within AN dev - will test the feature more fully within WMDA dev.

@DmitriyShcherbina


Please run 10/10 searches with match prediction on:

- one search with scoring at every locus (and DPB1 excluded from aggregation)
- a second search with no scoring

In all cases, every locus within MatchPredictionResult.MatchProbabilitiesPerLocus should have two values for PositionalMatchCategories.
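A hedged sketch of how this check could be automated against a deserialized search result, in Python. The JSON shape is an assumption: `MatchProbabilitiesPerLocus` is treated as an object keyed by locus name, and `PositionalMatchCategories` as an object with `Position1` and `Position2` keys. The function name is hypothetical, and the key names should be adjusted to match the actual API response.

```python
from typing import Any, List, Mapping

# Assumed JSON key names for the two positions; adjust to the real response.
POSITION_KEYS = ("Position1", "Position2")


def loci_missing_categories(match_prediction_result: Mapping[str, Any]) -> List[str]:
    """Return the loci whose PositionalMatchCategories lack a value at either position."""
    per_locus = match_prediction_result.get("MatchProbabilitiesPerLocus") or {}
    failing = []
    for locus, locus_result in per_locus.items():
        # A null locus entry or null categories both count as missing values.
        categories = (locus_result or {}).get("PositionalMatchCategories") or {}
        if any(categories.get(key) is None for key in POSITION_KEYS):
            failing.append(locus)
    return failing
```

An empty list for every donor in both searches would mean the expectation above holds. Any returned locus names point at the results to inspect.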

zabeen commented 1 year ago

@DmitriyShcherbina I have implemented a fix for the null issue you spotted, the ticket is ready to be re-tested, thanks!

DmitriyShcherbina commented 1 year ago

@zabeen Testing status: Ok

zabeen commented 1 year ago

WMDA Testing Notes

Once the change has been deployed to WMDA dev, need to re-run the searches where this bug was observed and check the orientation of predicted match grades.

Use the Atlas API directly, as the Search and Match frontend may be updated to use scoring match grades, after discussing the issue with @mmelchers.

zabeen commented 1 year ago

WMDA Testing

Currently failed: it is possible that scoring is also assigning match grades arbitrarily, as indicated by old issue #651, and the fix for #996 is predicated on scoring grade orientation being non-arbitrary.

zabeen commented 1 year ago

Update: investigation into #651 shows that grades were not being assigned arbitrarily; they were aligned to the patient typing rather than the donor typing, which is what WMDA expects.

New ticket #1012 aligns scores to donor typing, which should automatically fix this issue.

zabeen commented 1 year ago

WMDA Testing

Passed - see testing for #1012 (link to testing comment)

Anthony-Nolan / Atlas