Anthony-Nolan / Atlas

A free & open-source Donor Search Algorithm Service
GNU General Public License v3.0

Search results return results with more mismatches than allowed in search request #891

Closed mmelchers closed 1 year ago

mmelchers commented 1 year ago

Describe the bug

As a search coordinator, I expect that a 1 mismatch search (e.g. a 9/10 search) returns only results with 0 or 1 mismatches. This is not the case: some of the returned results have more than 1 mismatch.

To Reproduce

Steps to reproduce the behaviour:

  1. Start a 1 mismatch search in ATLAS
  2. Wait for results to be ready
  3. Check the MatchProbabilitiesPerLocus.OverallMatchCount for each record
  4. Some of these records will have a value of n-2 when performing an n-1/n search

Expected behaviour

zabeen commented 1 year ago

I discussed this with @mmelchers - this is due to Atlas generating two different match counts.

The first is generated by the matching algorithm, and is based purely on P groups. This could be termed the "potential match count". This count should never fall below the cut-off provided within the search request, e.g., if only 1 mismatch is allowed, Atlas should never return a potentially matched donor with 2 or more mismatches. This is working as expected.

The second count is generated by the match prediction algorithm, and is calculated using predicted match grades. This is the match count mentioned in the ticket, and could be termed the "predicted match count".

It is expected that at times, the predicted match count will be lower than the potential match count, due to the extra information provided by the HLA haplotype frequencies.

The solution to this issue is simple: add a filter that removes donors from the final search result that have too many predicted mismatches. The question is, should this happen within Atlas, or be the responsibility of the consumer service?

I would prefer that the consumer make this decision, rather than Atlas applying this filter by default, as it's possible that another user would like to see all results. We have already seen what happens when an algorithm makes various filtering decisions that are not transparent to the end-user: it leads to a lot of confusion when results are compared with those from another algorithm.

I would also like to avoid adding another parameter to the search request model to toggle such a filter, as it is already quite complex.

In general, Atlas is designed to give the consumer service as much information as is useful about the potential match list that it generates, to enable the consumer to make its own decision about filtering and/or ranking. It is therefore recommended that WMDA search and match implement this simple filter within the consumer service.
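
For illustration, a minimal sketch of what such a consumer-side filter could look like. This is not Atlas code: the function name, the dictionary shape, and the `predicted_match_count` key are hypothetical stand-ins for however the consumer stores the predicted match count (the OverallMatchCount mentioned in the ticket), and it assumes every record carries that count.

```python
from typing import Iterable


def filter_by_predicted_mismatches(
    results: Iterable[dict],
    total_positions: int,
    max_mismatches: int,
) -> list[dict]:
    """Keep only results whose predicted match count meets the search cut-off.

    `predicted_match_count` is a hypothetical key, not an Atlas field name.
    """
    # For a 9/10 search: total_positions=10, max_mismatches=1, so the minimum is 9.
    min_match_count = total_positions - max_mismatches
    return [r for r in results if r["predicted_match_count"] >= min_match_count]


# Made-up records for a 1 mismatch (9/10) search.
results = [
    {"donor": "A", "predicted_match_count": 10},
    {"donor": "B", "predicted_match_count": 9},
    {"donor": "C", "predicted_match_count": 8},  # n-2: removed by the filter
]

kept = filter_by_predicted_mismatches(results, total_positions=10, max_mismatches=1)
print([r["donor"] for r in kept])
```

Because the filter runs in the consumer, a service that wants to show every potentially matched donor can simply skip this step.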