ga4gh / mme-apis

Documentation for the MatchmakerExchange APIs
https://github.com/ga4gh/mme-apis

Add a “date updated” field to patient to reduce search space for bulk queries #154

Open ramaniak opened 6 years ago

ramaniak commented 6 years ago

Running an all-vs-all query takes a very long time, as we have thousands of samples and have to match them against many thousands in each of the nodes. This search space would be a lot smaller if we were able to match against just the samples that have been updated since the last time matching was carried out. We were therefore wondering how feasible it would be to add a field to each patient that records when the patient entry was last updated. If this can then be shared across the MME, it will make matching easier: all new entries from either side will be matched against all samples, while the old, unchanged entries will only have to be matched against the updated entries in the other nodes. I am not sure if this will be useful or if it is a priority and, importantly, what the best and easiest implementation would be.

Thanks, Arun
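
The reduction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Patient` class, its `updated` field, and the `pairs_to_match` helper are hypothetical names, and no such field exists in the MME API at the time of this issue.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Patient:
    id: str
    updated: datetime  # hypothetical "date updated" field, not part of the MME spec


def pairs_to_match(local, remote, last_run):
    """Return (local_id, remote_id) pairs that need matching since last_run."""
    new_local = [p for p in local if p.updated > last_run]
    old_local = [p for p in local if p.updated <= last_run]
    new_remote = [p for p in remote if p.updated > last_run]

    # New or updated local entries are matched against every remote entry.
    pairs = [(l.id, r.id) for l in new_local for r in remote]
    # Unchanged local entries are matched only against updated remote entries.
    pairs += [(l.id, r.id) for l in old_local for r in new_remote]
    return pairs
```

Pairs in which neither side has changed since the last run are skipped, which is where the saving comes from. Whether a node would be willing to expose the timestamp at all is a separate question.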

Relequestual commented 6 years ago

I think this comes under filtering. We may not want to expose the actual date the patient was last updated.

We could see if we can start working on a filtering module for MME to make this possible.

To consider: With DECIPHER, if any part of the patient record is updated, the "updated on" date is also updated. So even if there are no new variants, and only a phenotype or phenotypic modifier has been added, the patient would still count as having been updated. You would need to think about how you represent "new matches" to your users, as it may not actually be a new patient match. Equally, you may not want to miss changes in patient information, such as additional phenotypes.

fschiettecatte commented 6 years ago

We have that exact problem with recurring searches in GeneMatcher. The way I solved this was to create a record of what the 'match' was and check against that when I get a match in a recurring search; if the 'match' is different, then there is new data which matched. It was not enough to record that there was a prior match, because the data could have changed.
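
One way to keep such a record is to store a hash of the matched data per query/patient pair and compare it on each recurring search. The sketch below is not GeneMatcher's actual implementation; `match_fingerprint`, `is_new_match`, and the shape of the `match` dictionary are assumptions made for illustration.

```python
import hashlib
import json


def match_fingerprint(match: dict) -> str:
    """Hash the matched data so that any change in it yields a new fingerprint."""
    # sort_keys makes the hash independent of dictionary key order.
    canonical = json.dumps(match, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_new_match(seen: dict, query_id: str, patient_id: str, match: dict) -> bool:
    """Return True if this match is new or differs from the one recorded earlier."""
    key = (query_id, patient_id)
    fingerprint = match_fingerprint(match)
    if seen.get(key) == fingerprint:
        return False  # same data matched as last time
    seen[key] = fingerprint  # record the new or changed match
    return True
```

Note that list order inside `match` still affects the hash, so lists of genes or phenotype terms would need to be sorted before fingerprinting if their order is not meaningful.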