SORMAS-Foundation / SORMAS-Project

SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an early warning and management system to fight the spread of infectious diseases.
https://sormas.org
GNU General Public License v3.0
293 stars 143 forks

Retrospective duplicate detection & resolving [2+3] #408

Closed ghost closed 5 years ago

ghost commented 6 years ago

@MartinWahnschaffeSymeda We have a problem with duplicate records: the same first name and last name show up again after picking or matching a person. Also, when a name contains a spelling mistake, it is saved as a new case even though it is the same case. E.g. Taiwo Edito is a case, and someone else created Taiwo Edoto, but they are one and the same person and differ only by a spelling error. Can we have an algorithm that checks for a match on five key indicators (first name, last name, date of birth or age, district, region and, if possible, health facility) and then flags likely duplicates?
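
To make the request concrete, here is a minimal sketch of such a check. It is an illustration under assumptions, not SORMAS code: the class `DuplicateCheckSketch`, the `Person` type, the use of Levenshtein edit distance and the tolerance of one edit per name are all hypothetical choices. Names are compared with a small tolerance; the other indicators must match exactly.

```java
import java.util.Objects;

// Hypothetical sketch; none of these names come from the SORMAS code base.
class DuplicateCheckSketch {

    // Illustrative holder for the indicators listed in the comment above.
    static final class Person {
        final String firstName;
        final String lastName;
        final Integer birthYear; // stand-in for "date of birth or age"
        final String district;
        final String region;

        Person(String firstName, String lastName, Integer birthYear,
               String district, String region) {
            this.firstName = firstName;
            this.lastName = lastName;
            this.birthYear = birthYear;
            this.district = district;
            this.region = region;
        }
    }

    // Levenshtein edit distance using two rolling rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Names count as similar if they differ by at most maxEdits single-character edits.
    static boolean similarName(String a, String b, int maxEdits) {
        return levenshtein(a.trim().toLowerCase(), b.trim().toLowerCase()) <= maxEdits;
    }

    // Assumed rule: fuzzy match on both names, exact match on the remaining indicators.
    static boolean isLikelyDuplicate(Person x, Person y) {
        return similarName(x.firstName, y.firstName, 1)
            && similarName(x.lastName, y.lastName, 1)
            && Objects.equals(x.birthYear, y.birthYear)
            && Objects.equals(x.district, y.district)
            && Objects.equals(x.region, y.region);
    }
}
```

With this rule, "Taiwo Edito" and "Taiwo Edoto" are flagged as likely duplicates when the other indicators agree, because the last names differ by a single substituted letter. A real implementation would need to decide how to treat missing values and how strict the tolerance should be.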

ghost commented 6 years ago

@MartinWahnschaffeSymeda Please can you add an active/inactive function to all cases so that, at least for now, we can deactivate cases that appear as duplicates, for example after moving a case from one LGA to another. This would go a long way towards fixing inaccurate case counts caused by duplicates, especially now that we have "not yet classified" and "not a case" on the dashboard.

MartinWahnschaffe commented 6 years ago

@dta16 Quick fix: there is a delete function for admins in the new version (see #383).

In addition to this we need two things:

  1. A way to further reduce duplicates by making the name/birthdate comparison more fault-tolerant through a similarity algorithm (#432).
  2. An interface that allows searching for duplicates in cases and contacts and merging them. Merging of cases/contacts has to be reflected in the data model -> don't delete the merged case/contact!
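
One possible reading of "don't delete the merged case" is sketched below: the merged record stays in the data set but points to the record it was merged into, and only active records are counted. This is an assumption about the data model, not the actual SORMAS design; `MergeSketch`, `CaseRecord`, `duplicateOfUuid` and the single `healthFacility` field are hypothetical names chosen for illustration.

```java
import java.util.List;

// Hypothetical sketch; not the SORMAS data model.
class MergeSketch {

    static final class CaseRecord {
        final String uuid;
        String healthFacility;  // example of a field that may be empty
        String duplicateOfUuid; // null while the record is active

        CaseRecord(String uuid, String healthFacility) {
            this.uuid = uuid;
            this.healthFacility = healthFacility;
        }

        boolean isActive() {
            return duplicateOfUuid == null;
        }
    }

    // Fill gaps in the lead record from the other one, then mark the other
    // record as merged into the lead instead of deleting it.
    static void merge(CaseRecord lead, CaseRecord other) {
        if (lead == other || !lead.isActive() || !other.isActive()) {
            throw new IllegalArgumentException("both records must be distinct and active");
        }
        if (lead.healthFacility == null) {
            lead.healthFacility = other.healthFacility;
        }
        other.duplicateOfUuid = lead.uuid;
    }

    // Case counts only consider records that have not been merged away.
    static long countActive(List<CaseRecord> cases) {
        long n = 0;
        for (CaseRecord c : cases) {
            if (c.isActive()) {
                n++;
            }
        }
        return n;
    }
}
```

Keeping the merged record with a back-reference would also cover the active/inactive request above, since merged records drop out of the counts while remaining traceable.
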
MartinWahnschaffe commented 5 years ago

Inspiration:

ghost commented 5 years ago

@MartinWahnschaffeSymeda I like this concept. We should talk about it during our next sprint, next Tuesday.

MartinWahnschaffe commented 5 years ago

Thoughts:

[image]

Review notes:


MartinWahnschaffe commented 5 years ago

Main todos:

  1. Add logic to return a number of case duplicate pairs based on the #757 search algorithm, likely as a list of CaseIndexDto pairs.
  2. Display the pairs in a tree grid and load additional cases page by page.
  3. Do the merge (#1232).
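
A rough sketch of todos 1 and 2, under assumptions: the #757 search algorithm is not shown in this thread, so it is represented by a caller-supplied predicate, and the generic type `T` stands in for CaseIndexDto. `DuplicatePairsSketch`, `Pair` and `findPairs` are hypothetical names. The sketch compares in memory; the real logic would more likely run as a database query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Hypothetical sketch; T stands in for CaseIndexDto.
class DuplicatePairsSketch {

    static final class Pair<T> {
        final T first;
        final T second;

        Pair(T first, T second) {
            this.first = first;
            this.second = second;
        }
    }

    // Compares every unordered pair once (quadratic in the number of items),
    // skips the first `offset` matching pairs and returns at most `limit` pairs.
    static <T> List<Pair<T>> findPairs(List<T> items, BiPredicate<T, T> isDuplicate,
                                       int offset, int limit) {
        List<Pair<T>> page = new ArrayList<>();
        int seen = 0;
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                if (!isDuplicate.test(items.get(i), items.get(j))) {
                    continue;
                }
                if (seen++ < offset) {
                    continue; // pair belongs to an earlier page
                }
                if (page.size() == limit) {
                    return page; // page is full
                }
                page.add(new Pair<>(items.get(i), items.get(j)));
            }
        }
        return page;
    }
}
```

Calling `findPairs(cases, predicate, 0, 50)` and then `findPairs(cases, predicate, 50, 50)` would give the first two pages for the grid. Each call rescans from the start, which is acceptable for a sketch but would need a cheaper paging strategy on large data sets.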

Optional todos:

  1. Switch pair entries to have the "better" one first
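
The thread does not define what "better" means. As one possible interpretation, the sketch below treats the record with more filled-in fields as the better one. `PairOrderingSketch`, `completeness` and `betterFirst` are hypothetical names, and a record is reduced to an array of field values for brevity.

```java
import java.util.Arrays;

// Hypothetical sketch; "better" is assumed to mean "more complete".
class PairOrderingSketch {

    // Number of field values that are neither null nor blank.
    static long completeness(String[] fields) {
        return Arrays.stream(fields)
                     .filter(f -> f != null && !f.trim().isEmpty())
                     .count();
    }

    // Returns both records with the more complete one first;
    // on a tie the original order is kept.
    static String[][] betterFirst(String[] a, String[] b) {
        return completeness(b) > completeness(a)
            ? new String[][] {b, a}
            : new String[][] {a, b};
    }
}
```

Other criteria, such as the most recent change date or the case classification, could replace the completeness score without changing the ordering step.
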
MateStrysewske commented 5 years ago