SORMAS-Foundation / SORMAS-Project

SORMAS (Surveillance, Outbreak Response Management and Analysis System) is an early warning and management system to fight the spread of infectious diseases.
https://sormas.org
GNU General Public License v3.0

Auto-merge duplicates during entity import #9697

Open bernardsilenou opened 2 years ago

bernardsilenou commented 2 years ago
As a national user or server admin, I often have to import thousands of person-related records (cases, contacts, events) into SORMAS in less than an hour. Watching the screen and waiting for a pop-up to decide whether each record is a duplicate case or person is not realistic for that volume of data. The similarity threshold on my system is strict enough (> 95%) for SORMAS to detect all possible duplicates automatically. I want SORMAS to auto-merge the detected duplicates for each import job.

Feature Description

Proposed Change

  1. When importing person-related entities (cases, contacts, events, event participants):
     1.1 Add a checkbox to the import dialog, next to the value separator, that I can tick to allow SORMAS to automatically merge all duplicates found during the import. It should be unchecked by default.
     1.2 When merging, use the most recently modified existing entity (excluding the one being imported) as the reference entity.
  2. When searching for possible duplicates, compare UUIDs first (exact match); if no UUID is available, fall back to entity-related attributes (based on the similarity threshold).
  3. Add a feature configuration for this. The auto-merge option would only appear in the UI on servers where this feature is activated.
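
To make points 1.2 and 2 concrete, here is a rough Python sketch of the matching order and the choice of reference entity. It is only an illustration, not SORMAS code: the `Person` fields, the function names, and the use of `difflib.SequenceMatcher` as a similarity score are all assumptions, and SORMAS's own similarity computation will differ. It also assumes that an incoming UUID with no exact match falls back to attribute comparison, which the proposal does not spell out.

```python
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher
from typing import List, Optional


@dataclass
class Person:
    # Hypothetical, simplified record; not the SORMAS data model.
    uuid: Optional[str]
    first_name: str
    last_name: str
    birth_date: str
    changed_at: datetime


def similarity(a: Person, b: Person) -> float:
    """Stand-in similarity score in [0, 1] based on name and birth date."""
    key_a = f"{a.first_name} {a.last_name} {a.birth_date}".lower()
    key_b = f"{b.first_name} {b.last_name} {b.birth_date}".lower()
    return SequenceMatcher(None, key_a, key_b).ratio()


def find_duplicates(incoming: Person, existing: List[Person],
                    threshold: float = 0.95) -> List[Person]:
    """Exact UUID match first; otherwise compare attributes against the threshold."""
    if incoming.uuid:
        exact = [p for p in existing if p.uuid == incoming.uuid]
        if exact:
            return exact
    # Assumption: no UUID, or no UUID match, means falling back to similarity.
    return [p for p in existing if similarity(incoming, p) > threshold]


def pick_reference(duplicates: List[Person]) -> Optional[Person]:
    """Use the most recently modified existing entity as the merge reference."""
    return max(duplicates, key=lambda p: p.changed_at, default=None)
```

The incoming record is never part of `existing`, so it cannot be picked as the reference, which matches the "excluding the one being imported" condition in 1.2.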

Acceptance Criteria

I should be able to import 1000 duplicate entities without having to choose the duplicate person or entity.
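
One way this criterion could be checked is sketched below in Python. Everything here is hypothetical: `run_import`, the `ask_user` callback standing in for the duplicate pop-up, and the dictionary-based records are illustrative names, not part of SORMAS. The merge rule (the existing reference entity keeps its values and only missing fields are filled from the import) is also an assumption, since the issue does not define field-level merge behaviour.

```python
from typing import Callable, Dict, List


def run_import(rows: List[dict], existing: Dict[str, dict], auto_merge: bool,
               ask_user: Callable[[dict, dict], bool]) -> Dict[str, int]:
    """Import rows keyed by UUID; only ask the user when auto_merge is off."""
    summary = {"created": 0, "merged": 0, "skipped": 0}
    for row in rows:
        match = existing.get(row["uuid"])
        if match is None:
            existing[row["uuid"]] = dict(row)
            summary["created"] += 1
        elif auto_merge or ask_user(row, match):
            # Assumed merge rule: the reference keeps its own values,
            # the imported row only fills in fields that are missing.
            for field, value in row.items():
                match.setdefault(field, value)
            summary["merged"] += 1
        else:
            summary["skipped"] += 1
    return summary


def fail_if_prompted(row: dict, match: dict) -> bool:
    # Stand-in for the pop-up; with auto-merge on it must never be reached.
    raise AssertionError("user was prompted despite auto-merge")


existing = {str(i): {"uuid": str(i), "name": "old"} for i in range(1000)}
rows = [{"uuid": str(i), "name": "new", "phone": "123"} for i in range(1000)]
result = run_import(rows, existing, auto_merge=True, ask_user=fail_if_prompted)
```

With auto-merge enabled, all 1000 rows should be counted as merged and the prompt callback should never be called.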

Implementation Details

Additional Information

bernardsilenou commented 2 years ago

@kwa20 @hschanze