Closed bernardsilenou closed 4 years ago
@MateStrysewskeSym Still, I do not know how a and b can be combined and compared with the threshold set on the server. Do we compare only the value of a with the threshold, since the others are a perfect match?
@bernardsilenou I suggest we talk about this in detail once we are able to prioritize it. It's definitely a good idea.
OK, I forgot to add date of report
Can we please also add date of birth? This would be relevant for the German system, as they do not work with the social security number. Or should I create a new issue for this?
@max-hzi DOB is already included now, but it is only considered when it is present in both cases
Thanks a lot for the quick response dear Bernard!
optimal
Consider the refinement in #1644 when implementing this issue. Here is the refinement on duplicate person detection: @Iheanacho2027 similarity of contacts has 2 levels: 1, Similar contact person: This uses only the first name and last name for now. We can later add variables linked to the person, like sex and DOB/age. We do not consider region and district at this stage of identifying a similar person. I also do not think we should limit the search to persons that the user has access to, because this would prevent us from identifying contacts outside the jurisdiction of the user. I suggest we go with this: only take into account persons that have a high-similarity match (not necessarily exact; say >90%, changeable by the user) on first name and last name combined, together with age/DOB, gender and national health ID. This means creating another parameter for the cutoff of similar person detection, in addition to the parameter that we already have for similar contact detection. This will prevent the system from suggesting many persons to the user, but it will not improve search time much, since we still have to search the whole person table. This issue is related to #2178
2, Similar contacts: For this, the system will check for similar names, regions, districts, age, sex, National health ID of all the contacts related to the source case.
@bernardsilenou having only level one running for Nigeria in the contact detection module would be very helpful, provided it combines both the first name and the last name when checking for similar contacts. I see a lot of situations where the system only checks one name variable and brings up about 40 - 100 names that have zero relation to the contact which is about to be imported. It doesn't make sense for the system to use age and sex to check for similarity when the first name and last name combined are not the same; it slows down sessions like import
@Iheanacho2027 Level 1 checking should include both first and last name. For example "john man" and "john michael" should not be suggested as duplicates by the system. If only one name is used, please create an issue, it might be a bug. If both names are used but many false duplicate names are suggested by the system, then we need to increase the threshold level to, say, 0,8 or 0,9. This threshold value is not fixed; we have to play with it as you use the system in order to find a comfortable cutoff.
For duplicate contact detection, we need both 1 and 2, point 1 only is not enough
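To make the threshold discussion concrete, here is a minimal sketch of a level 1 check on the combined first and last name. It assumes a q-gram distance with q = 1 as the string measure and an ad hoc normalization to a 0..1 similarity; the thread does not confirm that the server uses either, and the function names are hypothetical.

```python
from collections import Counter


def qgrams(text, q=1):
    """Count all substrings of length q in text."""
    return Counter(text[i:i + q] for i in range(len(text) - q + 1))


def qgram_distance(a, b, q=1):
    """Sum of absolute differences between the q-gram counts of a and b."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    return sum(abs(ga[g] - gb[g]) for g in ga.keys() | gb.keys())


def qgram_similarity(a, b, q=1):
    """Map the distance to 0..1 (assumed normalization, not from the thread)."""
    total = sum(qgrams(a, q).values()) + sum(qgrams(b, q).values())
    if total == 0:
        return 1.0
    return 1.0 - qgram_distance(a, b, q) / total


def is_possible_duplicate(first_a, last_a, first_b, last_b, threshold=0.8):
    """Compare first and last name combined, never one name alone."""
    name_a = f"{first_a} {last_a}".lower()
    name_b = f"{first_b} {last_b}".lower()
    return qgram_similarity(name_a, name_b) >= threshold


# "john man" vs "john michael": distance 6 over 20 characters, similarity 0.7
print(qgram_similarity("john man", "john michael"))
print(is_possible_duplicate("john", "man", "john", "michael"))  # False at 0.8
```

With q = 1 the measure ignores character order, so "abc" and "cba" have distance 0. That is one reason the cutoff needs tuning against real data rather than being fixed up front.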
I did a manual simulation test today on the sormas.symda server and also in R, by calculating the distance between 2 strings using the qgram method with q = 1. A few points I found:
@MateStrysewskeSym I suggest this improvement for duplicate case detection: We can use a 2-stage method: stage 1: filtering
stage 2: applying similarity measure
For duplicate contact detection: we do as we already do: Identify duplicate person using stage2 and identify duplicate contact using stage1.
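A rough sketch of how the two stages could fit together. The thread does not spell out the filter criteria, so this assumes stage 1 discards candidates whose sex or date of birth conflict (a field is compared only when present on both records), and stage 2 applies a character-count similarity to the combined name. The `Person` class, field names and default threshold are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    first_name: str
    last_name: str
    sex: Optional[str] = None
    birth_date: Optional[str] = None  # ISO date string, e.g. "1990-01-01"


def name_similarity(a, b):
    """Similarity in 0..1 from character counts (q-gram distance with q = 1)."""
    ca, cb = Counter(a.lower()), Counter(b.lower())
    distance = sum(abs(ca[k] - cb[k]) for k in ca.keys() | cb.keys())
    total = sum(ca.values()) + sum(cb.values())
    return 1.0 if total == 0 else 1.0 - distance / total


def passes_filter(new, existing):
    """Stage 1: reject on a conflict; a missing value never rejects."""
    for field in ("sex", "birth_date"):
        x, y = getattr(new, field), getattr(existing, field)
        if x is not None and y is not None and x != y:
            return False
    return True


def find_duplicates(new, existing_persons, threshold=0.9):
    """Stage 2: score the survivors of stage 1 on the combined name."""
    new_name = f"{new.first_name} {new.last_name}"
    matches = []
    for person in existing_persons:
        if not passes_filter(new, person):
            continue
        score = name_similarity(new_name, f"{person.first_name} {person.last_name}")
        if score >= threshold:
            matches.append((person, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)


new = Person("John", "Man", "male", "1990-01-01")
existing = [
    Person("Jon", "Man", "male", "1990-01-01"),     # kept: similar name
    Person("John", "Man", "female", "1990-01-01"),  # dropped in stage 1
    Person("John", "Michael", "male"),              # dropped in stage 2
]
for person, score in find_duplicates(new, existing):
    print(person.first_name, person.last_name, round(score, 3))
```

Running the cheap filter first means the similarity measure is computed only for the remaining candidates, which is where most of the cost would sit on a large person table.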
Problem Description
To improve the sensitivity of our duplicate detection of case, person and contact entities, we should add the social security number (SSN) to the person identifiers. For countries that use such a system, the SSN is a unique identifier of the person, even stronger than the name of the person. Our old duplicate case detection is defined in #757.
Feature Description
Proposed Change
a.) National health ID. b.) If the check on the passport number is positive, we show the duplicate; if not, we ignore it.
Possible Alternatives
Additional Information