OpenTechStrategies / lisc-ttm

LISC TTM code. See https://ttm.lisc-chicago.org/.
GNU Affero General Public License v3.0

Better UI for detection of duplicate records (and elimination / merging of same). #36

Open kfogel opened 10 years ago

kfogel commented 10 years ago

Both TRP and LSNA have expressed a desire to be able to detect (and eliminate / merge) duplicate records. Usually these are records of participants; for example, Sueily at LSNA mentioned that duplicate detection would improve this page:

https://ttm.lisc-chicago.org/lsna/participants/new_participant.php

Filing this as one unified issue for now. If it turns out that very different duplicate-detection methods are needed for different places, then we can break this out into sub-issues, but I hope we can implement a largely shared mechanism.

cecilia-donnelly commented 9 years ago

@cwebber, this might be an issue to check out if you have time and interest. This is definitely something that users across the subsites want.

cwebber commented 9 years ago

Looking into it.

cwebber commented 9 years ago

@kfogel To clarify: do you mean detecting duplicate records on submission, or detecting them while sorting through and curating existing lists of data?

It would be helpful to know:

cecilia-donnelly commented 9 years ago

@cwebber

There is actually already a rudimentary example of this process in the Bickerdike subsite. See bickerdike/users/all_users.php. Users are meant to search for an existing record, then either delete it or search for another record to merge it with. All the duplicate detection and matching is done by the user, in this example.

I hope that helps! Let me know if it isn't clear.

cwebber commented 9 years ago

That helps a lot, @cecilia-donnelly! Thanks!

cecilia-donnelly commented 8 years ago

I talked about this with @kevinrak9 and @kfogel again today. The UI I discussed in this earlier comment https://github.com/OpenTechStrategies/lisc-ttm/issues/36#issuecomment-67038230 can actually be simplified, at least for Enlace's needs. We discussed adding a checkbox next to each name, with a merge button at the bottom of the list, on the "Participants" page when search results show up. Most of the time, Kevin said, users will just see two identical names next to each other and will be able to choose to merge those. We'll create an interface for choosing which values to retain in the eventual combined user.
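To illustrate the "two identical names next to each other" case, here is a minimal Python sketch that groups search results by a normalized name so candidate duplicates could be flagged for the merge checkboxes. The function name `candidate_duplicates`, the `(id, name)` result shape, and the sample data are all assumptions for illustration; nothing here is taken from the TTM codebase.

```python
from collections import defaultdict


def candidate_duplicates(results):
    """Group (id, name) search results by normalized name; keep groups of 2+.

    Hypothetical helper: normalization is only lowercasing and collapsing
    whitespace, so it catches identical names, not misspellings.
    """
    groups = defaultdict(list)
    for participant_id, name in results:
        key = " ".join(name.lower().split())
        groups[key].append(participant_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up search results for illustration.
results = [(12, "Maria Lopez"), (30, "maria  lopez"), (45, "Mario Lopez")]
print(candidate_duplicates(results))  # {'maria lopez': [12, 30]}
```

Names that differ by a letter would not be grouped by this approach; the user would still pick those out by eye, or some fuzzy matching would be needed.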

cecilia-donnelly commented 8 years ago

This would take:

cecilia-donnelly commented 8 years ago

Talked with @kfogel about this one and decided that the thing to do is to combine the first two bullet points above. We don't need to offer a UI for combining surveys, program/session membership, and attendance because the participants won't have conflicts there (they may have redundancy, but that can be managed from the "merged-to" participant profile). So, the workflow will be:

  1. Find the duplicate participants in the search results.
  2. Choose one as the "merged-from" participant and the other as the "merged-to" (new master) participant.
  3. Confirm participant metadata. This will be a new page that displays the name, phone number, etc. for both the merged-from and merged-to participants, and lets the user update the merged-to participant with merged-from data. Each metadata field will have an "update" button with an arrow that pushes that field's value (onclick / on submit) from the merged-from participant to the new master (merged-to) participant.

The other data will be assigned to the merged-to participant automatically. Later, we could add an interface for re-checking the metadata against the merged-from participant, if desired. The work for removing the "merged-from" / deprecated participants will be as described in https://github.com/OpenTechStrategies/lisc-ttm/issues/36#issuecomment-170679192.
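As a rough sketch of step 3 plus the automatic reassignment, the following Python uses an in-memory SQLite database so it can run on its own. The table and column names (`participants`, `attendance`, `phone`, and so on), the `MERGEABLE_FIELDS` whitelist, and the sample rows are assumptions for illustration; the real schema and the PHP/MySQL implementation may look quite different.

```python
import sqlite3

# Hypothetical whitelist of metadata fields the confirmation page could offer.
MERGEABLE_FIELDS = ("first_name", "last_name", "phone")


def merge_participants(conn, merged_from_id, merged_to_id, fields_to_push):
    """Push the chosen fields from merged-from to merged-to, then move related rows."""
    for field in fields_to_push:
        if field not in MERGEABLE_FIELDS:
            raise ValueError(f"unknown field: {field}")
    with conn:  # one transaction: commits on success, rolls back on error
        for field in fields_to_push:
            # Field names are checked against the whitelist above, so it is
            # safe to interpolate them; values still go through placeholders.
            row = conn.execute(
                f"SELECT {field} FROM participants WHERE id = ?",
                (merged_from_id,),
            ).fetchone()
            conn.execute(
                f"UPDATE participants SET {field} = ? WHERE id = ?",
                (row[0], merged_to_id),
            )
        # Non-conflicting data (attendance here) is reassigned automatically.
        conn.execute(
            "UPDATE attendance SET participant_id = ? WHERE participant_id = ?",
            (merged_to_id, merged_from_id),
        )


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, phone TEXT);
    CREATE TABLE attendance (
        id INTEGER PRIMARY KEY, participant_id INTEGER, session_date TEXT);
    INSERT INTO participants VALUES (1, 'Alex', 'Rivera', NULL);
    INSERT INTO participants VALUES (2, 'Alex', 'Rivera', '555-0100');
    INSERT INTO attendance VALUES (1, 1, '2016-01-05');
    INSERT INTO attendance VALUES (2, 2, '2016-01-12');
""")

# The user clicked "update" only on the phone field.
merge_participants(conn, merged_from_id=2, merged_to_id=1, fields_to_push=["phone"])
```

Only the fields the user explicitly pushes are overwritten; everything else on the merged-to participant stays as it was.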

kevinrak9 commented 8 years ago

@cecilia-donnelly I think the process you described would be great for what we're looking for. Also, would deprecated participant records be deleted, or moved to a separate column? I wasn't sure.

cecilia-donnelly commented 8 years ago

Hey @kevinrak9! Glad this process sounds right to you. I think we'd mark the deprecated records, not delete them. So, we'd add a new column to the Participants table called merged_to or similar. The value of that column would be null for non-deprecated (regular) participants. When two participants are merged, the merged-from participant's merged_to column would be updated with the id of the new master participant.

Say I have Cecilia Donnelly with an ID of 41 and Cecelia Donnelly with an ID of 77, and I want to mark them as duplicates and keep participant 41 as the master copy. I'd go through the workflow above, and when I was done, participant 77 would have a merged_to value of 41. Participant 41 would still have null in her merged_to column.

Once we have that in place, we'd update the various places where the code checks the Participants table so that the queries exclude the deprecated participants -- that is, those whose merged_to column has a non-null value. So, I'd only see Cecilia Donnelly, participant 41, in the list of search results, but participant 77 would still exist in the db -- we might think about adding a new "all deprecated participants" export, or similar.
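A minimal sketch of that idea in Python with an in-memory SQLite database, using the 41 / 77 example. The `merged_to` column name comes from the description above; the single `name` column, the helper function names, and the choice of SQLite are assumptions made to keep the example self-contained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE participants (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        merged_to INTEGER REFERENCES participants(id)  -- NULL for regular participants
    );
    INSERT INTO participants (id, name) VALUES (41, 'Cecilia Donnelly');
    INSERT INTO participants (id, name) VALUES (77, 'Cecelia Donnelly');
""")


def mark_merged(conn, merged_from_id, merged_to_id):
    """Deprecate the merged-from participant by pointing it at the master."""
    with conn:
        conn.execute(
            "UPDATE participants SET merged_to = ? WHERE id = ?",
            (merged_to_id, merged_from_id),
        )


def search_participants(conn, name_fragment):
    """Search that skips deprecated participants (merged_to is non-null)."""
    return conn.execute(
        "SELECT id, name FROM participants "
        "WHERE merged_to IS NULL AND name LIKE ? ORDER BY name",
        (f"%{name_fragment}%",),
    ).fetchall()


def deprecated_participants(conn):
    """Possible basis for an 'all deprecated participants' export."""
    return conn.execute(
        "SELECT id, name, merged_to FROM participants "
        "WHERE merged_to IS NOT NULL ORDER BY id"
    ).fetchall()


mark_merged(conn, merged_from_id=77, merged_to_id=41)
print(search_participants(conn, "Donnelly"))  # [(41, 'Cecilia Donnelly')]
print(deprecated_participants(conn))          # [(77, 'Cecelia Donnelly', 41)]
```

Every existing query against the Participants table would need the same `merged_to IS NULL` condition, so it may be worth putting that filter in one shared place.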

Does that help?

kevinrak9 commented 8 years ago

Yes, that helps - thanks for explaining!

cecilia-donnelly commented 8 years ago

:+1: