chaoss / grimoirelab-sortinghat

A tool to manage identities
GNU General Public License v3.0

Sorting hat package creates identity with missing information when merged and unmerged #243

Open code-sleuth opened 4 years ago

code-sleuth commented 4 years ago

When we merge one profile with another, we call sortinghat.api.merge_unique_identities (https://github.com/LF-Engineering/dev-analytics-sortinghat-api/blob/master/app/apis/profiles/apis.py#L190), with a from and to uuid. The from identity is then deleted and added into the to identity.

Afterwards, if we unmerge that identity from the to identity, for which we use sortinghat.api.move_identity (https://github.com/LF-Engineering/dev-analytics-sortinghat-api/blob/master/app/apis/profiles/apis.py#L269), the from identity that was deleted earlier is recreated, but it is missing the name, email, and other personal details it previously had.
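The behaviour described above can be reproduced with a toy in-memory model. This is a minimal sketch, not SortingHat's code or schema: the `Registry` class, its `merge` and `move` methods, and the sample uuids and profiles are all hypothetical stand-ins for `merge_unique_identities` and `move_identity`. It assumes that a merge deletes the source profile outright and that a move recreates a missing target with an empty profile.

```python
# Toy in-memory model, NOT SortingHat's real code or schema.
from dataclasses import dataclass, field


@dataclass
class UniqueIdentity:
    uuid: str
    profile: dict = field(default_factory=dict)   # name, email, ...
    identities: set = field(default_factory=set)  # ids of linked identities


class Registry:
    def __init__(self):
        self.uidentities = {}

    def add(self, uuid, profile):
        # In this toy model the first identity shares its id with the
        # unique identity that owns it.
        self.uidentities[uuid] = UniqueIdentity(uuid, dict(profile), {uuid})

    def merge(self, from_uuid, to_uuid):
        # Move every identity to the target, then drop the source
        # together with its profile.
        source = self.uidentities.pop(from_uuid)
        self.uidentities[to_uuid].identities |= source.identities

    def move(self, identity_id, to_uuid):
        # Detach the identity from whichever unique identity holds it.
        for uid in self.uidentities.values():
            uid.identities.discard(identity_id)
        # Recreate the target if it no longer exists; nothing is left
        # to restore its profile from.
        target = self.uidentities.setdefault(to_uuid, UniqueIdentity(to_uuid))
        target.identities.add(identity_id)


reg = Registry()
reg.add("aaa", {"name": "Jane Roe", "email": "jane@example.com"})
reg.add("bbb", {"name": "J. Roe", "email": "jroe@example.com"})
reg.merge("aaa", "bbb")   # "aaa" and its profile are deleted
reg.move("aaa", "aaa")    # "unmerge": "aaa" comes back without a profile
print(reg.uidentities["aaa"].profile)
```

In this sketch the profile is gone because the merge step discards it and no history is kept that the later move could read it back from.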

sduenas commented 4 years ago

The current version of SortingHat doesn't track historic information to that level of detail, so it's not possible to recreate the previous identity. With the new experimental version (see the muggle branch) that would be possible, because there's a table that stores all the changes to the identities, but currently there's no code to do that. I'm not sure if it's something we should support.

What's your use case? Why do you need this feature?

code-sleuth commented 4 years ago

What's your use case?

If there are two identities with, let's say, two similar emails, I think that warrants merging, and if a user thinks the merge was a mistake, they can unmerge.

Why do you need this feature?

Basically, to give users the ability to unmerge an identity if they mistakenly merge two identities.

lukaszgryglicki commented 4 years ago

I see that the differences between muggle and master are huge. @sduenas Is it safe to use sortinghat based on the muggle branch? Can you please point me to the DB structure differences that are required to handle this? I've generated a diff file, but it is so huge that it is hard to track the actual changes needed in the DB structure. Any chance you could create another branch with unmerging support but rebased onto the current sortinghat master branch?

Here is the diff file mungle-master.diff.txt cc @code-sleuth

sduenas commented 4 years ago

@lukaszgryglicki, the muggle branch is a totally different thing. It's still experimental and not integrated with any other component in the stack. You should not use that branch unless you want to contribute to its development. There is more info about it here: https://github.com/chaoss/grimoirelab-sortinghat/wiki/Roadmap-to-Sorting-Hat-1.0

I can try to rebase the branch onto master, but as they are incompatible I don't see the point of doing it right now.

lukaszgryglicki commented 4 years ago

OK, thanks for the info. How about changes to DB structure needed to handle merge/unmerge operations?

sduenas commented 4 years ago

In muggle we use the Django ORM and not SQLAlchemy, as in master, to deal with the database, so it's not only about DB structure. If you want to check, it is in here

In any case, muggle doesn't implement what I think you want, which is an "undo" of certain operations. It's something that is possible to implement with the current schema, because we store all the operations done with SortingHat. So, using an event sourcing pattern, it would be possible to recreate some states or to roll back. This is not implemented yet and I'm not sure if we should follow that direction.
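To make the event sourcing idea concrete, here is a minimal sketch of how a stored list of operations could be replayed to recover an earlier state. Everything in it is hypothetical: the `EventLog` class, the `add` and `merge` operation names, and the state layout are illustrations only, and do not reflect muggle's actual tables or API.

```python
# Hypothetical event-sourcing sketch; names are illustrative only.
class EventLog:
    def __init__(self):
        self.events = []  # append-only list of (operation, arguments)

    def append(self, op, **args):
        self.events.append((op, args))

    def replay(self, upto=None):
        # Rebuild the state from scratch by applying events in order.
        # `upto` limits how many events are applied (slice semantics).
        state = {}
        for op, args in self.events[:upto]:
            if op == "add":
                state[args["uuid"]] = {
                    "profile": dict(args["profile"]),
                    "identities": {args["uuid"]},
                }
            elif op == "merge":
                source = state.pop(args["from_uuid"])
                state[args["to_uuid"]]["identities"] |= source["identities"]
        return state


log = EventLog()
log.append("add", uuid="aaa", profile={"email": "jane@example.com"})
log.append("add", uuid="bbb", profile={"email": "jroe@example.com"})
log.append("merge", from_uuid="aaa", to_uuid="bbb")

merged = log.replay()          # current state, after the merge
before = log.replay(upto=-1)   # state as it was before the last event
print(before["aaa"]["profile"])
```

Because the state is derived from the log rather than stored destructively, the profile of the merged identity is still recoverable by replaying all events except the merge. A real implementation would also have to decide how an undo is itself recorded in the log.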

I'm also open to ideas about how to manage these cases, and to PRs that solve this issue.