Beit-Hatfutsot / beit-hatfutsot-devops

Devops, documentation and issues repository for Beit Hatfutsot - The Museum of The Jewish People
GNU Affero General Public License v3.0

[Task] research duplicated data in family names text #29

Open OriHoch opened 7 years ago

OriHoch commented 7 years ago

reproduction

expected

actual

implications of this bug

TODO

TheGrandVizier commented 7 years ago

This is a scenario that keeps resurfacing and, to my knowledge, cannot be fixed without Haim's consent and his blessing to remodel this content into something better.

Last time we spoke about this subject, he insisted that these are individual names and each must have its own item page, unlike other cases where a merge is even preferable. There seems to be a difference (which I do not understand) between variants that can be merged and variants that may not be merged.

OriHoch commented 7 years ago

great, thanks, I think we can solve this on our side - the content is exactly the same and we can detect this during the sync process (or at some other stage).
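A minimal sketch of how identical texts could be detected during sync, assuming each synced item is a dict with hypothetical `id`, `title` and `text` fields (the real item schema is not shown in this thread). It fingerprints the whitespace-normalized text and groups item ids that share a fingerprint:

```python
import hashlib
from collections import defaultdict


def content_fingerprint(text):
    """Hash the text after collapsing whitespace, so spacing differences are ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def find_duplicate_groups(items):
    """Return lists of item ids whose text is identical after whitespace normalization."""
    groups = defaultdict(list)
    for item in items:
        groups[content_fingerprint(item["text"])].append(item["id"])
    # Only fingerprints shared by two or more items are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


# Hypothetical sample records for illustration only, not real data.
items = [
    {"id": 1, "title": "DERI", "text": "Family name text A."},
    {"id": 2, "title": "DER'I", "text": "Family name  text A."},
    {"id": 3, "title": "COHEN", "text": "Family name text B."},
]
duplicate_groups = find_duplicate_groups(items)
print(duplicate_groups)  # [[1, 2]]
```

This only reports the groups; it does not decide what to do with them, which is the open question here.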

the question is what to do once we detect this duplication and what kind of problems this duplication poses

I guess these are the main problems I can see:

now we need to think about how (or whether) to fix it.

OriHoch commented 7 years ago

also, if it popped up in the past, it would be great to know what kind of problems we had with this in the past

TheGrandVizier commented 7 years ago

Just flat out refusal to change anything on the BHP side of things, content-wise. We dropped it at that.

nuritgazit commented 7 years ago

Three things:

  1. I think Haim's issue was that in some cases there are (minor) differences between articles, while in others the text is the same. It would be helpful to get an estimate of the number of items in each group.
  2. We know for sure that there are many of them, but only in family names.
  3. In terms of product, we should aspire to allow 2 different people, one looking for "Deri" and the other for "Der'i", for example, to get to the unified item, without them having to guess that it's the same name.
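A rough sketch of how points 1 and 3 could be explored together, again assuming hypothetical `title` and `text` fields on each item. The `name_key` function is an illustrative normalization (strip diacritics and punctuation, ignore case), not an agreed rule. It is meant as a lookup key for search, not as a decision to merge items:

```python
import unicodedata
from collections import defaultdict


def name_key(name):
    """Lookup key: decompose accents, keep letters only, ignore case."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if ch.isalpha()).casefold()


def classify_groups(items):
    """Count name-variant groups whose texts are identical vs. groups whose texts differ."""
    by_key = defaultdict(list)
    for item in items:
        by_key[name_key(item["title"])].append(item)
    counts = {"identical": 0, "differs": 0}
    for group in by_key.values():
        if len(group) < 2:
            continue  # a single spelling, nothing to compare
        texts = {" ".join(item["text"].split()) for item in group}
        counts["identical" if len(texts) == 1 else "differs"] += 1
    return counts


# Hypothetical sample records for illustration only, not real data.
items = [
    {"title": "Deri", "text": "Text about the name."},
    {"title": "Der'i", "text": "Text about the name."},
    {"title": "Levi", "text": "One article."},
    {"title": "Lévi", "text": "A slightly different article."},
    {"title": "Cohen", "text": "Unrelated."},
]
print(name_key("Der'i") == name_key("Deri"))  # True
print(classify_groups(items))  # {'identical': 1, 'differs': 1}
```

A key this aggressive may also group names that should stay separate, so the counts are only a starting point for the estimate in point 1.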