c2corg / v6_api

REST API for https://www.camptocamp.org
GNU Affero General Public License v3.0

Remove huts duplicates #225

Closed asaunier closed 7 years ago

asaunier commented 8 years ago

In v5, since routes require at least one associated summit, a summit is automatically created for routes that describe the access to a hut (the hut is then the main waypoint). This "ghost" summit has no text attributes except the title.

As a result, in the v6 database we often have duplicates for migrated huts (1 WP coming from the v5 hut document + 1 WP coming from the "ghost" v5 summit).

For instance "refuge du Couvercle" document_id 104136 (v5 hut) and 38225 (v5 summit).

tsauerwein commented 8 years ago

How can these ghost summits be detected? Check for the same coordinates?

asaunier commented 8 years ago

Same name, and same coordinates, yes.
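A minimal sketch of the "same name and same coordinates" check, assuming waypoints have already been loaded into plain Python objects. The `Waypoint` class, the `find_ghost_summits` helper, the 50 m tolerance and the projected `x`/`y` fields are all hypothetical and not taken from the v6_api code base; the coordinates in the usage example are made up.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Waypoint:
    # Hypothetical in-memory view of a waypoint, not the v6_api model.
    document_id: int
    waypoint_type: str  # "hut" or "summit"
    title: str
    x: float  # projected coordinates in metres (assumption)
    y: float


def normalize(title):
    """Lower-case the title and collapse whitespace before comparing."""
    return " ".join(title.lower().split())


def find_ghost_summits(waypoints, tolerance=50.0):
    """Return (hut_id, summit_id) pairs sharing a title and a location."""
    summits_by_title = {}
    for wp in waypoints:
        if wp.waypoint_type == "summit":
            summits_by_title.setdefault(normalize(wp.title), []).append(wp)

    pairs = []
    for hut in waypoints:
        if hut.waypoint_type != "hut":
            continue
        for summit in summits_by_title.get(normalize(hut.title), []):
            # Same name is not enough: also require nearby coordinates.
            if hypot(hut.x - summit.x, hut.y - summit.y) <= tolerance:
                pairs.append((hut.document_id, summit.document_id))
    return pairs


# Usage with the example from this issue (coordinates are invented):
waypoints = [
    Waypoint(104136, "hut", "Refuge du Couvercle", 1000.0, 2000.0),
    Waypoint(38225, "summit", "refuge du Couvercle", 1000.0, 2010.0),
    Waypoint(1, "summit", "Some other summit", 1000.0, 2000.0),
]
pairs = find_ghost_summits(waypoints)
```

The tolerance is there because the two v5 documents may not carry exactly identical coordinates; whether that is actually the case in the data would need to be checked.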

tsauerwein commented 7 years ago

See https://github.com/c2corg/v6_api/issues/307#issuecomment-258890002

asaunier commented 7 years ago

@tsauerwein said at https://github.com/c2corg/v6_api/issues/307#issuecomment-258890002

Allowing huts as main waypoint is trivial (see https://github.com/c2corg/v6_api/blob/master/c2corg_api/scripts/migration/documents/associations.py#L198-L200), the title_prefix will then be set.

But removing the fake summits (summits with summit type 100) is more complex. It would mean:

  • Finding out which fake summit belongs to which hut.
  • Ignoring fake summits during the migration of summits.
  • Ignoring the history entries for fake summits.
  • Handling associations to fake summits: If there is also an association to the hut, ignore the association (and also association log entries), otherwise create a new association (+ association log entry) to the hut.
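The association handling in the last bullet can be sketched roughly as follows. This is only an illustration under assumptions: associations are modelled as plain `(parent_id, child_id)` tuples, `ghost_to_hut` is a hypothetical mapping from fake summit id to hut id, and association log entries are not handled. The route ids 500 and 501 are invented.

```python
def remap_associations(associations, ghost_to_hut):
    """Re-point associations from fake summits to their hut.

    associations: list of (parent_id, child_id) tuples (assumed shape).
    ghost_to_hut: dict mapping fake summit id -> hut id (hypothetical).
    """
    existing = set(associations)
    result = []
    for parent_id, child_id in associations:
        new_pair = (
            ghost_to_hut.get(parent_id, parent_id),
            ghost_to_hut.get(child_id, child_id),
        )
        if new_pair == (parent_id, child_id):
            # No fake summit involved: keep the association as-is.
            result.append(new_pair)
            continue
        if new_pair[0] == new_pair[1]:
            # Hut <-> its own fake summit: nothing left to associate.
            continue
        if new_pair in existing:
            # The hut already has this association: ignore the duplicate.
            continue
        # Otherwise create the association on the hut instead.
        existing.add(new_pair)
        result.append(new_pair)
    return result


associations = [
    (104136, 500),    # hut -> route, already present
    (38225, 500),     # fake summit -> same route, ignored
    (38225, 501),     # fake summit -> other route, moved to the hut
    (104136, 38225),  # hut -> fake summit, dropped
]
remapped = remap_associations(associations, {38225: 104136})
```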

@saunier suggested at https://github.com/c2corg/v6_api/issues/307#issuecomment-259082566

What about implementing the document merging and deleting systems (both are required anyway, see #465 and #386)?

After the migration we could run a special script that would browse the hut waypoints, detect duplicates, merge the most recent (?) document (this would transfer all associations to the remaining document) and finally remove the merged document (no longer needed) using the standard merging and deleting systems.

tsauerwein commented 7 years ago

@saunier suggested at #307 (comment)

This sounds like the easier option. This could even be done after the go-live.

asaunier commented 7 years ago

This could even be done after the go-live.

Indeed. By the way, the merging tool might not be implemented before the go-live.

stef74 commented 7 years ago

@asaunier so we can remove the "blocking" label and move this to 6.1? (Or in fact close it, since it becomes a manual action if the job is not done before the go-live?)

asaunier commented 7 years ago

OK, I have removed labels "blocking" and "migration" and changed the milestone to 6.1.

fbunoz commented 7 years ago

It's more complex than just removing the v5 summit-hut. In v5, summit-huts are used to create a route that describes the access to the hut: this route is associated with both the hut and the summit-hut. During the migration, the hut access routes keep all their WP associations, and the summit-hut is defined as the main WP (for all routes, the migration defines an associated v5 summit as the v6 main WP). So, in addition to removing the summit-hut, we need to define the v5 hut as the main WP for the hut access route.
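To illustrate the extra step, here is a rough sketch of switching the main WP of a hut access route from the summit-hut to the hut. The `Route` class, its field names, the `reassign_main_waypoint` helper and the `hut_titles` lookup are assumptions made for this example, not the actual v6_api models; the route id 500 is invented.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Route:
    # Hypothetical simplified route, not the v6_api model.
    document_id: int
    main_waypoint_id: Optional[int]
    title_prefix: str


def reassign_main_waypoint(route, ghost_to_hut, hut_titles):
    """Return the route with the hut as main WP if it pointed at a summit-hut."""
    hut_id = ghost_to_hut.get(route.main_waypoint_id)
    if hut_id is None:
        # Main WP is a regular waypoint (or missing): leave the route alone.
        return route
    # Point at the hut and refresh the prefix derived from the main WP title.
    return replace(
        route,
        main_waypoint_id=hut_id,
        title_prefix=hut_titles[hut_id],
    )


ghost_to_hut = {38225: 104136}
hut_titles = {104136: "Refuge du Couvercle"}
access_route = Route(500, 38225, "refuge du Couvercle")
fixed = reassign_main_waypoint(access_route, ghost_to_hut, hut_titles)
```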

asaunier commented 7 years ago

This part requires issue https://github.com/c2corg/v6_api/issues/386 (merging tool) to be addressed first.

asaunier commented 7 years ago

@desnoes @stef74 This issue can probably be closed because a moderator has merged all hut duplicates (by hand!), see https://forum.camptocamp.org/t/refuges-en-double/180662/10