derekantrican / MountainProject

A scraper and reddit bot for the website MountainProject.com

[BOT] Requested approval for the same area #77

Closed derekantrican closed 1 year ago

derekantrican commented 1 year ago

Here is a recent approval message from the bot:


Possible AutoReply found: 2 EXACTLY matching routes (name & location w/o abbrev). No grades matched

PostTitle: Party in Plumber’s Crack
PostURL: http://redd.it/11pzi9m

All results found:

Filtered Result: Plumber's Crack (5.9) Plumber's Crack, Nevada

...

Note that both results are for the same route (https://www.mountainproject.com/route/107185645)

derekantrican commented 1 year ago

It looks like Kraft Boulders (https://www.mountainproject.com/area/105937608/kraft-boulders) has been reorganized recently (for instance, one of the new subareas, "03-Main Area", was only created on March 10, 2023). When the new subareas were created (and were picked up from the RSS feed in the nightly update), the update created duplicate listings.

For instance, the old path to the plumber's crack route looked like this:

Kraft Boulders > Plumber's Crack > Plumber's Crack

But with the nightly RSS feed updates, there was a new path:

Kraft Boulders > 03-Main > Plumber's Crack > Plumber's Crack

and since we only add areas (and don't delete them) in the nightly updates, we are currently in a state where both paths exist.
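
To make the failure mode concrete, here is a minimal sketch in Python. It is not the bot's actual code: the tuple-based path representation and the `add_only_update` name are assumptions made only for illustration.

```python
# Hypothetical model: each stored route is a tuple of path segments.
stored = {("Kraft Boulders", "Plumber's Crack", "Plumber's Crack")}

# What the nightly RSS feed reports after the reorganization (assumed shape).
feed = [("Kraft Boulders", "03-Main", "Plumber's Crack", "Plumber's Crack")]


def add_only_update(stored_paths, new_paths):
    """Merge new paths into the stored set without ever removing old ones."""
    return set(stored_paths) | set(new_paths)


updated = add_only_update(stored, feed)

# Both the old and the new path now end in the same route name,
# so a search for that route returns two results.
matches = [path for path in updated if path[-1] == "Plumber's Crack"]
```

Because nothing is ever removed, `matches` holds both the old and the new path for what is really one route.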

derekantrican commented 1 year ago

I think the REAL solution here is for MountainProject.com to provide ALL updates to areas/routes via RSS feeds (adds, deletes, edits, etc).

Since that probably won't happen, one thing we can do would be to reparse the parent when there's a new child.
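
A rough sketch of the reparse idea, in Python for brevity. Everything here is hypothetical: `fetch_children` stands in for whatever would re-scrape the parent's page, and the dictionary is a simplified stand-in for the stored area tree.

```python
def reparse_parent(tree, parent, fetch_children):
    """Replace the parent's stored children with a freshly fetched list.

    tree maps an area name to a list of its child names (assumed model).
    fetch_children is a hypothetical callable that returns the parent's
    current children as listed on the site.
    """
    tree[parent] = list(fetch_children(parent))


# Stored state before the reorganization was picked up.
tree = {"Kraft Boulders": ["Plumber's Crack"]}


def fake_fetch(parent):
    # Stand-in for a real scrape: after the reorganization the parent
    # lists only the new subarea.
    return ["03-Main"]


# A new child was seen in the feed, so reparse its parent instead of
# appending the child to the existing list.
reparse_parent(tree, "Kraft Boulders", fake_fetch)
```

Replacing the child list, rather than appending to it, is what drops the stale entry. The cost is one extra page fetch per new child.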

Additionally, we could exclude duplicates (results that point to the same route) when requesting approval.
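
One possible way to do that, sketched in Python: treat two results as duplicates when their URLs contain the same route ID. The result shape (a dict with `name` and `url`) and the function names are assumptions, not the bot's real types.

```python
import re


def route_id(url):
    """Return the numeric ID from a URL such as .../route/107185645, or None."""
    match = re.search(r"/route/(\d+)", url)
    return match.group(1) if match else None


def dedupe_results(results):
    """Keep the first result for each route ID, preserving order."""
    seen = set()
    unique = []
    for result in results:
        # Fall back to the full URL if no route ID can be extracted.
        key = route_id(result["url"]) or result["url"]
        if key not in seen:
            seen.add(key)
            unique.append(result)
    return unique


results = [
    {"name": "Plumber's Crack", "url": "https://www.mountainproject.com/route/107185645"},
    {"name": "Plumber's Crack", "url": "https://www.mountainproject.com/route/107185645"},
]
unique = dedupe_results(results)
```

With the two results from the approval message above, only one would remain, so the match would no longer look ambiguous.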