derat / yambs

Moved to codeberg.org/derat/yambs
https://codeberg.org/derat/yambs
BSD 3-Clause "New" or "Revised" License

Be less aggressive about mapping URLs to artists and labels #26

Closed: derat closed this issue 1 year ago

derat commented 1 year ago

I'd like to improve the logic around mapping URLs to entities (e.g. https://foo.bandcamp.com/ to an artist or label) when seeding from Bandcamp or Tidal.

Right now, if a single entity has a relationship with the URL, that entity is used when seeding. If multiple entities have relationships with the URL, the one whose database name has the shortest edit distance to the name on Bandcamp/Tidal is used. The "credited as" field is never set, so the seeded field just shows the name as it appears in the database.
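For reference, here is a minimal Python sketch of the current rule as described above. It is not yambs' actual code; `Candidate`, `levenshtein`, and `current_pick` are hypothetical names, and names are compared exactly as given (no case or punctuation normalization).

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical record for an entity that has a relationship with the URL."""
    mbid: str
    db_name: str  # the entity's name as stored in the database


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def current_pick(page_name: str, candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Sketch of the current rule; "credited as" is never set, so only the MBID matters."""
    if not candidates:
        return None
    if len(candidates) == 1:
        # A lone related entity is used no matter how its name compares to the page.
        return candidates[0]
    # Otherwise take the entity whose DB name is closest to the name on the page.
    return min(candidates, key=lambda c: levenshtein(page_name, c.db_name))


# Hypothetical case: the page credits "B", but only "A" is related to the URL.
print(current_pick("B", [Candidate("mbid-a", "A")]))
```

The single-candidate branch never looks at the names at all, which is how a mismatched credit ends up linked to the wrong MBID.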

This behavior is annoying in cases like this one:

Or this one:

Setting the "credited as" field would probably make things worse in both of these cases, since I believe it would hide the incorrect credits from the editor: in both cases, the page would show a green field with the name as credited on Bandcamp while actually linking to A's MBID.

I think it'd be better to seed the MBID only when the names on the page and in the database are very similar (an edit distance of 1 or 2?). Bigger differences probably need a human's attention, e.g. to split the credits into multiple artists or to consider creating a new artist. I should also require an exact match for very short names to handle cases like the made-up A and B example above, since the edit distance between those strings is just 1.
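A minimal Python sketch of the proposed rule, assuming a maximum distance of 2 and a hypothetical `SHORT_NAME_LEN` cutoff of 4 characters below which an exact match is required; both constants, `Candidate`, and `proposed_pick` are illustrative, not settled values or yambs' API. Unlike the current behavior, the similarity check applies even when only one entity is related to the URL.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

MAX_DISTANCE = 2    # "edit distance of 1 or 2?" from the issue; still undecided
SHORT_NAME_LEN = 4  # hypothetical cutoff: names shorter than this must match exactly


@dataclass
class Candidate:
    """Hypothetical record for an entity that has a relationship with the URL."""
    mbid: str
    db_name: str  # the entity's name as stored in the database


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def proposed_pick(page_name: str, candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the closest sufficiently similar entity, or None to leave the MBID unset."""
    best: Optional[Candidate] = None
    best_dist = -1
    for c in candidates:
        dist = levenshtein(page_name, c.db_name)
        # Very short names (e.g. "A" vs. "B") are only 1 edit apart, so demand equality.
        short = min(len(page_name), len(c.db_name)) < SHORT_NAME_LEN
        limit = 0 if short else MAX_DISTANCE
        if dist <= limit and (best is None or dist < best_dist):
            best, best_dist = c, dist
    # None means the editor has to search for and select the entity manually.
    return best


print(proposed_pick("B", [Candidate("mbid-a", "A")]))                      # no match
print(proposed_pick("Foo Band", [Candidate("mbid-f", "Foo Bnad")]).mbid)   # within 2 edits
```

Returning `None` rather than a best guess is the design choice here: a blank field prompts the editor to look, while a wrong green field does not.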

I think it's still okay to leave the "credited as" field blank; online sources are a mess, and sticking with the DB name is probably safest.

I suspect this change will sometimes prevent a match that should have been made (e.g. when an artist's name is stylized in an unusual way on Bandcamp, which I've seen often), but it's not the end of the world if the editor needs to click the search button and manually select the appropriate entity.