ModalSeoul / Weeb.FM

Last.fm successor. "Weeb.fm" is definitely a temporary name.
MIT License

Autocorrect Implementation Example #28

Open austin-1 opened 8 years ago

austin-1 commented 8 years ago

Solving spelling variations in song titles and artists

Consider the following scenarios that could be scrobbled by a user (note: the actual song scrobbled is exactly the same in all cases):

Artist Name // Song Title
1) Jay Ant // Fully Focused
2) Fly Commons // Fully Focused
3) Fly Commons x Christoph Andersson // Fully Focused (feat. Jay Ant & G-Eazy)
4) Jay Ant // Fully Focused ft. G-Eazy (prod. Christoph Andersson & Fly Commons)
5) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
6) Jay Ant x G-Eazy // Fully Focused
7) G-Eazy // Fully Focused (feat. Jay Ant)

Assume that the following variations have been scrobbled with the artist and song in opposite fields:
8) Fully Focused (feat. Jay Ant) // Fly Commons
9) Fully Focused (feat. G-Eazy) // Jay Ant
10) Fully Focused // Fly Commons x Jay Ant

At the time of scrobbling, each song would first be checked against 'artists'; if the artist doesn't exist, a new artist is created with a unique ID. The song is then checked against 'songs': if the song title with a matching artist ID already exists, its scrobble count is increased by 1; if it doesn't exist, a new song is created with a unique ID and the corresponding artist ID. After the new song has been created, a new camp will need to be created as well, with this song being its only member.

If it is the first time a user has scrobbled a unique song, then the song will be given 1 point.
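
The scrobble flow above could look roughly like this in Python. This is only a sketch under assumptions: the `Library` and `Song` classes, the in-memory dictionaries standing in for the 'artists' and 'songs' tables, and the choice to start a new song at a scrobble count of 1 are all made up for illustration and are not taken from the Weeb.FM codebase.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Song:
    song_id: int
    artist_id: int
    title: str
    scrobbles: int = 0
    points: int = 0
    listeners: set = field(default_factory=set)  # users who have scrobbled it


class Library:
    """Hypothetical in-memory stand-in for the artists, songs and camps tables."""

    def __init__(self):
        self._ids = count(1)  # one shared source of unique IDs
        self.artists = {}     # artist name -> artist ID
        self.songs = {}       # (artist ID, title) -> Song
        self.camps = {}       # camp ID -> set of song IDs

    def scrobble(self, user, artist, title):
        # Step 1: look up the artist, creating it if it does not exist yet.
        artist_id = self.artists.get(artist)
        if artist_id is None:
            artist_id = next(self._ids)
            self.artists[artist] = artist_id

        # Step 2: look up the song by (artist ID, title), creating it if needed.
        key = (artist_id, title)
        song = self.songs.get(key)
        if song is None:
            song = Song(next(self._ids), artist_id, title)
            self.songs[key] = song
            # A new song always starts a new camp with itself as the only member.
            self.camps[next(self._ids)] = {song.song_id}

        # Every scrobble bumps the count (assumption: a new song starts at 1).
        song.scrobbles += 1

        # The first scrobble of this song by this user is worth 1 point.
        if user not in song.listeners:
            song.listeners.add(user)
            song.points += 1
        return song
```

Note that "Jay Ant // Fully Focused" and "Fly Commons // Fully Focused" end up as two songs in two camps here, which is exactly the situation the autocorrect suggestions are meant to resolve.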

Initially, each song would have a default autocorrect spelling identical to what was scrobbled. Each song would have an autocorrect field containing one or several song IDs belonging to itself and other songs. Each user will have the ability to assign an autocorrect spelling suggestion* for both the song and the artist of the song he has scrobbled.

The following example will assume that each of the ten users above has manually declared a desired autocorrect spelling for each song that was scrobbled. Below is a list of what each song will now be corrected to:

Artist Name // Song Title
11) Jay Ant // Fully Focused (feat. G-Eazy)
12) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
13) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
14) Jay Ant // Fully Focused (feat. G-Eazy)
15) Jay Ant // Fully Focused ft. G-Eazy (prod. Fly Commons)
16) Jay Ant // Fully Focused (feat. G-Eazy)
17) G-Eazy // Fully Focused (feat. Jay Ant) [this user opted not to make an autocorrect suggestion]

Assume that the following variations have been scrobbled with the artist and song in opposite fields:
18) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
19) Jay Ant // Fully Focused (feat. G-Eazy)
20) Fly Commons // Fully Focused (feat. Jay Ant)

[Diagram: how relationships are formed by submitting new autocorrect suggestions]

At the end of it, four camps are generated regarding the spelling of this song. All songs that are members of an individual camp will be considered the same, with each camp representing a unique song.

Camp 1


1) Jay Ant // Fully Focused
11) Jay Ant // Fully Focused (feat. G-Eazy)
9) Fully Focused (feat. G-Eazy) // Jay Ant
4) Jay Ant // Fully Focused ft. G-Eazy (prod. Christoph Andersson & Fly Commons)

Camp 2


3) Fly Commons x Christoph Andersson // Fully Focused (feat. Jay Ant & G-Eazy)
5) Fly Commons // Fully Focused (feat. Jay Ant & G-Eazy)
15) Jay Ant // Fully Focused ft. G-Eazy (prod. Fly Commons)
6) Jay Ant x G-Eazy // Fully Focused
8) Fully Focused (feat. Jay Ant) // Fly Commons

Camp 3 (Remains unchanged)


7) G-Eazy // Fully Focused (feat. Jay Ant)

Camp 4


10) Fully Focused // Fly Commons x Jay Ant
20) Fly Commons // Fully Focused (feat. Jay Ant)

The next step is choosing a camp leader, which would be done by adding together the number of points and votes. The camp leader is the song with the highest number of points and votes, while all others are considered variations.

Points would be determined by the number of users who have scrobbled the specific song. 1 Point will be given to a song the first time a user scrobbles it. If a user provides an autocorrect title then 1 point will be taken away from the original song and given to the corrected song.

Votes will work similarly to Reddit's upvoting and downvoting. A point and a vote are weighted the same, and each user is allowed one vote per song regardless of whether they have scrobbled it before or not.
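
Picking the leader from points and votes could be as small as the sketch below. The dictionary layout, the sample numbers, and the tie-breaking behaviour (the first member with the top score wins) are assumptions for illustration only; the proposal doesn't say how ties should be handled.

```python
def camp_leader(camp):
    """Return the camp member with the highest points + votes total.

    `camp` is a list of dicts with 'name', 'points' and 'votes' keys
    (a hypothetical layout). Votes are net upvotes minus downvotes, so
    they may be negative. On a tie, the earliest member in the list wins.
    """
    return max(camp, key=lambda song: song["points"] + song["votes"])


def variations(camp):
    """Return every member of the camp that is not the leader."""
    leader = camp_leader(camp)
    return [song for song in camp if song is not leader]


# Made-up numbers for three spellings in one camp.
example_camp = [
    {"name": "Jay Ant // Fully Focused", "points": 4, "votes": -1},
    {"name": "Jay Ant // Fully Focused (feat. G-Eazy)", "points": 3, "votes": 5},
    {"name": "Fully Focused (feat. G-Eazy) // Jay Ant", "points": 1, "votes": -2},
]
```

Because a point and a vote carry the same weight, a spelling with fewer scrobblers can still lead the camp if enough users upvote it.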

The page of the camp leader would show a list of variations, while the page of each variation would show a link to the camp leader's page. Biographical info and comments on pages of all spelling variations should be preserved.

[Diagram: an overview of the camps and how leaders/variations would be displayed]

Each time a user saves an autocorrect suggestion, the original scrobble and the correction will be compared to existing camps. After comparing, there will be three possible outcome scenarios for each song:

1) The original scrobble matches a member of an existing camp, but the corrected scrobble does not.
2) The original and corrected scrobbles match a member in the same camp.
3) The original and corrected scrobbles match a member in two different camps.

Note: When a unique song is scrobbled, a new camp is created for it. Therefore it is impossible for the original scrobble not to match any camps.

Scenario 1: A new song is created for the corrected scrobble with 1 point, the new song is added to the camp of the original scrobble and the original scrobble loses 1 point.

Scenario 2: The corrected scrobble gains 1 point and the original scrobble loses 1 point.

Scenario 3: The corrected scrobble gains 1 point, the original scrobble loses 1 point, and all members of the original scrobble's camp are moved into the corrected scrobble's camp. The two camps are merged.
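
Here is a rough Python sketch of the three scenarios, to show how the point transfer and the camp merge fit together. The `Camps` class, its field names, and the use of plain spelling strings as song keys are all assumptions made for this example; a real implementation would work on song IDs in the database.

```python
class Camps:
    """Hypothetical in-memory model of songs, their points and their camps."""

    def __init__(self):
        self.points = {}   # spelling -> points
        self.camp_of = {}  # spelling -> camp ID
        self.members = {}  # camp ID -> set of spellings
        self._next_camp = 1

    def add_song(self, spelling, points=1):
        """Register a newly scrobbled spelling in a camp of its own."""
        self.points[spelling] = points
        self.camp_of[spelling] = self._next_camp
        self.members[self._next_camp] = {spelling}
        self._next_camp += 1

    def correct(self, original, corrected):
        """Apply one autocorrect suggestion and return the scenario number."""
        # The original always has a camp, because scrobbling created one.
        source = self.camp_of[original]

        if corrected not in self.camp_of:
            # Scenario 1: create the corrected song inside the original's camp.
            self.points[corrected] = 0
            self.camp_of[corrected] = source
            self.members[source].add(corrected)
            scenario = 1
        elif self.camp_of[corrected] == source:
            # Scenario 2: both spellings already share a camp.
            scenario = 2
        else:
            # Scenario 3: move every member of the original's camp into the
            # corrected song's camp, which merges the two camps.
            target = self.camp_of[corrected]
            for spelling in self.members.pop(source):
                self.camp_of[spelling] = target
                self.members[target].add(spelling)
            scenario = 3

        # In all three scenarios one point moves from original to corrected.
        self.points[original] -= 1
        self.points[corrected] += 1
        return scenario
```

The point transfer is the same in every scenario, so only the camp bookkeeping differs between the branches.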

ModalSeoul commented 8 years ago

It's getting late now, and while I do have a bit to say about this, I'm going to wait to comment until tomorrow mid-day or so. That gives me some time to talk with the other guys and see if we can whip something up.

Leaving this here now so you know we're not ignoring you.

austin-1 commented 8 years ago

Thanks Modal, probably will take a while to get all of the small details worked out anyway so no rush.

Amendment A stub here to address the issue of rogue users attempting to provide an autocorrect suggestion that points to an unrelated song.

User scrobbles: The Beatles - Here Comes The Sun

Suggests autocorrect to: Guns N' Roses - Paradise City

The question is how to make sure this suggestion doesn't cause unrelated camps to merge.

Rough solution idea: Only the most frequently suggested autocorrect suggestion actually matters when determining camps. If Here Comes The Sun already has 100 users who say that song is correct and 1 user who says it should be Paradise City, Here Comes The Sun is going to be what is considered when matching camp members.
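
To make the rough idea concrete, here is a small Python sketch that picks the single most frequently suggested spelling for a song and ignores the rest when matching camps. The function name and the flat list of suggestions are assumptions for illustration, and how to break ties is left open.

```python
from collections import Counter


def effective_correction(suggestions):
    """Return the most frequently suggested spelling, or None if there are none.

    `suggestions` holds one entry per user: the spelling that user says the
    song should be corrected to (hypothetical data layout).
    """
    if not suggestions:
        return None
    spelling, _count = Counter(suggestions).most_common(1)[0]
    return spelling


# 100 users keep the original spelling, 1 rogue user points it elsewhere.
suggestions = ["The Beatles - Here Comes The Sun"] * 100
suggestions.append("Guns N' Roses - Paradise City")
winner = effective_correction(suggestions)
```

With this rule the lone Paradise City suggestion is still stored, but it never takes part in camp matching unless it becomes the majority.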


Going backwards from the post on 11/19/16 and addressing the initial concerns with the new information.

What's true here is that Here Comes The Sun would receive a db entry in its "edited to" column pointing to Paradise City, just like that song would get an entry from the first song in its "edited from" column. But one time isn't a pattern. Maybe three times isn't even a pattern here. I'm not sure yet what will define a pattern; the best way might be to just wait and see.

And even if the system did somehow think it was a pattern, it wouldn't cause any huge wave of changes if it only triggered a suggestion to make an edit.

austin-1 commented 8 years ago

Amendment B stub here to address the issue of rogue users attempting to set the stage for malicious autocorrection by giving suggestions for unique songs which are going to be popular in the near future.

The threshold for global autocorrection should not be based on only one user. There could be a scrobbler error and a user edits the scrobble to what they were actually listening to, but that's not in any way a pattern of corrections for either of the track names involved.

As a precaution against this type of massive erroneous autocorrection, I favor offering a 1-click suggestion to autocorrect rather than just having the system do it. It can still ask the user if they want all past and future scrobbles by that name corrected to X-Y. That allows them to enable true autocorrect on a case-by-case basis and turn it off or edit the correction later if they change their mind.

austin-1 commented 7 years ago

Sorry I've had to step away from this for a little bit. For now I'm thinking the best way to handle this is to turn on scrobble editing and store the edits as part of the track info (like "edited to" and "corrected from"). Over time the data will show what determines a pattern, and what autocorrect eligibility means can be defined from there.

There will likely be many factors that determine a correction pattern, but my guess is the key part will be the relationships between songs that can be traced upstream. Once it's established that a song, or a somehow related song, has to be corrected x times before the system recognizes an autocorrect pattern, the system could either correct it automatically or just offer a suggestion to make a 1-click correction on the recent scrobbles page.
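
As a starting point, the "corrected x times" check could be sketched like this in Python. Everything here is hypothetical: the `EditLog` class, the choice to count distinct users rather than raw edits (so one user can't create a pattern alone), and the threshold itself, which is exactly the value that still has to come out of real data.

```python
from collections import defaultdict


class EditLog:
    """Hypothetical store of scrobble edits ("edited to" relationships)."""

    def __init__(self, threshold):
        self.threshold = threshold       # the undecided "x" from the discussion
        self.editors = defaultdict(set)  # (original, corrected) -> set of users

    def record(self, user, original, corrected):
        """Store that `user` edited a scrobble of `original` to `corrected`."""
        self.editors[(original, corrected)].add(user)

    def suggestion_for(self, original):
        """Return the correction to offer for `original`, or None.

        A correction qualifies once at least `threshold` distinct users have
        made it; if several qualify, the one with the most users is returned.
        """
        best, best_count = None, 0
        for (source, corrected), users in self.editors.items():
            if source != original or len(users) < self.threshold:
                continue
            if len(users) > best_count:
                best, best_count = corrected, len(users)
        return best
```

The returned value is only meant to drive a 1-click suggestion on the recent scrobbles page; whether it should ever be applied automatically is a separate decision.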

Once the autocorrect system has reached a point of maturity that it has a trusted accuracy and a lot of data to work with, then we can start thinking about how to visualize and display those related songs (same song but various spellings) on track pages in a way that can signify which one is the leader of the pack.