AudiusProject / audius-protocol

The Audius Protocol - Freedom to share, monetize, and listen to any audio content.
https://docs.audius.org
Other
553 stars 106 forks source link

xf filter in get_karma prone to exploit / affecting trending results #2158

Closed 2PacIsAlive closed 2 years ago

2PacIsAlive commented 2 years ago

Affected Service:

Discovery Provider

Issue Description

The xf filter in get_karma appears to act as tool to combat trending score manipulation. However, it is currently implemented in a manner where it fails to accomplish this.

The goal appears to be to filter out users without pfp/cover imgs & bios (high likelihood spam accounts). However, it is looking at the wrong attributes in the Users table to accomplish this.

When the discovery nodes index the chains, the content multihashes are written to cover_photo_sizes and profile_picture_sizes, and their cover_photo and profile_picture counterparts are set to null.

Source: discovery-provider/src/tasks/users.py#L374 https://github.com/AudiusProject/audius-protocol/blob/3482e9f46a53abb71eef64484e25827a31dac268/discovery-provider/src/tasks/users.py#L374

# Any write to profile_picture field is replaced by profile_picture_sizes
if user_record.profile_picture:
    logger.info(
        f"index.py | users.py | Processing user profile_picture {user_record.profile_picture}"
    )
    user_record.profile_picture_sizes = user_record.profile_picture
    user_record.profile_picture = None

# All incoming cover photos intended to be a directory
# Any write to cover_photo field is replaced by cover_photo_sizes
if user_record.cover_photo:
    logger.info(
        f"index.py | users.py | Processing user cover photo {user_record.cover_photo}"
    )
    user_record.cover_photo_sizes = user_record.cover_photo
    user_record.cover_photo = None
return user_record

As a result, when the xf filter is True (as it is in all three trending categories), the end result of the query will yield a null result. Following, the final results of the get_karma function will be Null and will not be applied to the rest of the scoring algorithm.

if xf:
    saves_and_reposts = (
        session.query(
            saves_and_reposts.c.user_id.label("user_id"),
            saves_and_reposts.c.item_id.label("item_id"),
        )
        .select_from(saves_and_reposts)
        .join(User, saves_and_reposts.c.user_id == User.user_id)
        .filter(
            User.cover_photo != None,  # <---- ALWAYS None
            User.profile_picture != None,  # <-- ALWAYS None
            User.bio != None,
        )
    ).subquery()

Consequences:

  1. Trending results are prone to quickly made sock puppets accounts inflating stats to achieve higher ranks.
  2. Underground trending results filled with aged tracks from the main trending. Their decayed scores still end up ousting underground trending tracks because there's no karma multiplier to bolster underground track stats.

Remediation

An easy fix can be implemented by changing the xf query's filter logic to look at the correct table attributes

# discovery-provider/src/queries/query_helpers.py   
...
#L826   User.cover_photo_sizes != None,
#L827   User.profile_picture_sizes != None,

Additionally, the alembic migration creating the materialized view trending_params can be updated in the same manner as well: https://github.com/AudiusProject/audius-protocol/blob/3482e9f46a53abb71eef64484e25827a31dac268/discovery-provider/alembic/versions/92571f94989a_add_trending_views.py#L242

raymondjacobson commented 2 years ago

Hi @2PacIsAlive, you're absolutely correct. Thank you for bringing this to our attention. The profile_picture and cover_photo field are legacy fields attached to few users. We will track this issue and let you know when it has been resolved.

raymondjacobson commented 2 years ago

This was fixed in #2259 !