LPGameDevs / EditarrrPublic

Public repo for level editor tools in Unity
MIT License

Improve validation and deduplication in backend lambda. #98

Closed yanniboi closed 10 months ago

yanniboi commented 1 year ago

We want to add some checks to prevent people spamming and overpopulating the databases with unnecessary data:

Levels

Scores

Ratings

HaywardMorihara commented 1 year ago

Units of Work:

Prioritized

Other Ideas:

Split out into separate issues:

Implementation Details

Level Pagination

When querying "all levels" we want to paginate the requests (maybe a max of 10 at a time?)

I think we already have this actually: https://github.com/LPGameDevs/EditarrrPublic/blob/develop/backend/lambda/function.mjs#L126

Clarified: We have a limit but NOT pagination (the ability to say, "Give me the second|third|fourth|etc batch of levels")
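A rough sketch of what the missing pagination piece could look like, assuming we hand the client an opaque cursor. DynamoDB returns a LastEvaluatedKey when a query stops at Limit and accepts it back as ExclusiveStartKey to resume from that point. The helper names, the "levels" table name, the example key shape, and the default page size of 10 are all hypothetical and not taken from function.mjs.

```javascript
// Hypothetical helpers; names and defaults are illustrative assumptions.
const DEFAULT_PAGE_SIZE = 10;

// Turn DynamoDB's LastEvaluatedKey into an opaque string for the client.
function encodeCursor(lastEvaluatedKey) {
  if (!lastEvaluatedKey) return null; // no key means there are no more pages
  return Buffer.from(JSON.stringify(lastEvaluatedKey)).toString("base64");
}

// Turn the client's cursor back into an ExclusiveStartKey.
function decodeCursor(cursor) {
  if (!cursor) return undefined;
  return JSON.parse(Buffer.from(cursor, "base64").toString("utf8"));
}

// Add paging fields to an existing set of Query parameters.
function withPagination(baseParams, { cursor, limit = DEFAULT_PAGE_SIZE } = {}) {
  const params = { ...baseParams, Limit: limit };
  const startKey = decodeCursor(cursor);
  if (startKey) params.ExclusiveStartKey = startKey;
  return params;
}

// Example: this key stands in for a LastEvaluatedKey from a previous page.
const cursor = encodeCursor({ pk: "LEVEL#42", sk: "LEVEL#42" });
const nextPageParams = withPagination({ TableName: "levels" }, { cursor });
console.log(nextPageParams);
```

The game would send the cursor from the previous response to get the next batch, and a null cursor in the response would mean there are no more levels.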

Levels by User Filter Option

So we can support "My levels" and other cases.

It would have to be based on whatever "username" the current user has? Or some ID. Maybe support both query options.

Levels by Name Filter Option

Level Recently added sorting options

To be clear: we already sort by "last updated". So this is to sort by "last created"? And we can easily support ascending vs. descending order.

Level Recently added filter option

Again, just clarifying: this is to limit the search to "levels created before/after X"? And/or "levels updated before/after X?" All of the above should be easily doable.
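To make the sort and filter options concrete, here is a minimal in-memory sketch. The field names (createdAt and updatedAt as epoch milliseconds) and the option names are assumptions; in the Lambda this would more likely be pushed into the DynamoDB query itself (a key condition plus ScanIndexForward) rather than done in memory.

```javascript
// Sketch only: field names (createdAt, updatedAt as epoch ms) are assumptions.
function filterAndSortLevels(levels, options = {}) {
  const {
    sortBy = "updatedAt",     // or "createdAt"
    order = "desc",           // "asc" or "desc"
    createdAfter = -Infinity,
    createdBefore = Infinity,
  } = options;

  const direction = order === "asc" ? 1 : -1;
  return levels
    // filter() returns a new array, so the caller's array is not reordered
    .filter((l) => l.createdAt > createdAfter && l.createdAt < createdBefore)
    .sort((a, b) => direction * (a[sortBy] - b[sortBy]));
}

const levels = [
  { name: "A", createdAt: 100, updatedAt: 500 },
  { name: "B", createdAt: 300, updatedAt: 400 },
  { name: "C", createdAt: 200, updatedAt: 600 },
];
const newest = filterAndSortLevels(levels, { sortBy: "createdAt", createdAfter: 150 });
console.log(newest.map((l) => l.name)); // [ 'B', 'C' ]
```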

Level Rating/Score Sort/Filter options

This would mean that we would need to average all the ratings for each level, so we could then sort by it, right? I could see this being a bit tricky. Off the top of my head, we could:

  1. Calculate on WRITE: Calculate the rating average for a level and update it on the Level data itself whenever a rating is submitted.
  2. Calculate on READ: Calculate the averages for all the levels at read time, then sort by it. This seems very expensive and not really any simpler than "On Write". The advantage is a miscalculation of the average won't persist and would be an easy bug fix (fixing it for "on write" would involve overwriting the miscalculated value).
  3. Calculate async: Have some process in the background that calculates the average.

Approach (1) Calculate on WRITE seems like the simplest to me.
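A minimal sketch of the "on write" calculation, assuming we store a running sum and count on the level alongside the average. The field names ratingSum, ratingCount, and ratingAverage are hypothetical. Keeping the sum and count (rather than only the average) also lets a changed rating be swapped out exactly.

```javascript
// Sketch: ratingSum / ratingCount / ratingAverage are assumed field names.
// previousRating is the user's earlier rating for this level, or null if none.
function applyRating(level, newRating, previousRating = null) {
  const sum = (level.ratingSum || 0) + newRating - (previousRating ?? 0);
  const count = (level.ratingCount || 0) + (previousRating === null ? 1 : 0);
  return {
    ...level,
    ratingSum: sum,
    ratingCount: count,
    ratingAverage: count > 0 ? sum / count : null,
  };
}

let level = { id: "level-1" };
level = applyRating(level, 4);    // first user's rating
level = applyRating(level, 2);    // second user's rating
level = applyRating(level, 5, 2); // second user changes 2 -> 5
console.log(level.ratingAverage); // 4.5
```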

Query for Top X Scores

When querying scores we want to return the 5 highest (smallest time) scores

When querying scores

Querying scores for a single level, correct? Or, for a player?

Assuming it's for a level, this seems pretty simple with the scoreLevelName-score-index GSI.
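As a sketch, the query parameters could look something like the following. Only the index name comes from the existing setup; the table name and the partition key attribute (scoreLevelName, guessed from the index name) are assumptions, and the values are written DocumentClient-style (plain JavaScript values).

```javascript
// Sketch: builds parameters for a DynamoDB Query on the GSI named above.
// Table name and key attribute name are assumptions inferred from the index name.
function buildTopScoresQuery(levelName, limit = 5) {
  return {
    TableName: "scores", // hypothetical table name
    IndexName: "scoreLevelName-score-index",
    KeyConditionExpression: "scoreLevelName = :level",
    ExpressionAttributeValues: { ":level": levelName },
    ScanIndexForward: true, // ascending sort key, so the smallest time comes first
    Limit: limit,
  };
}

const topScoresParams = buildTopScoresQuery("my-level");
console.log(topScoresParams);
```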

Scores: De-Dupe Per User

Return one score per user (best score only) to prevent a user taking up the whole leaderboard with their 5 best scores.

Update: We should keep track of the best score per user per level, so it doesn't have to be calculated dynamically

This is maybe a bit tricky. Two (maybe 3) approaches I can think of off the top of my head:

  1. DynamoDB might have some query options that would let a single query fulfill the reqs (scores with unique user IDs, sorted by highest), but I'm skeptical that this exists
  2. Assuming we want a page size limit of 10, we query for the top 10 scores, de-dupe, then query for more if de-duping resulted in <10 scores. Alternatively, query an excess # (100 or something), then de-dupe and limit the results, querying for more if necessary. A bit of a brute-force approach and maybe a bit slow if a single user has a lot of the high scores.
  3. We could add a sk: USER#<userId> to the level (or the scores) table and accumulate score (and even rating) data about the user per level in that object whenever there's a score/rating update. This has discrepancy/maintenance risks of de-normalized data.

I think I would start with approach (2) (assuming (1) doesn't exist). If we find that isn't performant enough (I'm assuming it'll be fine given our small scale of data), we could look into (3).
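A sketch of how approach (2) could work. To keep it self-contained, the pages are passed in as an iterable of arrays; in the Lambda each page would come from a successive query instead. The field names userId and score are assumptions.

```javascript
// Sketch of approach (2). `pages` is any iterable of score arrays, each already
// sorted best-first (lowest time first). Field names are assumptions.
function collectBestPerUser(pages, limit = 10) {
  const seenUsers = new Set();
  const results = [];
  for (const page of pages) {
    for (const entry of page) {
      if (seenUsers.has(entry.userId)) continue; // a better score for this user is already kept
      seenUsers.add(entry.userId);
      results.push(entry);
      if (results.length === limit) return results; // stop before reading more pages
    }
  }
  return results; // fewer than `limit` distinct users exist
}

const pages = [
  [{ userId: "yan", score: 10 }, { userId: "yan", score: 11 }, { userId: "ann", score: 12 }],
  [{ userId: "ann", score: 13 }, { userId: "bob", score: 14 }],
];
console.log(collectBestPerUser(pages, 3).map((s) => s.userId)); // [ 'yan', 'ann', 'bob' ]
```

Because the loop returns as soon as the limit is reached, passing a generator for the pages would mean later pages are only requested when de-duping actually leaves the result short.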

Thought:

Between this and #10, this makes me consider whether we might actually want to restructure the DDB tables so that instead of two pk: LEVEL#<levelId> sk: RATING#<ratingId> / sk: SCORE#<scoreId> tables, we have a single table with pk: LEVEL#<levelId> and sk: USER#<userId>, then store ratings and scores on the "user" objects per user per level (this makes de-duping of ratings/scores per level per user easier to accumulate).

However, I don't feel confident enough that this will be beneficial over time to put in the effort to make this change at the moment. But I could pursue this if you think we should.

Ratings: Ensure One Per User Per Level

There should only be one rating per level, per user. If someone leaves an additional rating to a level that was already rated we want to update the previous rating rather than create another.

This should be pretty easy to do with an "upsert" operation: check whether the user already has a rating for the level, create one if not, and update the existing one otherwise.
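A minimal sketch of the upsert, using an in-memory Map as a stand-in for the ratings table. Keying the rating by level and user (instead of a generated rating ID) is an assumption, as are the function and field names. With that kind of key, a DynamoDB put would overwrite the existing item, which gives upsert behavior more or less for free.

```javascript
// Sketch: an in-memory Map stands in for the ratings table.
// Keying by level and user rather than by a rating ID is an assumption.
function ratingKey(levelId, userId) {
  return `LEVEL#${levelId}|USER#${userId}`;
}

function upsertRating(store, levelId, userId, rating) {
  const key = ratingKey(levelId, userId);
  const previous = store.has(key) ? store.get(key).rating : null;
  store.set(key, { levelId, userId, rating, updatedAt: Date.now() });
  return { created: previous === null, previous };
}

const store = new Map();
const first = upsertRating(store, "level-1", "yan", 3);
const second = upsertRating(store, "level-1", "yan", 5);
console.log(first.created, second.created, store.size); // true false 1
```

Returning the previous rating is handy if the level also keeps a cached average, since the old value has to be subtracted before the new one is added.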

HaywardMorihara commented 1 year ago

Notes from our discussion phone call:

Decisions

Have the query/filter/sort logic in the server (the alternative is to have the game fetch all the levels upfront and do the query logic itself). Reasons: (a) I can help if it's in the backend (the logic has to live somewhere), and (b) it scales better (not expected to be a priority, but a small benefit).

For (9), keep track of the best score on writes
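A sketch of what "keep track of the best score on writes" could look like, with a Map standing in for wherever the best score ends up being stored. The function name and key format are hypothetical, and "best" is taken to mean the lowest time.

```javascript
// Sketch: keeps one best (lowest) time per user per level.
// `bestScores` is a Map standing in for the real storage.
function recordScore(bestScores, levelId, userId, time) {
  const key = `${levelId}|${userId}`;
  const current = bestScores.get(key);
  if (current !== undefined && current <= time) {
    return { improved: false, best: current }; // existing score is at least as good
  }
  bestScores.set(key, time);
  return { improved: true, best: time };
}

const bestScores = new Map();
const r1 = recordScore(bestScores, "level-1", "yan", 12.5);
const r2 = recordScore(bestScores, "level-1", "yan", 15);
const r3 = recordScore(bestScores, "level-1", "yan", 11);
console.log(r1.improved, r2.improved, r3.best); // true false 11
```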

Clarification

(1) we have a limit, but no pagination logic

Priorities

We aren't certain exactly which filter/sort options would be ideal, but the goal is to create the best player experience and facilitate level discovery. We don't want the player to be overwhelmed with all the levels, but we want to help them find the "right ones" to play/try. We'll get more player feedback soon (early Nov release), but we're getting ahead of the curve now, so I'll use my best judgment about priorities.

Along these lines, we should prioritize changes that affect the data storage (those will be harder to introduce later on when the game is live). This means work such as calculating a cached average score for a level should be prioritized

Scores for leaderboards are probably the highest priority (most impactful upfront)

Future

For now, the strategy is: AWS is the data store (maybe Steam someday)

We're going to want to give awards on various measurements, but not sure what those are yet. Ideas include highest rated level, most played, player who played the most, most "competitive" level, etc. To that end, we should keep any data we think is interesting or relevant, so we can give awards on these various vectors

We'll optimize in the future (costs, performance, etc)

We are NOT storing user data right now, because (a) the overall ethos is to make the game accessible and not require accounts/login, (b) we aren't certain what we want to do with "users" in the future, and (c) we can always create a virtual user by querying all the data pertaining to any userId/username and aggregating it
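To illustrate point (c), a minimal sketch of building a "virtual user" by aggregating existing records. The record shapes and field names (creator, userId, levelId) and the summary fields are assumptions for illustration only.

```javascript
// Sketch: record shapes and field names are assumptions.
function buildVirtualUser(userId, { levels = [], scores = [], ratings = [] }) {
  const created = levels.filter((l) => l.creator === userId);
  const ownScores = scores.filter((s) => s.userId === userId);
  const ownRatings = ratings.filter((r) => r.userId === userId);
  return {
    userId,
    levelsCreated: created.length,
    levelsPlayed: new Set(ownScores.map((s) => s.levelId)).size, // distinct levels with a score
    scoresSubmitted: ownScores.length,
    ratingsGiven: ownRatings.length,
  };
}

const virtualUser = buildVirtualUser("yan", {
  levels: [{ id: "l1", creator: "yan" }, { id: "l2", creator: "ann" }],
  scores: [
    { levelId: "l2", userId: "yan", score: 10 },
    { levelId: "l2", userId: "yan", score: 9 },
  ],
  ratings: [{ levelId: "l2", userId: "yan", rating: 4 }],
});
console.log(virtualUser);
```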

Follow-Up Thought

One thought we only touched on briefly, but that I think is worth adding to the list, is the ability to "search" for a level by name, creator, or something like that. I could see there being a use case for "hey, I know yan made a level and I want to play it, give me all the levels created by player 'yan'". I've added this as an item to the Units of Work list (2 and 3)

CalmPewter commented 1 year ago

After going through all this, I only have a few additional suggestions for filtering options:

  • Remove all levels a user has previously downloaded
  • Remove all levels currently downloaded (present on local machine)
  • Remove all levels a user has beaten (could be tied to score submission)

Keep up the great work, Backenders! 🥇👍

HaywardMorihara commented 1 year ago

@CalmPewter Thank you for the feedback!

  • Remove all levels a user has previously downloaded
  • Remove all levels currently downloaded (present on local machine)

Implementation-wise, I think we would most likely want to do this filtering client-side (in the game itself), because the client knows what is downloaded (I don't think we should have the backend DB keeping track of that). So the Editarrr game fetches all the levels, then filters out the ones already downloaded (and can make subsequent requests for more levels if necessary).

I'm not going to add this to the list I have here since I'm keeping this issue backend focused, but maybe we can/should create a separate issue for adding this filtering in the game?

  • Remove all levels a user has beaten (could be tied to score submission)

Good idea, I'll add this to the list!

HaywardMorihara commented 10 months ago

I'm closing this Issue as "Done" - split out some of the remaining tasks into separate Issues: