JosephDavidTalbot / Wordle-Clone

A clone of Wordle, with React/Typescript as the front-end and Python as the back-end.

Design database schema #15

Open tgittos opened 2 years ago

tgittos commented 2 years ago

We know you want to track high scores in a database.

Use this ticket to design your model for the high scores, including all the columns you'll want and their types.

We'll then implement this in SqlAlchemy and create an API endpoint to record scores.

JosephDavidTalbot commented 2 years ago

Table 'Users':

(necessary)
user_id: integer, primary key
user_name: string

(optional)
user_email: string (must be a valid email address, must be unique)
user_password: string (will necessitate learning what password hashing is)

Table 'Scores':

word: string
user_id: integer
won: boolean
move_count: integer

(optional)
move_history: string

So, getting a leaderboard for a given word would look something like...

```sql
SELECT Users.user_name, Scores.won, Scores.move_count
FROM Scores
INNER JOIN Users ON Scores.user_id = Users.user_id
WHERE Scores.word = :word
ORDER BY Scores.move_count ASC
```
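A quick way to sanity-check the proposed schema and the leaderboard query is Python's built-in `sqlite3` module. This is a sketch only: the real implementation will be SQLAlchemy models, the column types follow the proposal above, and the sample rows (users "alice" and "bob", word "crane") are made up for illustration.

```python
import sqlite3

# In-memory database just to exercise the proposed schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (
    user_id       INTEGER PRIMARY KEY,
    user_name     TEXT NOT NULL,
    user_email    TEXT UNIQUE,   -- optional; format validated in app code
    user_password TEXT           -- optional; store a hash, never plaintext
);
CREATE TABLE scores (
    word         TEXT NOT NULL,
    user_id      INTEGER NOT NULL REFERENCES users(user_id),
    won          INTEGER NOT NULL,   -- SQLite has no BOOLEAN type; use 0/1
    move_count   INTEGER NOT NULL,
    move_history TEXT                -- optional
);
""")

# Made-up sample data.
conn.execute("INSERT INTO users VALUES (1, 'alice', NULL, NULL)")
conn.execute("INSERT INTO users VALUES (2, 'bob', NULL, NULL)")
conn.execute("INSERT INTO scores VALUES ('crane', 1, 1, 3, NULL)")
conn.execute("INSERT INTO scores VALUES ('crane', 2, 1, 5, NULL)")

# Leaderboard for one word, fewest guesses first.
leaderboard = conn.execute("""
    SELECT users.user_name, scores.won, scores.move_count
    FROM scores
    INNER JOIN users ON scores.user_id = users.user_id
    WHERE scores.word = ?
    ORDER BY scores.move_count ASC
""", ("crane",)).fetchall()

print(leaderboard)  # [('alice', 1, 3), ('bob', 1, 5)]
```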

tgittos commented 2 years ago

If you want to do auth (which is fine), that's going to be fairly involved. It's more than just hashing a password; there's a good deal more you have to consider.

There are solutions out there that can handle this for us, but I'm not sure if many of them offer a free tier.

Ones I'm familiar with are Auth0 and AWS Cognito.

If you'd like, you could research a third-party auth service and integrate against one, or we can learn to hand-roll it (which is useful to learn in and of itself, even if actually deploying a hand-rolled auth system is an anti-pattern).
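For reference, the "hashing a password" piece by itself is doable with the standard library. This is a minimal sketch of salted PBKDF2 hashing, not a full auth system (no sessions, resets, rate limiting, etc.), and the function names and iteration count here are illustrative choices, not anything decided in this thread:

```python
import hashlib
import hmac
import os
from typing import Optional

def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Salted PBKDF2-HMAC-SHA256 hash. Returns (salt, digest); store both."""
    salt = salt or os.urandom(16)  # fresh random salt per password
    # Iteration count kept low for the demo; production should use a
    # much higher, currently-recommended count (or a vetted library).
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    # Constant-time comparison to avoid leaking timing information.
    return hmac.compare_digest(hash_password(password, salt)[1], digest)

salt, digest = hash_password("correct horse battery staple")
```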

Let me know which approach you'd like to take, and I'll spin out tickets.

tgittos commented 2 years ago

Also, if you want to track the scores of words, you might want to consider making the word an entity itself, and foreign keying score to it:

Word
---
id: int
word: string

Score
---
id: integer
word_id: integer
user_id: integer
won: boolean
move_count: integer
move_history: string

Otherwise you'll have to table scan your entire Score table to find all the rows where the word matches the word you're querying for.

If you do it the way I suggested above, you can give word.word a uniqueness constraint, then do a join to the scores, and your query will be much faster.
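A sketch of that normalized layout, again with stdlib `sqlite3` so it can be run directly (the real implementation will be SQLAlchemy; the sample row is made up). The `UNIQUE` constraint on `words.word` is what creates the index that makes the lookup fast:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (
    id   INTEGER PRIMARY KEY,
    word TEXT NOT NULL UNIQUE      -- uniqueness constraint => index on word
);
CREATE TABLE scores (
    id           INTEGER PRIMARY KEY,
    word_id      INTEGER NOT NULL REFERENCES words(id),
    user_id      INTEGER NOT NULL,
    won          INTEGER NOT NULL,
    move_count   INTEGER NOT NULL,
    move_history TEXT
);
""")

conn.execute("INSERT INTO words (word) VALUES ('crane')")
conn.execute(
    "INSERT INTO scores (word_id, user_id, won, move_count) VALUES (1, 1, 1, 4)"
)

# The WHERE clause hits the indexed words.word column, then joins to scores,
# instead of scanning every row of scores for a matching word string.
rows = conn.execute("""
    SELECT scores.user_id, scores.move_count
    FROM words
    JOIN scores ON scores.word_id = words.id
    WHERE words.word = ?
""", ("crane",)).fetchall()

print(rows)  # [(1, 4)]
```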

JosephDavidTalbot commented 2 years ago

Rolling my own auth solution feels like the sort of thing a skilled web developer should be capable of, and I would like to be able to present myself as one of those. On the other hand, clearly it's a genuinely hard thing to do, and I'm not sure if I actually want to do that.

As for the SQL optimization, that sounds like a good idea. I wasn't aware of joins being faster than searches, but upon reflection, I suppose that tracks.

tgittos commented 2 years ago

Well.

I say we hand roll the auth. It's not that hard, just fiddly. I have a lot of experience doing this.

As for the db optimization, to be clear, it's the index and uniqueness constraint on the word that speeds things up, not the join itself.

Have you heard of a hash map vs a list?

A hash map is a data structure where each item is stored under, and addressable by, a unique key. In a classical hash map, key management is internal: the key is usually a function of the value of the data itself. Compare this to a list, which is just a collection of data you can only index by numerical position in the collection.

You can consider a database table with an index to be a big hash map, where the index value is the key and the rest of the columns are the data. The only difference is that, as a consumer, you directly control which data is used as the key.

A table without an index is analogous to a list.

To bring this home: if we have a table with the word as a key, then fetching the rest of the data for that word is as simple for the db as hashing the word value we give it and looking it up in its map. If it finds it, it returns the data; otherwise it returns whatever it's designed to return (an empty set).

If we have a table with no index, then the database has to look at each item in the table until it finds one where word is the word you passed in. Could be really fast, could be really slow if the word is right at the end. This is known as a table scan. We want to avoid table scans where possible, without making everything a key - if everything's a key, then nothing's a key and you trade speed for memory issues.

The index approach gives you constant-time row fetches, as long as the index is in the database's working set (which is outside the scope of this discussion).
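The hash map vs. list distinction above, in plain Python (illustrative only; a real database index is typically a B-tree rather than a literal hash map, but it behaves similarly for point lookups):

```python
# A list is indexed only by position: finding a word means scanning
# elements one by one until we hit a match -- the "table scan".
scores_list = [("crane", 4), ("slate", 3), ("audio", 6)]

def find_in_list(word):
    for w, moves in scores_list:   # O(n): may touch every row
        if w == word:
            return moves
    return None                    # the "empty set" analogue

# A dict hashes the key and jumps straight to the matching bucket --
# the "indexed lookup": roughly constant time regardless of size.
scores_map = {"crane": 4, "slate": 3, "audio": 6}

def find_in_map(word):
    return scores_map.get(word)    # O(1) on average
```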

Does that make sense?