Cobresun / movie-club

🍿 Movie Club Website!
https://cobresun-movie-club.netlify.app
0 stars 1 forks source link

RFC: Migrate away from FaunaDB #152

Open brian-norman opened 8 months ago

brian-norman commented 8 months ago

Preface

We hate Fauna! The FQL language is confusing and not one we can get a lot of help from the wider community with (and therefore not ChatGPT friendly). We never even used it's big selling feature, the GraphQL support. Let's move away from it...

We can choose to go with another document based DB like MongoDB, or we can go with a relational DB like PostgresSQL. I'm personally leaning towards Postgres because I think it would have the most external support long term. There is a cool serverless platform called Neon that looks to have a good free tier, and a much more reasonable paid tier compared to Fauana. Both Neon and Postgres are open source, which I like :)

A Second Chance

Around two years ago, I made a bunch of commits straight to main, to merge the watch list, backlog, and reviews documents into a single club document. This was to support multiple clubs. That was a choice. @cole-adams claims that he'd never have let that merge if it was a PR, and he's right, I wish he stopped me in my tracks! In my defense, two years later, we have 4 actual clubs (+ 1 test club) using the site, and the schema has yet to fail us 😛

If we move away from Fauna, we can use it as a second chance, to set ourselves up a little better for future work, and create less of a maintenance nightmare.

Looking Ahead

Here are a few things we'd want to have, if we started from scratch:

  1. We need a schema that allows us to scale up to many clubs without it being crazy expensive, which means sending less data across the wire every time we do fetches
  2. Along those lines, we probably want to be able to paginate the lists (reviews, watchlist, backlog) because the Cobresun club has gotten unwieldy and that is making things expensive
  3. Ideally we adopt a solution with backups support
  4. It would be nice if we had a dev database that allowed us to make changes without messing with production data
    • Also some system for doing actual migrations to add fields and such, again keeping in mind that we are no longer the only users
  5. We want to support media that isn't movies at some point (games, books, etc.)
  6. We want to make it easy for people to create new clubs through the UI, and add/remove members from their clubs
    • We should probably give the member who creates a club admin privileges for that club, so they can add new members, delete members, and rename the club, or even delete the club

A New Schema

Here's one proposal, which I want to run by the gang this time, so we can actually come up with something together that sucks way less than our current schema. It's a Postgres SQL schema, which is a significant departure from our current model, but I think that it should theoretically set us up to do all the future improvements we desire.

CREATE TABLE members (
    member_id SERIAL PRIMARY KEY,
    username VARCHAR(50) UNIQUE NOT NULL,
    email VARCHAR(100) UNIQUE NOT NULL,
    -- Add other member details as needed
);

CREATE TABLE clubs (
    club_id SERIAL PRIMARY KEY,
    club_name VARCHAR(100) NOT NULL,
    -- Add other club details as needed
);

CREATE TABLE club_members (
    club_member_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    member_id INT REFERENCES members(member_id),
    is_admin BOOLEAN DEFAULT FALSE, -- Indicates whether the member is an admin
    -- Add other club member details as needed
);

CREATE TABLE works (
    work_id SERIAL PRIMARY KEY,
    title VARCHAR(255) NOT NULL,
    type VARCHAR(50) NOT NULL, -- 'movie', 'book', 'game', etc.
    release_year INT,
    tmdb_id VARCHAR(50), -- External ID for movies
    -- Add other work details as needed
);

CREATE TABLE club_reviews (
    review_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    work_id INT REFERENCES works(work_id),
    review_date DATE,
    -- Add other review details as needed
);

CREATE TABLE review_scores (
    score_id SERIAL PRIMARY KEY,
    review_id INT REFERENCES club_reviews(review_id),
    member_id INT REFERENCES members(member_id),
    score INT,
    -- Add other score details as needed
);

CREATE TABLE watch_list (
    watch_list_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    work_id INT REFERENCES works(work_id),
    -- Add other watch list details as needed
);

CREATE TABLE backlog (
    backlog_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    work_id INT REFERENCES works(work_id),
    -- Add other backlog details as needed
);

CREATE TABLE awards (
    award_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    year INT,
    -- Add other award details as needed
);

CREATE TABLE award_categories (
    category_id SERIAL PRIMARY KEY,
    award_id INT REFERENCES awards(award_id),
    category_name VARCHAR(100),
    -- Add other category details as needed
);

CREATE TABLE award_nominees (
    nominee_id SERIAL PRIMARY KEY,
    category_id INT REFERENCES award_categories(category_id),
    work_id INT REFERENCES works(work_id),
    -- Add other nominee details as needed
);

CREATE TABLE award_votes (
    vote_id SERIAL PRIMARY KEY,
    nominee_id INT REFERENCES award_nominees(nominee_id),
    member_id INT REFERENCES members(member_id),
    -- Add other vote details as needed
);

Next Steps

I am not married to this solution, I wanna know what ya'll think, and if you agree that it's even worth doing. If so, we can come up with a project to do this migration, which I'm pretty sure will be no easy feat...

We'd want to minimize disruption to the other users, and not lose any data in the process. Ideally in this process we also further decouple our frontend and Netlify functions from our choice of database, so future migrations are less scary!

cole-adams commented 8 months ago

Thanks for the investigation @brian-norman! After taking a look at Neon, I think it's a great solution to our problem. We do have a complex enough structure now that a relational database would be useful. My only concern with Neon is the speed of the free tier. Based on my reading, it seems that the free tier is based on a shared compute, so we would be affected by high load on Neon servers themselves. I'm wondering if it would be worth running a quick prototype just to get a sense of the baseline free tier speed.

However, if we do plan on monetizing this product in the intermediate term, the free tier limits might not be a concern as we would have our own (semi) dedicated compute with the pro tier.

If we do go with the relational database (which I am also leaning towards now after learning about Neon), I have some suggestions for the proposed schema. Please let me know your thoughts:

1. Lists

As it sits right now, we have 4 different lists of works: Reviews, Watchlist, Backlog, and Nominations. I'm wondering if instead of having different schemas for each, we consolidate those into one schema with a type property? Something like:

CREATE TABLE lists (
    list_id SERIAL PRIMARY KEY,
    club_id INT REFERENCES clubs(club_id),
    list_type VARCHAR(50), -- Types like 'watch', 'backlog', 'reviews', etc.
    -- Add other list details as needed
);

That include items:

CREATE TABLE list_items (
    item_id SERIAL PRIMARY KEY,
    list_id INT REFERENCES lists(list_id),
    work_id INT REFERENCES works(work_id),
    created DATE
    -- Add other item details as needed
);

I think this would give us more flexibility if we ever needed to add different lists like seasons or favourites etc.

With this model, the review_scores table would likely have to stay, instead referencing a list item. There are ways of dynamic properties, but I think that just complicates matters.

2. Club Member Admins

As we start adding features for users to manage their own clubs, I see a situation where we might need to give the users more granularity in the permissions they're giving members. Because of this, I think it would make sense instead of having a is_admin property in the club_members table, have a role property, which can take on various values like Admin or Read Only etc.

For the MVP, I think admin users are sufficient, I just want to make sure the schema is flexible in the future.

3. Works properties

Just to make the schema more dynamic, I'm wondering if we should call the tmdb_id column external_id instead, in case we ever look at supporting other APIs for other types of works. As well, I'm just curious about the reasoning behind including release_year in the works table?

4. Awards

In a relational database paradigm, I'm not sure that the awards table is necessary. In the document database, the awardYear object was necessary as sort of a container, but I think in a relational database, we can ditch the awards table and instead include club_id and year on the awards_categories table. Then when we're querying for the categories in a given year, we can just index based on year

Let me know your thoughts on these suggestions.


Migration Strategy

Because all our data is based around the club, I'm wondering if it's possible to do an incremental migration starting with the club, then migrating each data type after that. Since there's not much for relationships between types below the club level, I don't think it would cause huge headaches, while lowering the risk of bugs as we'd be releasing smaller pieces of the migration at a time. I haven't thought it all the way through, however, so there could be a gotcha preventing us from moving incrementally.

cole-adams commented 7 months ago

On second thought, we will likely need a container Awards object to keep track of the current active step of the awards ceremony, so disregard point 4 in my previous response.