SAAF-Alex-and-Ashton-Forces / kepler

We are all in the gutter, but some of us are looking at the stars.
MIT License

Make high-level architectural decisions #5

Open alex0112 opened 2 years ago

alex0112 commented 2 years ago

This issue started as a think-through of the interface for generic repositories within our app, but kind of grew from there. I apologize in advance for the moderate brain dump. I'm hoping to treat this as the basis for a discussion so that we can keep track of it in one place outside of random Matrix threads.

High Level Thoughts

We'll need abstractions representing

Some considerations to keep in mind:

Relationships

(just a sketch of some basic relationships relevant to the schema, this is a living list feel free to edit as time goes on)

A generic Repository interface

(once again, this is a living list and I expect additions and subtractions here) Just thinking of /Git(Hu|La)b/ to start off with:

Views

alex0112 commented 2 years ago

Looks like Tentacat or Github may be our best bet for a GitHub client library

The state of GitLab clients is not great. We may have to build the needed functionality from scratch.

ashton314 commented 2 years ago

Fun and short synonyms for "repository":

The idea of a "code bank" is kinda fun. "Depot" might be a good name that's not "repo" (sounds like it when you pronounce it.)

ashton314 commented 1 year ago

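I think we should move forward with "Depot", which holds the following concepts:

  • URL of the repository in question
  • limited metadata about it (e.g. where it's hosted, who owns it, where the README is located, etc.)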

Then, we should also have some other thing (idk, "depot snapshot"?) that is a representation of the repository's name, description, and contents of the README (maybe the Wiki too?) that we update periodically and then allow users to do a full-text search of. (Postgres has some nice built-in tools for this.)

Then we just have join tables relating users to the repositories they've starred.

By keeping the snapshots in a single table and de-duplicated, we avoid some risk of pummeling the GitHub API when doing searches, etc.
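To make the shape of this concrete, here is a minimal sketch of the three tables (depots, snapshots, and the user-to-depot join table). Everything in it is an assumption for illustration: the table and column names are hypothetical, SQLite stands in for Postgres so the example runs on its own, and a plain `LIKE` match stands in for Postgres full-text search.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema. SQLite is a stand-in for Postgres here.
SCHEMA = """
CREATE TABLE depots (
    id    INTEGER PRIMARY KEY,
    url   TEXT NOT NULL UNIQUE,  -- one row per repository, shared by all users
    host  TEXT NOT NULL,
    owner TEXT NOT NULL
);
CREATE TABLE depot_snapshots (
    depot_id    INTEGER PRIMARY KEY REFERENCES depots(id),
    name        TEXT NOT NULL,
    description TEXT,
    readme      TEXT,
    fetched_at  TEXT NOT NULL
);
CREATE TABLE user_stars (
    user_id  INTEGER NOT NULL,
    depot_id INTEGER NOT NULL REFERENCES depots(id),
    PRIMARY KEY (user_id, depot_id)
);
"""


def open_db():
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def star(conn, user_id, url, host, owner):
    """Record a star, reusing the existing depot row when the URL is already known."""
    conn.execute(
        "INSERT OR IGNORE INTO depots (url, host, owner) VALUES (?, ?, ?)",
        (url, host, owner),
    )
    depot_id = conn.execute(
        "SELECT id FROM depots WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO user_stars (user_id, depot_id) VALUES (?, ?)",
        (user_id, depot_id),
    )
    return depot_id


def save_snapshot(conn, depot_id, name, description, readme):
    """Store (or overwrite) the single snapshot kept for a depot."""
    conn.execute(
        "INSERT OR REPLACE INTO depot_snapshots "
        "(depot_id, name, description, readme, fetched_at) VALUES (?, ?, ?, ?, ?)",
        (depot_id, name, description, readme,
         datetime.now(timezone.utc).isoformat()),
    )


def search(conn, user_id, term):
    """Search a user's starred depots using local snapshots only (no API calls)."""
    pattern = f"%{term}%"
    rows = conn.execute(
        """SELECT d.url FROM user_stars s
           JOIN depots d ON d.id = s.depot_id
           JOIN depot_snapshots p ON p.depot_id = d.id
           WHERE s.user_id = ?
             AND (p.name LIKE ? OR p.description LIKE ? OR p.readme LIKE ?)
           ORDER BY d.url""",
        (user_id, pattern, pattern, pattern),
    )
    return [row[0] for row in rows]
```

Because `depots.url` is unique, two users starring the same repository share one depot row and one snapshot, which is what keeps searches off the third-party API.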

alex0112 commented 1 year ago

I think we should move forward with "Depot", which holds the following concepts:

  • URL of the repository in question
  • limited metadata about it (e.g. where it's hosted, who owns it, where the README is located, etc.)

I'm fine with this approach. That metadata is going to need to include a method by which that repository can be starred on the third party service. We've had a couple discussions about this but I think it would be good to have an approach written down before we start building this particular solution, since that's something we've gone back and forth on a bit.
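As a starting point for that written-down approach, here is a rough sketch of one option: derive the star request from the depot's host metadata. The `Depot` fields and the `star_request` helper are hypothetical names. The endpoints are the ones I understand the public GitHub and GitLab REST APIs to expose for starring (both need an authenticated user), and the GitLab branch assumes gitlab.com rather than a self-hosted instance, so treat this as something to verify rather than a settled design.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Depot:
    # Hypothetical field names for the "limited metadata" discussed above.
    host: str   # "github" or "gitlab"
    owner: str
    name: str


def star_request(depot):
    """Return the (HTTP method, URL) pair that would star the depot's repository."""
    if depot.host == "github":
        return ("PUT",
                f"https://api.github.com/user/starred/{depot.owner}/{depot.name}")
    if depot.host == "gitlab":
        # GitLab accepts the URL-encoded "owner/name" path as the project id.
        project = quote(f"{depot.owner}/{depot.name}", safe="")
        return ("POST", f"https://gitlab.com/api/v4/projects/{project}/star")
    raise ValueError(f"unsupported host: {depot.host}")
```

Keeping this a pure function of the stored metadata means the same mapping could be used whether the request ends up being sent from our servers or from the client.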

Then we just have join tables relating users to the repositories they've starred. By keeping the snapshots in a single table and de-duplicated, we avoid some risk of pummeling the GitHub API when doing searches, etc.

...that we update periodically...

It's a good approach. What data we store and what we wish to display to the user will likely affect the implementation. If it's just the repo name and URL, I don't foresee any problems doing that all on the back end. However, if we need to display more data that might change with mild frequency (e.g. the README body or the wiki, as you point out here), then we could probably push that off to the client to avoid a lot of traffic to the third-party API from our servers.

Currently I can't think of anything like that requiring authentication for the user, but if it ever does we might want to consider having the client make the request. Just a thought. I'm concerned that we'll get lost in the weeds trying to make sure that content has not changed from a depot source, when it might just be simpler to keep a bookmark of where the repo is and tag it with some metadata or something for searching purposes. Then we could let the browser fetch the current state of the repo data without going through our servers.

Great thoughts here, thanks for writing them up.

alex0112 commented 1 year ago

Another thought on info validation: We will likely have access to the current state of the repository via the latest commit checksum. We could store that checksum, compare it against the repository's current one, and only re-fetch if they do not match (assuming we do want to handle it server-side for some reason).

Wouldn't work for something like the wiki though.
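A minimal sketch of that checksum comparison, assuming we store the last-seen commit checksum alongside the snapshot. The `Snapshot` class and `refresh_if_changed` are hypothetical names, and `fetch_readme` is a placeholder for whatever call actually retrieves the content.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Snapshot:
    commit_sha: Optional[str]  # checksum of the commit the snapshot was taken at
    readme: str


def refresh_if_changed(snapshot: Snapshot,
                       latest_sha: str,
                       fetch_readme: Callable[[], str]) -> bool:
    """Re-fetch only when the latest commit checksum differs from the stored one.

    Returns True if the snapshot was refreshed, False if it was already current.
    """
    if snapshot.commit_sha == latest_sha:
        return False  # nothing changed, so skip the expensive fetch
    snapshot.readme = fetch_readme()
    snapshot.commit_sha = latest_sha
    return True
```

This still costs one cheap request per check to learn the latest checksum, but the larger content fetch only happens when something has actually changed.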