Make high-level architectural decisions

alex0112 commented 2 years ago

This issue started as a think-through of the interface for generic repositories within our app, but kind of grew from there. I apologize in advance for the moderate brain dump. I'm hoping to treat this as the basis for a discussion so that we can keep track of it in one place outside of random Matrix threads.

High Level Thoughts

We'll need abstractions representing:

A User (plus Authz & Authn)
A Generic Repo (regardless of its source)
Specific behaviours for repos that take into account the source of the repository
Some way to represent specific actions unique to different repository sources

Some considerations to keep in mind:

Not all generic repos are going to be created equal. Some (like GitLab and GitHub) will have actions such as starring and forking that should have a consistent interface regardless of source, even if the implementation is obviously different. However, other repos (e.g. sourcehut) will inherently lack that functionality. There are a couple of ways we could approach that, but let's have a phone call some time and hash it out.
It's tempting to name a repo something like Repo, but Ecto also uses that word by default when generating some modules, so we should probably be wary of that. We could just call the module Repository when talking generically, or GitRepo or Source or whatever (JS). (I'm going to use Repository for the remainder of this issue, but I don't love that name because it's cumbersome and the pluralization gets annoying.)

Relationships

(just a sketch of some basic relationships relevant to the schema; this is a living list, so feel free to edit as time goes on)

User => has_many [Repositories, Third Party Accounts]

A generic Repository interface

(once again, this is a living list and I expect additions and subtractions here) Just thinking of /Git(Hu|La)b/ to start off with:

Star
Fork
Tag
Read a feed of some sort (get other stars, see issues and their comments, open/closed status, etc.)
Clone/get clone link
Create and open a fork on the relevant platform
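
One way the list above could look as code, as a rough sketch only. It is written in Python purely for illustration (not the app's actual stack), and every name in it (`Repository`, `UnsupportedAction`, `capabilities`, `supports`) is hypothetical. The `star` and `fork` methods return placeholder strings instead of calling any real API. The idea is that every source shares one interface, and sources that lack an action (e.g. sourcehut and starring) say so up front instead of failing in surprising ways.

```python
from abc import ABC, abstractmethod


class UnsupportedAction(Exception):
    """Raised when a source has no equivalent of the requested action."""


class Repository(ABC):
    """Generic repository, regardless of where it is hosted (hypothetical)."""

    # Names of the optional actions this source supports.
    capabilities = frozenset()

    def __init__(self, owner, name):
        self.owner = owner
        self.name = name

    @abstractmethod
    def clone_url(self):
        """Return the URL a user would pass to `git clone`."""

    def supports(self, action):
        return action in self.capabilities

    # Optional actions default to "unsupported"; sources override as needed.
    def star(self):
        raise UnsupportedAction(f"{type(self).__name__} cannot star")

    def fork(self):
        raise UnsupportedAction(f"{type(self).__name__} cannot fork")


class GitHubRepository(Repository):
    capabilities = frozenset({"star", "fork"})

    def clone_url(self):
        return f"https://github.com/{self.owner}/{self.name}.git"

    def star(self):
        # Placeholder: a real version would call a GitHub client here.
        return f"starred {self.owner}/{self.name} on GitHub"

    def fork(self):
        # Placeholder: a real version would call a GitHub client here.
        return f"forked {self.owner}/{self.name} on GitHub"


class SourcehutRepository(Repository):
    # No star or fork equivalent, so capabilities stays empty.

    def clone_url(self):
        return f"https://git.sr.ht/~{self.owner}/{self.name}"
```

A view could then check `repo.supports("star")` before rendering a star button, which keeps the source-specific differences out of the templates.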

Views

Login Page / Create Account
Home page (for logged in user)
- I imagine the content here is a list of starred repos, perhaps imported from a linked GitHub/GitLab account
Activity Feed (from GitLab/GitHub) so that there are actually repos to star

alex0112 commented 2 years ago

Looks like Tentacat or Github may be our best bet for a GitHub client library

The state of GitLab clients is not great. We may have to build the needed functionality from scratch.

ashton314 commented 2 years ago

Fun and short synonyms for "repository":

depot
store
bank
mine

The idea of a "code bank" is kinda fun. "Depot" might be a good name that's not "repo" (sounds like it when you pronounce it.)

ashton314 commented 2 years ago

I think we should move forward with "Depot", which holds the following concepts:

URL of the repository in question
limited metadata about it (e.g. where it's hosted, who owns it, where the README is located, etc.)

Then, we should also have some other thing (idk, "depot snapshot"?) that is a representation of the repository's name, description, and contents of the README (maybe the Wiki too?) that we update periodically and then allow users to do a full-text search of. (Postgres has some nice built-in tools for this.)

Then we just have join tables relating users to the repositories they've starred.

By keeping the snapshots in a single table and de-duplicated, we avoid some risk of pummeling the GitHub API when doing searches, etc.
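
A minimal sketch of that layout, assuming hypothetical table and column names (`depots`, `depot_snapshots`, `user_stars`). It uses Python with an in-memory SQLite database as a stand-in for Postgres, and a plain `LIKE` match as a stand-in for Postgres full-text search. The unique constraint on `url` is what keeps depots de-duplicated when several users star the same repository.

```python
import sqlite3

# Hypothetical schema: one row per depot, one snapshot per depot,
# and a join table relating users to the depots they've starred.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE depots (
    id INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE,
    host TEXT NOT NULL,
    owner TEXT NOT NULL
);
CREATE TABLE depot_snapshots (
    depot_id INTEGER PRIMARY KEY REFERENCES depots(id),
    name TEXT NOT NULL,
    description TEXT NOT NULL,
    readme TEXT NOT NULL
);
CREATE TABLE user_stars (
    user_id INTEGER NOT NULL REFERENCES users(id),
    depot_id INTEGER NOT NULL REFERENCES depots(id),
    PRIMARY KEY (user_id, depot_id)
);
"""


def connect():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def star(conn, user_id, url, host, owner):
    """Record a star, reusing the existing depot row if the URL is known."""
    conn.execute(
        "INSERT OR IGNORE INTO depots (url, host, owner) VALUES (?, ?, ?)",
        (url, host, owner),
    )
    depot_id = conn.execute(
        "SELECT id FROM depots WHERE url = ?", (url,)
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO user_stars (user_id, depot_id) VALUES (?, ?)",
        (user_id, depot_id),
    )
    return depot_id


def save_snapshot(conn, depot_id, name, description, readme):
    """Store or overwrite the single snapshot kept for a depot."""
    conn.execute(
        "INSERT OR REPLACE INTO depot_snapshots "
        "(depot_id, name, description, readme) VALUES (?, ?, ?, ?)",
        (depot_id, name, description, readme),
    )


def search(conn, user_id, term):
    """Search a user's starred depots using only the local snapshots."""
    pattern = f"%{term}%"
    rows = conn.execute(
        """
        SELECT d.url
        FROM depots d
        JOIN user_stars us ON us.depot_id = d.id
        JOIN depot_snapshots s ON s.depot_id = d.id
        WHERE us.user_id = ?
          AND (s.name LIKE ? OR s.description LIKE ? OR s.readme LIKE ?)
        ORDER BY d.url
        """,
        (user_id, pattern, pattern, pattern),
    ).fetchall()
    return [row[0] for row in rows]
```

Searches here never touch the hosting service; only the periodic snapshot update would.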

alex0112 commented 2 years ago

I think we should move forward with "Depot", which holds the following concepts:

URL of the repository in question

limited metadata about it (e.g. where it's hosted, who owns it, where the README is located, etc.)

I'm fine with this approach. That metadata is going to need to include a method by which that repository can be starred on the third party service. We've had a couple discussions about this but I think it would be good to have an approach written down before we start building this particular solution, since that's something we've gone back and forth on a bit.

Then we just have join tables relating users to the repositories they've starred. By keeping the snapshots in a single table and de-duplicated, we avoid some risk of pummeling the GitHub API when doing searches, etc.

...that we update periodically...

It's a good approach. What data we store and what we wish to display to the user will likely affect the implementation of this. If it's just the repo name and URL, I don't foresee any problems doing that all on the back end. However, if we need to display more data that might change with mild frequency (e.g. the README body or the wiki, as you point out here), then we could probably push that off to the client to avoid a lot of traffic from our servers to the third-party API.

Currently I can't think of anything like that requiring authentication for the user, but if it ever does we might want to consider having the client make the request. Just a thought. I'm concerned that we'll get lost in the weeds trying to make sure that content has not changed from a depot source, when it might just be simpler to keep a bookmark of where the repo is and tag it with some metadata or something for searching purposes. Then we could let the browser fetch the current state of the repo data without going through our servers.

Great thoughts here, thanks for writing them up.

alex0112 commented 2 years ago

Another thought on info validation: We will likely have access to the current state of the repository via the latest commit checksum. We could store that and use it for a comparison of the repo and only re-fetch if they do not match (assuming we do want to handle it server-side for some reason)

Wouldn't work for something like the wiki though.
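
A small sketch of that comparison, in illustrative Python with hypothetical names (`Snapshot`, `refresh_if_stale`, `fetch_readme`). It assumes the caller already has the latest commit checksum from the hosting service and passes in a function that does the expensive fetch, so the fetch only runs when the stored checksum differs.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Snapshot:
    commit_sha: str  # checksum of the commit the snapshot was taken at
    readme: str


def refresh_if_stale(
    stored: Optional[Snapshot],
    latest_sha: str,
    fetch_readme: Callable[[], str],
) -> Tuple[Snapshot, bool]:
    """Return (snapshot, refetched), re-fetching only on a checksum mismatch."""
    if stored is not None and stored.commit_sha == latest_sha:
        # Nothing changed upstream, so keep what we have.
        return stored, False
    # First fetch, or the repository moved on: take a new snapshot.
    return Snapshot(commit_sha=latest_sha, readme=fetch_readme()), True
```

The boolean lets the caller decide whether anything needs to be written back to the database.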

SAAF-Alex-and-Ashton-Forces / kepler