bebop / ark

Go REST API to replace Genbank, Uniprot, Rhea, and CHEMBL
MIT License

Determine RDMS #70

Open TimothyStiles opened 1 year ago

TimothyStiles commented 1 year ago

My initial thought is to just bite the bullet and use Postgres with the AGE graph DB plugin.

We'll start simple and just do basic document-style (MongoDB-like) storage.

My concerns are these:

How much data can we really throw into Postgres? Realistically, the full DB could be as small as 5TB, but if we start allowing entries it could get really, really big. What are the solutions here?

Postgres with the AGE plugin should theoretically satisfy our needs for a database that can do documents, tables, and graphs, but if there's some fundamental limitation here, what should we look to next that can satisfy all three needs?

ORMs in Go have a tenuous history. With generics we'll have a little more flexibility with data models, but we also want some strong-ish typing to prevent spaghetti. Ideally we'd want a tight integration between the Go structs we define and the models stored in Postgres, and it would be nice to avoid the map[string]interface{} pattern seen in a lot of Go web projects.
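One way to picture the typed alternative to map[string]interface{} is a small generic helper that decodes a stored JSON document straight into a struct. This is only a sketch under assumptions: the `Sequence` model, its fields, and the `decodeDoc` helper are hypothetical names made up for illustration, and the raw bytes stand in for whatever a Postgres jsonb column would return.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Sequence is a hypothetical model; the field names are illustrative only
// and are not taken from the ark codebase.
type Sequence struct {
	Accession string `json:"accession"`
	Organism  string `json:"organism"`
	Length    int    `json:"length"`
}

// decodeDoc unmarshals a raw JSON document (for example, the bytes read
// from a jsonb column) into a caller-chosen struct type T.
// Unknown fields are rejected so schema drift shows up as an error
// instead of being silently dropped.
func decodeDoc[T any](raw []byte) (T, error) {
	var out T
	dec := json.NewDecoder(bytes.NewReader(raw))
	dec.DisallowUnknownFields()
	if err := dec.Decode(&out); err != nil {
		var zero T
		return zero, fmt.Errorf("decode document: %w", err)
	}
	return out, nil
}

func main() {
	raw := []byte(`{"accession":"EXAMPLE_1","organism":"Escherichia coli","length":4641652}`)
	seq, err := decodeDoc[Sequence](raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(seq.Accession, seq.Length)
}
```

The compiler then checks every field access on `seq`, which is the "strong-ish typing" being asked for; the trade-off is that each document shape needs its own struct.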

rkrishnasanka commented 1 year ago

Things we need to consider (mostly for the sake of being rigorous):

  1. Datalog-compatible databases
  2. SPARQL-compatible databases
  3. Implementing an SQL compiler such as GraphJin (maybe we can just use this)

rkrishnasanka commented 1 year ago

Regarding blob storage, I realized that one issue we might run into is bloating Postgres if we use it to store blobs. I use S3 or the equivalent blob storage systems from the other big cloud providers. Maybe we could store the blob URL in Postgres rather than all the data directly.
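A rough sketch of the "store the URL, not the blob" idea might look like the following. Everything named here is an assumption for illustration: the `BlobRef` type, the `newBlobRef` helper, the bucket and key, and the choice to keep a size and SHA-256 checksum next to the URL. The actual upload to S3 (or an equivalent) is left out; only the small record that would be written to Postgres is built.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// BlobRef is a hypothetical record that would live in Postgres in place of
// the blob itself: a pointer to object storage plus enough metadata to
// verify the object after download.
type BlobRef struct {
	URL    string // object location, e.g. s3://bucket/key
	Size   int64  // blob size in bytes
	SHA256 string // hex-encoded checksum of the blob contents
}

// newBlobRef builds the reference for data that is assumed to be uploaded
// to bucket/key by some other code path; no network call is made here.
func newBlobRef(bucket, key string, data []byte) BlobRef {
	sum := sha256.Sum256(data)
	return BlobRef{
		URL:    fmt.Sprintf("s3://%s/%s", bucket, key),
		Size:   int64(len(data)),
		SHA256: hex.EncodeToString(sum[:]),
	}
}

func main() {
	// Bucket and key are made-up values for the example.
	ref := newBlobRef("example-bucket", "genbank/example.gb", []byte("LOCUS ..."))
	fmt.Printf("%+v\n", ref)
}
```

Keeping the checksum alongside the URL is a design choice rather than a requirement; it lets a reader detect a missing or replaced object without trusting the storage layer.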

Koeng101 commented 1 year ago

Check out sqlc.dev

I’ve been using it lately and it’s fantastic. It takes your SQL schema and generates type-safe Go code. It has non-trivially reduced the number of bugs I get, while making my code more idiomatic Go. It does still require you to write SQL, however.