MikkelHJuul / ld

Lean Database
The Unlicense

ld


Lean database: a simple database that is just an RPC-based server with the basic Get/Set/Delete operations. It can ingest any value and store it at a key, and it only supports RPC, via gRPC. The encoded binary message is stored and served without the server ever touching the value. To that end it is mostly a gRPC cache, but I intend it to be a more general building block.

The database operates at the key level only. If you need secondary indexes you need to maintain two versions of the data, or create the index (id'-to-id mapping) table yourself. Some key-value databases offer more solutions than this; this one does not, and will not. Offering too many solutions most often leads to poorer solutions in general.

There is no Query language to clutter your code! I know, awesome, right?!

This project started out as a learning project for myself, to learn golang, rpc and gRPC.

The project is written in golang. It will be packaged as a scratch container (linux/amd64). I will not support other ways of downloading. As always, you can simply go build.

Docker images

Images are mjuul/ld:<tag> and the alpine-based mjuul/ld:<tag>-client. There is also a standalone client container, mjuul/ld-client.

The container mjuul/ld:<tag> is just a scratch container with the linux/amd64 binary as entrypoint.

The container mjuul/ld:<tag>-client is based on the image mjuul/ld-client adding the binary ld to it and running that at startup. The client serves as an interactive shell for the database, see client.

Implementation

This project exposes badgerDB. You should be able to use the badgerDB CLI-tool on the database.

API

Hashmap Get-Set-Delete semantics, with bidirectional streaming RPCs! No lists, because aggregation of data should be kept at a minimum. The APIs for Get and Delete further implement unidirectional server-side streams for querying via KeyRange.

I consider the gRPC API to be feature complete, while the underlying implementation may change to enable better database configuration and/or usage of this code as a library. Maturity may also bring changes to the server implementation.

See test for client implementations. The testing package builds on data from DMI - Free data initiative (specifically the lightning data set), but can easily be changed to ingest other data. Ingestion and reading are separated into two different clients.

Working with the API

The API is expandable. Because of how protocol buffers encode messages on the wire, you can replace the bytes type of the value field on the client side with whatever message type you want. This way you could use it to store dynamically typed objects using Any. Or you can save and query the database with a fixed or reflected type.
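A minimal sketch of why the swap works: on the wire, a bytes field and a nested message field are both length-delimited (wire type 2), so they produce the same bytes. The field number 2 for the value and the one-field Reading message are assumptions made for illustration, not taken from ld's .proto file, and the encoding is done by hand to keep the example free of dependencies.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
)

// lengthDelimited encodes one protobuf field with wire type 2:
// a varint tag, a varint length, then the raw payload.
func lengthDelimited(fieldNumber uint64, payload []byte) []byte {
	out := binary.AppendUvarint(nil, fieldNumber<<3|2)
	out = binary.AppendUvarint(out, uint64(len(payload)))
	return append(out, payload...)
}

// readField decodes one length-delimited field and returns its field
// number and payload. It assumes well-formed input (no error handling).
func readField(b []byte) (uint64, []byte) {
	tag, n := binary.Uvarint(b)
	size, m := binary.Uvarint(b[n:])
	start := n + m
	return tag >> 3, b[start : start+int(size)]
}

// encodeReading hand-encodes a hypothetical client-side type:
//   message Reading { string name = 1; }
func encodeReading(name string) []byte {
	return lengthDelimited(1, []byte(name))
}

func main() {
	const valueField = 2 // assumed field number of the stored value

	// Client A declares `bytes value = 2` and hands over serialized bytes.
	asBytes := lengthDelimited(valueField, encodeReading("lightning"))

	// Client B declares `Reading value = 2`; a nested message is framed
	// in exactly the same way, so the server cannot tell the difference.
	asMessage := lengthDelimited(valueField, encodeReading("lightning"))

	fmt.Printf("% x\n", asBytes)
	fmt.Println("identical on the wire:", bytes.Equal(asBytes, asMessage))

	// Reading the stored bytes back with the typed view.
	_, payload := readField(asBytes)
	_, name := readField(payload)
	fmt.Println("decoded name:", string(name))
}
```

The server only ever sees the outer length-delimited field, which is why it can store and serve the value untouched.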

The test folder holds two small programs that implement a fixed type: my_message.proto.

The client uses reflection to serialize/deserialize json to a message given a .proto-file.

CRUD - why not CRUD?

CRUD operations must be implemented client side. Use Get -> [decision] -> Set to implement create or update the way you want to, for example:

    Create      Get -> if {empty response} -> Set
    Update      Get/Delete -> if {non-empty} -> [map?] -> Set

Doing this server side would cause so much friction. All embeddable key-value databases, to my knowledge, implement Get-Set-Delete semantics, so whether you go with bolt/bbolt or badger you would always end up with this friction; so naturally you implement it without CRUD semantics. Implementing a concurrent GetMany/SetMany ping-pong client-service feels a lot more elegant anyway.
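The Create and Update recipes above can be sketched client side roughly like this. The Store interface and the in-memory memStore are hypothetical stand-ins for a real ld client, and the error values are made up for the example; only the Get -> [decision] -> Set flow comes from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// Store is a hypothetical stand-in for an ld client (not the real API).
type Store interface {
	Get(key string) ([]byte, bool)
	Set(key string, value []byte)
}

// memStore is an in-memory Store used only to make the sketch runnable.
type memStore map[string][]byte

func (m memStore) Get(key string) ([]byte, bool) { v, ok := m[key]; return v, ok }
func (m memStore) Set(key string, value []byte)  { m[key] = value }

var (
	ErrExists   = errors.New("key already exists")
	ErrNotFound = errors.New("key not found")
)

// Create: Get -> if {empty response} -> Set.
func Create(s Store, key string, value []byte) error {
	if _, ok := s.Get(key); ok {
		return ErrExists
	}
	s.Set(key, value)
	return nil
}

// Update: Get -> if {non-empty} -> [map?] -> Set.
func Update(s Store, key string, mapFn func(old []byte) []byte) error {
	old, ok := s.Get(key)
	if !ok {
		return ErrNotFound
	}
	s.Set(key, mapFn(old))
	return nil
}

func main() {
	s := memStore{}
	fmt.Println(Create(s, "id-1", []byte("a")))
	fmt.Println(Create(s, "id-1", []byte("b")))
	fmt.Println(Update(s, "id-1", func(old []byte) []byte { return append(old, '!') }))
	v, _ := s.Get("id-1")
	fmt.Println(string(v))
}
```

Note that the Get and the Set are two separate calls, so two clients racing on the same key can both pass the check; the decision step is where you choose how much that matters for your data.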

Configuration

via flags or environment variables:

flag            ENV             default     description
------------------------------------------------------------------------------------
-port            PORT            5326        "5326" spells out "lean" on T9 keyboards
-db-location     DB_LOCATION     ld_badger   The folder location where badger stores its database-files
-in-mem          IN_MEM          false       Keep data in memory only; setting this to true ignores db-location.
-log-level       LOG_LEVEL       INFO        the logging level of the server

The container mjuul/ld:<tag>-client does not support flags for ld; use environment variables instead, since ld-client is the entrypoint.
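For readers wondering how a flag/environment pair like the table above can be wired up in Go, here is a small sketch using only the standard library. It is not ld's actual code: the Config struct, the parseConfig helper, and the precedence (flag over environment variable over default) are assumptions of this example.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// Config holds the four settings from the table above.
type Config struct {
	Port       string
	DBLocation string
	InMem      bool
	LogLevel   string
}

// parseConfig resolves each setting as flag > environment > default.
// That precedence is an assumption of this sketch. lookup is injected
// (os.LookupEnv in main) so the function is easy to test.
func parseConfig(args []string, lookup func(string) (string, bool)) (Config, error) {
	env := func(key, fallback string) string {
		if v, ok := lookup(key); ok {
			return v
		}
		return fallback
	}
	// An unparsable IN_MEM value silently falls back to false here.
	inMem, _ := strconv.ParseBool(env("IN_MEM", "false"))

	var c Config
	fs := flag.NewFlagSet("ld", flag.ContinueOnError)
	fs.StringVar(&c.Port, "port", env("PORT", "5326"), "port to listen on")
	fs.StringVar(&c.DBLocation, "db-location", env("DB_LOCATION", "ld_badger"), "folder for badger's database files")
	fs.BoolVar(&c.InMem, "in-mem", inMem, "keep data in memory, ignoring db-location")
	fs.StringVar(&c.LogLevel, "log-level", env("LOG_LEVEL", "INFO"), "logging level of the server")
	err := fs.Parse(args)
	return c, err
}

func main() {
	c, err := parseConfig(os.Args[1:], os.LookupEnv)
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", c)
}
```

Using the environment value as the flag's default keeps a single code path: a container that cannot pass flags still gets configured, and a flag given on the command line still wins.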

Comparison to ProfaneDB

ProfaneDB uses field options to find your object's key, and can ingest a list (repeated). Your key can be composite, and you don't have to think about your key. (I envy the design a bit (it's shiny), but then again I don't feel like that is the best design.)

ld forces you to design your key, and forces single-object (no-aggregation/non-repeated) thinking.

ProfaneDB does not support any type of non-singular key queries; you will have to build query objects with very high knowledge of your keys (specific keys). This may force you to make fewer keys, and do more work in the client. (You may end up searching for a needle in a haystack, or completely losing a key.)

ld supports KeyRanges. You can then make very specific keys, and more of them, think about the key design, and query via From, To, Prefix and/or Pattern syntax.
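To make the idea of key-design-driven queries concrete, here is a client-side sketch of how the four selectors could narrow a set of keys. The semantics are assumptions of this example, not ld's documented behaviour: From and To are treated as inclusive lexicographic bounds, an empty field means "unset", Pattern is treated as a Go regular expression, and all set fields must match. The key layout is made up as well.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// KeyRange mirrors the four selectors named above; the matching rules
// implemented in Select are assumptions of this sketch.
type KeyRange struct {
	From, To, Prefix, Pattern string
}

// Select returns the keys that satisfy every selector that is set.
func (r KeyRange) Select(keys []string) ([]string, error) {
	var re *regexp.Regexp
	if r.Pattern != "" {
		var err error
		if re, err = regexp.Compile(r.Pattern); err != nil {
			return nil, err
		}
	}
	var out []string
	for _, k := range keys {
		switch {
		case r.From != "" && k < r.From: // below the lower bound
		case r.To != "" && k > r.To: // above the upper bound
		case !strings.HasPrefix(k, r.Prefix): // empty prefix matches all
		case re != nil && !re.MatchString(k):
		default:
			out = append(out, k)
		}
	}
	return out, nil
}

func main() {
	// Hypothetical keys designed as <kind>/<source>/<date>.
	keys := []string{
		"sensor/a/2021-01-01",
		"sensor/a/2021-01-02",
		"sensor/b/2021-01-01",
		"station/x",
	}
	byPrefix, _ := KeyRange{Prefix: "sensor/a/"}.Select(keys)
	fmt.Println(byPrefix)
	byPattern, _ := KeyRange{Pattern: `/2021-01-01$`}.Select(keys)
	fmt.Println(byPattern)
}
```

With keys laid out like this, "everything from source a" is a prefix query and "everything on one date" is a pattern query, which is the kind of thinking about key design the comparison is getting at.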

ProfaneDB uses an inbuilt extension for its .proto. Pro: you can use their .proto file as is. Con: Google's Any type is just like a map, and requires the implementer to send type knowledge with each object on the wire.

ld uses the underlying protocol buffers encoding design. Con: this forces the implementer to edit their .proto file, which is an anti-pattern. Pro: while the database will not know anything about the value it saves, the type will be packed binary and can be serialised.

ld supports bulk operations natively (via stream methods); ProfaneDB does so via a repeated nested object. Memory-wise, streaming is preferred.

License

FOSSA Status

TODO