IreneKnapp / codex

A container for discussion and early exploratory work towards a new package repository for Haskell.
BSD 3-Clause "New" or "Revised" License

Description of server state #17

Open IreneKnapp opened 11 years ago

IreneKnapp commented 11 years ago

Okay, so the idea is that all state is divided into three parts: configuration, which describes how parts of the system can find each other; database, which is relational information about packages, users, and so on; and blob repository, which is a bunch of, er, blobby files.

I'd like the config files to use JSON (I am a fan of Aeson as the interface to it), because it's reasonably standard and doesn't have any really bizarre or surprising syntactic rules like YAML does. That also saves us time on writing a parser; the other obvious option would be a simple plaintext format that we define.
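To make the config idea concrete, here is a minimal sketch of what loading such a file could look like. It uses Python's standard `json` module purely for illustration (the proposal above is Aeson, in Haskell), and the key names `database_path` and `blob_root` are hypothetical; nothing in this thread fixes the actual schema.

```python
import json
from dataclasses import dataclass

# Hypothetical config keys; the real schema has not been decided in this thread.
REQUIRED_KEYS = {"database_path", "blob_root"}


@dataclass(frozen=True)
class Config:
    database_path: str  # where the relational database lives
    blob_root: str      # where the blob repository lives


def parse_config(text: str) -> Config:
    """Parse a JSON config document and check that the required keys exist."""
    raw = json.loads(text)
    if not isinstance(raw, dict):
        raise ValueError("config must be a JSON object")
    missing = REQUIRED_KEYS - set(raw)
    if missing:
        raise ValueError(f"missing config keys: {sorted(missing)}")
    return Config(database_path=raw["database_path"], blob_root=raw["blob_root"])


EXAMPLE = '{"database_path": "codex.sqlite3", "blob_root": "blobs/"}'
config = parse_config(EXAMPLE)
```

The point of the sketch is only that a standard format gives us parsing and basic validation almost for free, which is the time saving mentioned above.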

I'd like the database to use SQLite3, because I know and trust it, and it's trivial to set up and use and even back up and restore. Another sane option would be PostgreSQL, but that has substantially more administrative overhead.
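As a rough illustration of the "trivial to set up and back up" claim, the sketch below creates a tiny database and copies it with SQLite's online backup facility, via Python's standard `sqlite3` module. The `users` and `packages` tables and their columns are assumptions made up for the example, not a proposed schema.

```python
import sqlite3

# Hypothetical schema for illustration only; not a proposal for the real tables.
SCHEMA = """
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE packages (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    version     TEXT NOT NULL,
    uploader_id INTEGER NOT NULL REFERENCES users(id),
    UNIQUE (name, version)
);
"""


def open_database(path: str) -> sqlite3.Connection:
    """Open (or create) a database and install the example schema."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn


# Setup is a single call; ":memory:" keeps the example off the disk.
conn = open_database(":memory:")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))
conn.execute(
    "INSERT INTO packages (name, version, uploader_id) VALUES (?, ?, ?)",
    ("example", "0.1.0", 1),
)
conn.commit()

# Backup copies the whole database into another connection.
backup = sqlite3.connect(":memory:")
conn.backup(backup)
rows = backup.execute("SELECT name, version FROM packages").fetchall()
```

With a file path instead of `":memory:"`, the same `backup` call produces a standalone copy of the database file, which is about all the administration a restore would need.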

I'd like the blob repository to live in Amazon S3. This makes distribution of files almost trivial, since we can simply grant public access to the appropriate parts of it. I've already poked around a bit and created a possible folder hierarchy we might use; see issue #16. The alternative to S3 would be the local, per-mirror filesystem, but this runs into size constraints, and means that each mirror, which doesn't really need to poke at the contents of packages except when it's in the act of building them, has to do a large up-front download before it can come online. It also introduces complications of synchronizing this state across federated mirrors. Now, S3 is not without its administrative hassles, but they relate to assigning permissions, which feels like a cleaner category of problem to have.
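One way to keep the S3-versus-local-filesystem choice open is to put the blob repository behind a small interface. The sketch below is an assumption about how that might look, in Python for brevity: the `BlobStore` and `LocalBlobStore` names and the example key are hypothetical, the key layout is not the hierarchy from issue #16, and only the local backend is implemented.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Hypothetical interface that an S3 or a local backend could both satisfy."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalBlobStore(BlobStore):
    """Stores each blob as a file under a root directory."""

    def __init__(self, root: str) -> None:
        self.root = os.path.abspath(root)

    def _path(self, key: str) -> str:
        path = os.path.abspath(os.path.join(self.root, key))
        # Refuse keys such as "../x" that would land outside the root.
        if not path.startswith(self.root + os.sep):
            raise ValueError(f"key escapes blob root: {key!r}")
        return path

    def put(self, key: str, data: bytes) -> None:
        path = self._path(key)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as handle:
            handle.write(data)

    def get(self, key: str) -> bytes:
        with open(self._path(key), "rb") as handle:
            return handle.read()


# Usage: a throwaway directory stands in for a mirror's local disk.
store = LocalBlobStore(tempfile.mkdtemp())
store.put("packages/example/0.1.0/example-0.1.0.tar.gz", b"tarball bytes")
fetched = store.get("packages/example/0.1.0/example-0.1.0.tar.gz")
```

An S3-backed class would implement the same two methods, so the rest of the system would not need to know which backend a given mirror uses.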

cartazio commented 11 years ago

On a point of order: for the near term I really, really think we shouldn't have the primary mode of operation be AWS-specific. I think there's a strong use case for making things easy to host locally. Also, for just the "raw" package hosting, what are the storage needs? Let's ask some folks!

cartazio commented 11 years ago

Likewise, I asked Luite (because he's got that hdiff.luite.com site). The compressed archives only take about 2 GB of space.

Decompressing all the archives, plus git versioning and other stuff (including the compressed archives), comes to about 8.9 GB and 809,014 files.

So the storage needs are pretty modest.

IreneKnapp commented 11 years ago

I'm not concerned about storage needs so much as disk thrashing when everybody tries to hit the service at once. Plus it becomes something annoying to keep in synch. But yes, I take your point about scope; let's by all means make the first iteration not use AWS.

cartazio commented 11 years ago

For a public Hackage mirror, it might be worth looking into AWS for making things durable. On the other hand, the entirety of Hackage should be easy to host locally or privately, including a private mirror with private libraries added, and that should be easy for people.