FFY00 / python-nest

Automated Python binary artifact building service

European Union Public License 1.2

Architecture #2

Open FFY00 opened 3 years ago

FFY00 commented 3 years ago

The architecture should be fairly straightforward: a webapp with a set of build nodes. The main consideration when choosing technologies is scalability, so we should pick components that have a distributed architecture. Another consideration is licensing: we should go with FOSS offerings.

For the webapp framework, I think I want to go with Starlette.

To orchestrate the builds, I think the best option is probably Kafka. We can use pydantic to serialize Python objects to JSON, providing a nice and natural interface.
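As a rough illustration of the "Python objects to JSON" idea, here is a minimal sketch of a build message round-tripping through JSON. It deliberately uses stdlib `dataclasses` as a stand-in for pydantic so it runs without dependencies; a pydantic model would additionally validate the fields. The `BuildRequest` class and its field names are hypothetical and not part of the project.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BuildRequest:
    """Hypothetical message asking a build node to build one project."""

    project: str
    version: str
    python_tag: str

    def to_json(self) -> bytes:
        # Message brokers carry raw bytes, so encode the JSON text as UTF-8.
        return json.dumps(asdict(self), sort_keys=True).encode("utf-8")

    @classmethod
    def from_json(cls, payload: bytes) -> "BuildRequest":
        # Rebuild the object from the decoded JSON mapping.
        return cls(**json.loads(payload.decode("utf-8")))


request = BuildRequest(project="example", version="1.0.0", python_tag="cp39")
payload = request.to_json()
restored = BuildRequest.from_json(payload)
```

The producer side would send `payload` as the message value and the build node would call `BuildRequest.from_json` on what it receives.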

For the database, we have a few options:

- Apache Cassandra / ScyllaDB
  - But there is no async ORM for it
- CockroachDB
  - Licensing is not great
- PostgreSQL + Citus
  - I think it still needs one highly available coordinator node?

Going with an SQL option makes testing slightly easier, and might also make things slightly easier for people rolling their own architecture that does not need to be scaled up.

FFY00 commented 3 years ago

The Citus docs ask us to contact them if we need multiple coordinators, which is a little bit weird. http://docs.citusdata.com/en/v10.0/admin_guide/cluster_management.html#adding-a-coordinator

FFY00 commented 3 years ago

Okay, I think the safest bet for the database is to go with standard SQL, because it's relational, has great tooling already, and has several possible server options (e.g. PostgreSQL, CockroachDB, SQLite).
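To show why an SQL backend helps with testing, here is a minimal sketch that runs against an in-memory SQLite database using only the standard library. The `builds` table, its columns, and the helper functions are hypothetical and only serve as an illustration.

```python
import sqlite3
from typing import Optional

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE builds (
    id INTEGER PRIMARY KEY,
    project TEXT NOT NULL,
    version TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'queued'
)
"""


def open_test_database() -> sqlite3.Connection:
    """Create a throwaway in-memory database with the schema applied."""
    connection = sqlite3.connect(":memory:")
    connection.execute(SCHEMA)
    return connection


def queue_build(connection: sqlite3.Connection, project: str, version: str) -> int:
    """Insert a build row and return its generated id."""
    cursor = connection.execute(
        "INSERT INTO builds (project, version) VALUES (?, ?)",
        (project, version),
    )
    connection.commit()
    return cursor.lastrowid


def build_status(connection: sqlite3.Connection, build_id: int) -> Optional[str]:
    """Return the status of a build, or None if the id is unknown."""
    row = connection.execute(
        "SELECT status FROM builds WHERE id = ?", (build_id,)
    ).fetchone()
    return row[0] if row else None
```

Note that the `?` placeholders and the auto-generated `INTEGER PRIMARY KEY` are SQLite specifics; other servers use different placeholder styles and identity column syntax, so in practice a query builder or ORM would likely be needed to smooth over the dialect differences.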

FFY00 commented 3 years ago

Using plain Kafka and pydantic to orchestrate the builds is not optimal, as that would require us to implement a task queue ourselves. I am leaning toward https://github.com/faust-streaming/faust instead.

FFY00 commented 3 years ago

Faust is not a good fit for this, as we need multiple workers to share the load evenly, so I am going with Celery. Hopefully, Celery will gain support for a distributed broker in the future.
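The load-sharing requirement can be pictured as several workers pulling from one shared queue, so that whichever worker is idle takes the next job. The sketch below illustrates that pattern with stdlib threads and `queue.Queue`; it is not Celery's API, and `run_workers` and the job names are made up for the example.

```python
import queue
import threading
from typing import Dict, Iterable


def run_workers(jobs: Iterable[str], worker_count: int) -> Dict[str, str]:
    """Process every job exactly once; idle workers pull the next job."""
    pending: "queue.Queue[str]" = queue.Queue()
    for job in jobs:
        pending.put(job)

    handled: Dict[str, str] = {}  # job -> name of the worker that took it
    lock = threading.Lock()

    def worker(name: str) -> None:
        while True:
            try:
                job = pending.get_nowait()
            except queue.Empty:
                return  # queue drained, so this worker exits
            # A real build node would build the wheel here.
            with lock:
                handled[job] = name

    threads = [
        threading.Thread(target=worker, args=(f"worker-{index}",))
        for index in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return handled


assignments = run_workers([f"job-{i}" for i in range(20)], worker_count=4)
```

Because each job is removed from the queue when a worker takes it, no job is built twice, and slow jobs do not block the other workers from continuing with the rest.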

FFY00 commented 3 years ago

I have been thinking about this for a few months and ended up with this.

architecture-diagram

The webapp and builders communicate via webhook events. The builders might be a Celery cluster, a custom app that triggers a GitHub Actions workflow, etc. They will store the built wheels on AWS S3 or MinIO.

I am not sure if we should split the uploading into a separate component. The idea would be that the webapp triggers the builders, and these report the outcome to the uploader component. As the uploader needs the secret keys, we could give the webapp write-only permission on the secrets database and allow only the uploader to read from it, isolating the secrets from the webapp.
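The write-only versus read-only split could look roughly like the sketch below, where each component receives a handle that exposes only the operation it is allowed to perform. All class and function names are hypothetical, and the shared dictionary stands in for the secrets database. Python does not enforce this separation by itself; in a real deployment it would have to come from the database's own permissions, for example a role that may insert but not select.

```python
from typing import Dict, Tuple


class SecretWriter:
    """Handle for the webapp: can store secrets but has no read method."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self._storage = storage

    def store(self, project: str, token: str) -> None:
        self._storage[project] = token


class SecretReader:
    """Handle for the uploader: can read secrets but has no write method."""

    def __init__(self, storage: Dict[str, str]) -> None:
        self._storage = storage

    def fetch(self, project: str) -> str:
        return self._storage[project]


def make_handles() -> Tuple[SecretWriter, SecretReader]:
    """Create both handles over the same backing store."""
    storage: Dict[str, str] = {}
    return SecretWriter(storage), SecretReader(storage)


webapp_handle, uploader_handle = make_handles()
webapp_handle.store("example", "upload-token")
token = uploader_handle.fetch("example")
```

With this split, a compromised webapp could overwrite a stored token but could not read existing ones back out.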

FFY00 commented 3 years ago

Updated diagram.

architecture-diagram