Melvillian / habilis

tools for thought

Discussion: Initial Architecture #1

Closed by wclausen 6 months ago

wclausen commented 6 months ago

There are a number of things for us to do to get this project off the ground. One of them is to decide on an initial technical architecture for what we want to build.

At this stage of the project, I'd focus on the following things:

These are my personal default answers these days:

Not married to any of these, but they would be easy for me to set up and be productive fairly quickly. They are my vote, and I do feel like they solve our current problems quite well.

Alternatives that I think are worth considering:

Of course, there's other stuff to think about (e.g. we'll probably want a sane background job queue given our plans for lots of LLM processing tasks), but I think these are the right ones to focus on for now.

wclausen commented 6 months ago

Oh I forgot a thing.

For the backend, I'd also be very down to use Kotlin (Ktor as the web API framework, though we could presumably use Spring). The more time I've spent in Python, the more I find myself yearning for static typing and various kotlin niceties (data classes, sealed classes, basic FP functions you can chain like map/filter). I like Python enough to keep using it, but it's becoming unpleasant. With that said, when it comes to disliking Python, there's a good chance I'm mostly "holding it wrong."

Melvillian commented 6 months ago

Database: Postgres/Redis
Backend: Python (FastAPI, SQLAlchemy as main pieces)
Frontend: Svelte
Deployment: Hetzner/Docker/Kamal

I agree, let's use all of these.

Oh I forgot a thing.

I do not want to use Kotlin; I would have to learn it, it's a little niche, and...

the more I find myself yearning for static typing and various kotlin niceties (data classes, sealed classes, basic FP functions you can chain like map/filter)

Python has all of these things and does them well. I'll show you how to do all of them in Python, or you can read this book: https://www.amazon.com/Fluent-Python-Concise-Effective-Programming/dp/1491946008

For the job queue, I have written my own in Mongo and it was a nightmare, so let's not do that. If we went with Redis for our MQ then, from what I can tell, we'd still have to implement things like acknowledgements and idempotency guarantees. Like I said, that is a pain to do, so let's choose something that already does all that: RabbitMQ.
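To make the two terms concrete, here is a rough in-memory sketch of what hand-rolling acknowledgements and idempotency would involve. It is not how Redis, RabbitMQ, or any real broker implements them; `TinyQueue` and `IdempotentWorker` are hypothetical names, and the "seen" set would need to live in durable storage in practice.

```python
import collections


class TinyQueue:
    """Toy queue: a message stays 'unacked' until the consumer confirms it."""

    def __init__(self):
        self._ready = collections.deque()
        self._unacked = {}  # delivery tag -> (message_id, body)
        self._next_tag = 0

    def publish(self, message_id, body):
        self._ready.append((message_id, body))

    def get(self):
        """Deliver the next message, or None if the queue is empty."""
        if not self._ready:
            return None
        message_id, body = self._ready.popleft()
        self._next_tag += 1
        self._unacked[self._next_tag] = (message_id, body)
        return self._next_tag, message_id, body

    def ack(self, tag):
        # Only now is the message really gone.
        self._unacked.pop(tag)

    def requeue_unacked(self):
        # Stand-in for "worker died before acking": redeliver everything pending.
        for tag in sorted(self._unacked):
            self._ready.append(self._unacked.pop(tag))


class IdempotentWorker:
    """Skips messages whose id was already handled, so redelivery is harmless."""

    def __init__(self, handler):
        self._handler = handler
        self._seen = set()  # assumption: in-memory only, for illustration

    def process(self, queue):
        delivery = queue.get()
        if delivery is None:
            return False
        tag, message_id, body = delivery
        if message_id not in self._seen:
            self._handler(body)
            self._seen.add(message_id)
        queue.ack(tag)
        return True
```

Acknowledgements give at-least-once delivery, which is why the consumer side still needs the duplicate check.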

wclausen commented 6 months ago

For the job queue, I have written my own in Mongo and it was a nightmare

Yes, definitely don't write our own. I've now used Celery in Python and it seems ok (though, it has a reputation for being annoying to work with in some general sense). Dramatiq or RQ also seem fine.

Python has all of these things and does them well.

Surprising that you would feel this way without knowing much Kotlin. Having used both, Python has a much weaker version of Kotlin's data class (I'm familiar with Pydantic and @dataclass as the standard comparables here in Python), it has no concept that maps cleanly to a sealed class, its approach to static typing is also much weaker (mypy is good for what it is, and it is also a noticeable step down from a real compiler), and its FP primitives are similarly "just fine" but Kotlin's approach is a clear winner. I'd go along with "you can do these things in Python, kind of," but not "It's got the same stuff and works just as well." I will caveat here that I wouldn't describe myself as a Python "expert", and I'd love to see the optimal solutions to these problems if you think they differ from what I've described. I have actually tried to find them, and have so far come away thoroughly unimpressed.

With that said, Python is great in many ways. These are areas where Python famously lacks polish, though they've only become prominent in recent years when new languages like Kotlin have pushed the envelope of language design.

Oh, another important (and relatively common) pain point with Python is the concurrency story. The concurrency support in Kotlin (and many other languages) is significantly better than what Python achieves (referring to any/all of multiprocessing, concurrent.futures.ThreadPoolExecutor, and async/await here). As an example of Python's shortcomings, you'll note in Python that openai provides two entirely different classes if you want to use their APIs synchronously vs plugging into Python's async/await. To call an async function, you need an entirely different class? To use async/await I need to pull in a 3rd party lib? Wild. And extremely unfortunate when coming from other languages that provide a more seamless concurrency experience (for reference Kotlin offers coroutines, threads at a basic level if you wanted, and then there's also RxJava if streams fit your data model well).
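For context on the sync/async split, a small standard-library-only sketch (assuming Python 3.9+ for `asyncio.to_thread`). `summarize_blocking` is a hypothetical stand-in for a blocking client call, not any real SDK's API. It shows the same blocking function fanned out once with a thread pool and once from `async` code, which is roughly the bridging a library has to choose between when it ships separate sync and async clients.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def summarize_blocking(text: str) -> str:
    # Hypothetical blocking call (e.g. an HTTP request); here it just sleeps.
    time.sleep(0.01)
    return text[:10]


def summarize_many_threads(texts: list[str]) -> list[str]:
    # Plain threads: callers stay synchronous, results come back in input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(summarize_blocking, texts))


async def summarize_async(text: str) -> str:
    # Bridge a blocking function into async code by running it in a worker thread.
    return await asyncio.to_thread(summarize_blocking, text)


async def summarize_many_async(texts: list[str]) -> list[str]:
    # gather() runs the awaitables concurrently and preserves input order.
    return list(await asyncio.gather(*(summarize_async(t) for t in texts)))
```

A synchronous caller would use `summarize_many_threads(texts)` directly, while the async version needs an event loop, e.g. `asyncio.run(summarize_many_async(texts))`. That difference in calling convention is the "two worlds" issue described above.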

Anyways, we will work around Python's shortcomings here. Kotlin isn't one of the more challenging languages to learn, but it's fair to decide to pick it up later, if at all.

We could also try/think about Go! I have limited experience with it, but it seems well-liked/regarded for the kind of program/system we are likely to build.

Melvillian commented 6 months ago

Agreed, I updated #4 to use Celery.