WFCD / genesis

:robot: Warframe Discord Cephalon
https://genesis.warframestat.us
Apache License 2.0

Post-mortem: Genesis Notifications #441

TobiTenno closed 3 years ago

TobiTenno commented 3 years ago

Burnout

So... solo-maintaining a highly-used, not-super-scalable project is hard. I haven't played Warframe seriously in a long while (pretty much since the second Orb Mother was introduced) for a lot of reasons. As many of you know, I used to run a large Warframe server that used Genesis heavily. Lots of stuff happened, and I no longer do. That affected my desire to play as well as the time I had available for fixing stuff in Genesis.

Shortly before I stopped running that server, I had started an overhaul of Genesis to separate the notifier from the standard command-interface bot (I refer to these as "the worker" and "genesis", respectively, from here on out).

As the months went on and I tried to figure out why notifications were simply not performing (I'll dig into the issues later), I kept spending whole weekends debugging the worker code, looking for some single point of failure. In reality, I was also progressively less sure that I wanted to keep making everything work, what with the impending Tencent acquisition of a majority of DE, and just personally not wanting to deal with this solo. Thankfully, I've got a lot of great friends who encouraged me and helped me focus on self-care before working on Genesis. This is probably the single most solid reason that the worker is now (marginally) working again.

Technical Difficulties

As one could imagine, genesis on the command interface started running super nicely after I stopped trying to have the same thread (yes, node.js is single-threaded, and as much as I'd tried to make it not, it still was) run the notifications for (at the time) 26k+ discord servers and who knows how many channels. So yeah, unifying the bot into a single instance, and letting the marvelous discord.js automatically shard was awesome for the bot side, but, well, the worker basically went back to infancy on performance.

Over several painful months, I dug through, optimizing queries with additional restrictions, causing among other things:

As anyone who's been following my commit history on this repo might've noticed, once I found and explored usage of the flat-cache package... a lot of my problems started exposing their true nature.

So... what was the root cause of the non-performance? Guilds and the database.

No, not discord's guilds. Their end is pretty swank, which is why it was not a huge problem when genesis and the worker were unified. However... they weren't anymore, and it turns out that fetching a 28MB object of guilds and their accompanying channels at least 16 times a second is... well, blatantly impossible. I've since introduced a caching/hydration system that occurs on a cron-specified time rotation; once-per-hour to get guilds/channel changes, once-per-10-minutes for large event queries... looking at you cetus times.
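To make the caching idea above concrete, here is a minimal sketch of a cache that only re-runs an expensive fetch after a fixed interval. It is not the Genesis implementation: `HydratedCache`, `loader`, and the demo values are hypothetical names, and it uses a lazy time-to-live check instead of the cron schedule described above, purely to keep the example self-contained.

```javascript
// Hypothetical sketch; none of these names come from the Genesis codebase.
class HydratedCache {
  /**
   * @param {() => any} loader expensive fetch (may return a Promise, which is cached as-is)
   * @param {number} ttlMs how long a snapshot stays fresh, in milliseconds
   * @param {() => number} now clock function, injectable so the demo can fake time
   */
  constructor(loader, ttlMs, now = Date.now) {
    this.loader = loader;
    this.ttlMs = ttlMs;
    this.now = now;
    this.value = undefined;
    this.loadedAt = null;
  }

  // True when nothing has been loaded yet or the snapshot is older than the TTL.
  isStale() {
    return this.loadedAt === null || this.now() - this.loadedAt >= this.ttlMs;
  }

  // Return the cached snapshot, reloading it first only if it is stale.
  get() {
    if (this.isStale()) {
      this.value = this.loader();
      this.loadedAt = this.now();
    }
    return this.value;
  }
}

const HOUR = 60 * 60 * 1000;

// Demo: a fake clock and a loader that counts how often it runs.
let fakeNow = 0;
let guildLoads = 0;
const guilds = new HydratedCache(
  () => {
    guildLoads += 1;
    return { snapshot: guildLoads }; // stand-in for the large guild/channel object
  },
  HOUR,
  () => fakeNow,
);

// Sixteen reads at the same instant trigger a single load.
for (let i = 0; i < 16; i += 1) guilds.get();
console.log(guildLoads); // 1
```

A second instance with a ten-minute TTL would cover the large event queries in the same way. The point is only that readers hit the in-memory snapshot instead of the database on every pass.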

So, what does that mean for the worker's future? It's staying around, and I'm looking at how I can expand and fork processes for the worker.
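As a rough illustration of what splitting the worker across processes could look like, the sketch below assigns guild IDs to a fixed number of buckets using the same formula Discord documents for gateway sharding, `(guild_id >> 22) % num_shards`. Nothing here is decided for Genesis: `workerIndexFor`, `partitionGuilds`, and the worker count are hypothetical, and actually starting the processes (for example with Node's `child_process.fork`) is left out.

```javascript
// Hypothetical helpers; not part of the Genesis codebase.

// Map a guild ID (a snowflake, passed as a string) to a worker index.
// Uses BigInt because snowflakes exceed Number's safe integer range.
function workerIndexFor(guildId, workerCount) {
  return Number((BigInt(guildId) >> 22n) % BigInt(workerCount));
}

// Group guild IDs into one array per worker.
function partitionGuilds(guildIds, workerCount) {
  const buckets = Array.from({ length: workerCount }, () => []);
  for (const id of guildIds) {
    buckets[workerIndexFor(id, workerCount)].push(id);
  }
  return buckets;
}

// Demo with made-up IDs whose upper bits are 0..4.
const demoIds = [0, 1, 2, 3, 4].map((n) => (BigInt(n) << 22n).toString());
const buckets = partitionGuilds(demoIds, 2);
console.log(buckets.map((b) => b.length)); // [ 3, 2 ]
```

Each bucket could then be handed to its own process, so no single process has to notify every guild.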

The problem from here on is pretty much...

Funding

So... how does Genesis keep running (financially speaking)?

Well... tenuously at best. We're funded by the users, so if users don't fund it, it doesn't run. Right now (thanks, COVID), I'm at home all the time, so I'm running the worker on my home PC, which lets me watch performance much more easily than when it was always remote. That's been great for the worker, but the API (https://api.warframestat.us/ & https://docs.warframestat.us), database (heh, not giving you that address), and bot nodes all run on paid DigitalOcean droplets. At present, I spend ~$65/month (1x $40 VPS, 1x $10, 1x $15) on those machines. It was higher, ~$105/month (4x $20 VPS, 1x $10, 1x $15), when I was running the worker there as well. I'll need & want to start doing that again, albeit at a much lower tier, which I now know is possible thanks to the performance improvements.

So, why am I telling y'all this....

Well, I'd like to pad what I make so that I can cover it if I ever need to increase machine capability or, heck, add something new to the apps we develop. I'm not ever going to demand patron support for core features on Genesis. I also have no plans to make any cool stuff pay-walled, as that just turns it into a job, and Genesis is not a second job for me and I don't want it to become one. I already have one and that's enough. I love working on Genesis, and so long as others want me to run this public instance and keep developing it, I will.

I'd love contributions to either Patreon, Genesis's very own bitcoin wallet, or in another medium. I'm always looking to decrease my costs, so if you are an experienced node.js developer and would like to lend your skills to making genesis run better, and so reduce the amount of memory it currently needs, I'd love to hear from you. If you'd like to help keep genesis alive in some other way, reach out to me on discord @ Tobiah#0001 or in one of the public channels on the Warframe Community Developers discord or Cephalon Sanctuary.

Questions about why this took so long? Ask below and I'll try to answer them in the comments and amend this!

syd-h commented 3 years ago

Much love for keeping a tool running and even continuing to improve it well after you have stopped playing. I'll have to toss a few $ again when the crew hops back into WF.

TobiTenno commented 3 years ago

I closed it, but please ask if you have questions!