ReVanced / revanced-helper

🤖 NLP backed bots assisting ReVanced
https://revanced.app
GNU General Public License v3.0

ci: init #3

Open · oSumAtrIX opened this issue 1 year ago

oSumAtrIX commented 1 year ago

Issue

Currently, there is no CI/CD.

Solution

Implement CI:

Additional context

alexandreteles commented 1 year ago

Here is a list of considerations:

  1. Please write a README.md explaining how to set up a development environment, how to build, and how to run a production environment (are there any required environment variables, a configuration file, etc.? A sketch of such a startup check follows this list);

  2. Explain in which order the services and their dependencies should be initialized;

  3. Please list the versions of the external packages required by the application (tesseract, etc.);

  4. Make the deployment and API keys available to the @revanced/backend team.
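
To make the environment-variable point concrete, a fail-fast startup check like the following could back that README section. This is only a sketch, and the variable names are placeholders rather than the project's actual ones:

```ts
// Hypothetical fail-fast startup check for required configuration.
// The variable names are placeholders, not the ones the bots actually use.
const required = ['WIT_AI_TOKEN', 'DISCORD_TOKEN', 'SERVER_ADDRESS'] as const;

for (const name of required) {
  if (!process.env[name]) {
    console.error(`Missing required environment variable: ${name}`);
    process.exit(1);
  }
}

export const config = {
  witAiToken: process.env.WIT_AI_TOKEN!,
  discordToken: process.env.DISCORD_TOKEN!,
  serverAddress: process.env.SERVER_ADDRESS!,
};
```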

As for the NodeJS modules depending on each other, could you either decouple them or publish the packages and update the code accordingly? We expect each bot to run in its own container in a stack with the server it communicates with. However, having all of them in a single repository makes building and publishing Docker images from GH Actions trickier. Ideally, we would like to give each bot or library its own repository so we can easily use webhooks for automatic deployment.

EDIT: Let me bring @reisxd into this conversation.

oSumAtrIX commented 1 year ago

@reisxd

alexandreteles commented 1 year ago

The client library uses BSON over TCP in a custom protocol to communicate with the server. I would like us to adopt an industry-standard protocol like JSON-RPC or gRPC, but if we are going to write our own protocol, it would be better from a deployment standpoint to rewrite it to use UDP and some compression (https://www.npmjs.com/package/brotli).
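
For illustration, a JSON-RPC 2.0 call from a bot to the server would look roughly like this; the endpoint, method name, and payload are made up, not the actual API:

```ts
// Sketch of a JSON-RPC 2.0 exchange between a bot and the NLP server over HTTP.
// The endpoint and method name are illustrative only.
type JsonRpcRequest = {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: unknown;
};

async function classifyText(text: string): Promise<unknown> {
  const request: JsonRpcRequest = {
    jsonrpc: '2.0',
    id: 1,
    method: 'classifyText', // hypothetical method
    params: { text },
  };

  const response = await fetch('http://nlp-server:8080/rpc', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(request),
  });

  const { result, error } = await response.json();
  if (error) throw new Error(error.message);
  return result;
}
```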

The currently proposed solution is to build an architecture like this:

[architecture diagram: revanced_helper]

I see two issues here:

  1. Long queues can build up while a single server awaits responses from external services;
  2. It is harder to scale if we have to duplicate the whole stack.

So I suggest we go for a stack that looks more like this:

[architecture diagram: revanced_helper]

I will write a mockup Compose stack when the time comes so you can give me input.

oSumAtrIX commented 1 year ago

It is indeed possible to spin up a stack of servers. The question is whether wit.ai allows us to have three clients.

> I would like us to adopt an industry-standard protocol

What benefit would it give us? BSON is available in pretty much any language and is as practical as JSON.
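
For reference, encoding with the `bson` npm package (assuming that's the serializer behind the custom protocol) is about as terse as JSON:

```ts
// Sketch: BSON encode/decode with the `bson` npm package (assumed library).
import { serialize, deserialize } from 'bson';

const message = { type: 'classify', text: 'how do I patch YouTube?' };

// Encode to bytes that can be written to a TCP socket ...
const encoded = serialize(message);

// ... and decode them back on the other side.
const decoded = deserialize(encoded);
console.log(decoded.text); // "how do I patch YouTube?"
```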

Sculas commented 1 year ago

Multiple clients are fine since it's just good ol' HTTP AFAIK. Rate limits might become a problem though.
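
For context, a wit.ai query is just an authenticated HTTPS GET; a rough sketch using Node's native fetch, with the token handling and version date as placeholders:

```ts
// Sketch of a wit.ai /message query; the API version date and token are placeholders.
async function queryWit(text: string): Promise<unknown> {
  const url = new URL('https://api.wit.ai/message');
  url.searchParams.set('v', '20230201'); // version date the bots target (placeholder)
  url.searchParams.set('q', text);

  const response = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.WIT_AI_TOKEN}` },
  });

  if (!response.ok) {
    throw new Error(`wit.ai request failed: ${response.status}`);
  }
  return response.json();
}
```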

Sculas commented 1 year ago

Also, I agree with moving to gRPC. It's made for situations like these and is also widely used for (internal) communication between (micro)services.

reisxd commented 1 year ago

Undici (the internal dependency NodeJS uses for native fetch) is already fast enough, so I don't think we should worry about the server not being able to handle requests.

One more thing I should mention: the communities aren't active enough to get us rate-limited by wit.ai. I don't think we will ever reach a point where Node can't handle the volume of requests.

alexandreteles commented 1 year ago

> It is indeed possible to spin up a stack of servers. The question is whether wit.ai allows us to have three clients.

The API only rate-limits us; it does not complain about multiple connections from the same IP.

> What benefit would it give us? BSON is available in pretty much any language and is as practical as JSON.

Wider support from other libraries and tools, less development overhead (we do not have to maintain our own protocol), and consequently less technical debt.

> Multiple clients are fine since it's just good ol' HTTP AFAIK. Rate limits might become a problem though.

wit.ai rate limits are per minute, so sending more requests per second shouldn't affect us too much as long as we keep things at a sane level (the server should have an internal queue, but that's not immediately necessary).
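
A minimal sketch of what such an internal queue could look like; the per-minute limit is a placeholder, not wit.ai's actual quota:

```ts
// Sketch of an internal queue that caps outgoing wit.ai requests per minute.
class PerMinuteQueue {
  private timestamps: number[] = [];
  private pending: Array<() => void> = [];

  constructor(private readonly limitPerMinute: number) {
    // Periodically release queued callers as the one-minute window slides.
    setInterval(() => this.drain(), 1000).unref();
  }

  // Resolves once the caller is allowed to send a request.
  acquire(): Promise<void> {
    return new Promise((resolve) => {
      this.pending.push(resolve);
      this.drain();
    });
  }

  private drain(): void {
    const oneMinuteAgo = Date.now() - 60_000;
    this.timestamps = this.timestamps.filter((t) => t > oneMinuteAgo);

    while (this.pending.length > 0 && this.timestamps.length < this.limitPerMinute) {
      this.timestamps.push(Date.now());
      this.pending.shift()!();
    }
  }
}

// Usage: await queue.acquire() before each wit.ai call.
const queue = new PerMinuteQueue(60); // placeholder limit
```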

> Also, I agree with moving to gRPC. It's made for situations like these and is also widely used for (internal) communication between (micro)services.

I should have mentioned that as well, thank you.

> Undici (the internal dependency NodeJS uses for native fetch) is already fast enough, so I don't think we should worry about the server not being able to handle requests.

> One more thing I should mention: the communities aren't active enough to get us rate-limited by wit.ai. I don't think we will ever reach a point where Node can't handle the volume of requests.

The issue isn't the server being unable to handle requests; it's something else. Node is single-threaded, meaning all requests constantly hit a single CPU thread, bogging that thread down if it gets overwhelmed. That means any other tasks with affinity to that thread suddenly wait on long semaphores, slowing down the whole server.

If we run multiple servers with a load balancer in front of them, we can make sure the load is spread evenly across our system cores, lowering the impact of one of the servers going belly up (not to mention that we can guarantee the service keeps running if one server somehow dies).
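
As a sketch of the same idea in-process, Node's built-in cluster module can already fan connections out across cores (a load balancer in front of separate containers achieves this at the stack level); the port below is a placeholder:

```ts
// Sketch: one worker per core using Node's built-in cluster module.
import cluster from 'node:cluster';
import { cpus } from 'node:os';
import { createServer } from 'node:http';

if (cluster.isPrimary) {
  // Fork one worker per core; the primary distributes incoming connections.
  for (let i = 0; i < cpus().length; i++) cluster.fork();

  // Respawn a worker if it dies so the service keeps running.
  cluster.on('exit', () => cluster.fork());
} else {
  createServer((_req, res) => {
    res.end(`handled by worker ${process.pid}\n`);
  }).listen(8080); // placeholder port
}
```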

From a performance perspective, it simply allows us to process more requests if necessary while also making it easier to scale/migrate to other servers.