AppFlowy-IO / AppFlowy-Cloud

AppFlowy is an open-source alternative to Notion. You are in charge of your data and customizations. Built with Flutter and Rust.
GNU Affero General Public License v3.0

[FR] Allow to disable (Open)AI integration and provide context about the Appflowy (Open)AI dependency #565

Open almereyda opened 5 months ago

almereyda commented 5 months ago

Posting this here; would have preferred a discussion, as this seems intentional and probably not considered a fault. #564

1~3 main use cases of the proposed feature

  1. As an AppFlowy Cloud self-hoster, I want to be able to run the whole collaboration stack with free software, in order to retain full data autonomy.
  2. As an AppFlowy Cloud self-hoster, I want to be able to voluntarily disable AI data processing features, in order to run a less resource-intensive editing and publishing environment.

what types of users can benefit from using your proposed feature

Additional context

It's not possible to run the AppFlowy-Cloud stack without providing an OPENAI_API_KEY for the ai container; otherwise it enters a crash loop.

$ docker compose up
…
ai-1                |   Did not find openai_api_key, please add an environment variable `OPENAI_API_KEY` which contains it, or pass `openai_api_key` as a named parameter. (type=value_error)
ai-1 exited with code 1

The ai container is referenced from nginx.conf, which in turn causes the nginx container to enter a crash loop when accessing the configured system.

nginx-1  | 2024/05/19 15:03:54 [emerg] 1#1: host not found in upstream "ai" in /etc/nginx/nginx.conf:97
nginx-1  | nginx: [emerg] host not found in upstream "ai" in /etc/nginx/nginx.conf:97

There is no build: section for the ai container in the Compose manifest:

https://github.com/AppFlowy-IO/AppFlowy-Cloud/blob/ec7eb54bfca1f43a047cb65406402b9481ed9985/docker-compose.yml#L133-L141

There is a container published to

which remains proprietary. No sources can be obtained, which is surprising, given the existing code base is released with one of the most restrictive FLOSS licenses, AGPL.

There is no mention of it in the Cloud or Self-hosting documentation:

But the general documentation for regular AppFlowy knows about an OpenAI integration:

There does not appear to be further documentation for the ai container and what it does. There are some further AI related developments in the GitHub organisation, which appear to be related concerns:

Where is AppFlowy going with regards to AI and its cloud self-hosting?

As it stands, it's impossible to replicate the AppFlowy stack without external dependencies.

speed2exe commented 5 months ago

@almereyda Thanks for starting this thread and highlighting this issue.

We will make it possible to run the AppFlowy-Cloud stack without the ai services. If you face a crash loop in nginx, you can remove the ai configs, and it should run as normal.

The general documentation on AI in AppFlowy refers to the OpenAI services (not the ones in the deployment configs). The deployed AI services are an optional add-on.

This AI development is quite recent and the documentation has not caught up yet.

almereyda commented 4 months ago

Thank you for getting back to this. I can already see that the platform is moving fast and that the documentation does not necessarily reflect the features actually available in the code.

Please note that the Nginx container no longer ends up in a restart loop, even when the ai container is merely stopped.

Meanwhile I'm using a Compose profile to keep the ai container from starting, without removing it from the manifest:

    profiles:
      - donotstart

Now it seems even when that container is disabled, the appflowy_indexer will constantly try to exfiltrate one's data to the OpenAI API, too. We see frequent repetitions of these events in the logs:

{"timestamp":"2024-06-13T21:05:43.499863Z","level":"INFO","fields":{"message":"updating indexes for 1 fragments"},"target":"appflowy_indexer::collab_handle"}
{"timestamp":"2024-06-13T21:05:43.678613Z","level":"ERROR","fields":{"message":"document 69adfb20-b4a1-493e-acd5-d7bc63e674c4/9dd9afce-ac8c-4ec0-95d1-4beca38171f3 watcher failed to publish fragment updates: OpenAI failed to process request: {\"code\":null,\"message\":\"You didn't provide an API key. You need to provide your API key in an Authorization header using Bearer auth (i.e. Authorization: Bearer YOUR_KEY), or as the password field (with blank username) if you're accessing the API from your browser and are prompted for a username and password. You can obtain an API key from https://platform.openai.com/account/api-keys.\",\"param\":null,\"type\":\"invalid_request_error\"}"},"target":"appflowy_indexer::collab_handle"}

This can be avoided by applying the same profiles technique to the appflowy_indexer container.

Note that when stopping this container, it takes ten seconds until it is SIGKILLed, as the process run by the container doesn't respond to SIGTERM. Putting init: true in its Compose manifest helps to work around this.

diff --git a/docker-compose.yml b/docker-compose.yml
index 5ec8121..2c95d43 100644
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -138,6 +144,8 @@ services:
       - OPENAI_API_KEY=${APPFLOWY_AI_OPENAI_API_KEY}
       - APPFLOWY_AI_SERVER_PORT=${APPFLOWY_AI_SERVER_PORT}
       - APPFLOWY_AI_DATABASE_URL=${APPFLOWY_AI_DATABASE_URL}
+    profiles:
+      - donotstart

   appflowy_history:
     restart: on-failure
@@ -152,6 +160,7 @@ services:
       - APPFLOWY_HISTORY_DATABASE_URL=${APPFLOWY_HISTORY_DATABASE_URL}

   appflowy_indexer:
+    init: true
     restart: on-failure
     image: appflowyinc/appflowy_indexer:${APPFLOWY_INDEXER_VERSION:-latest}
     build:
@@ -163,6 +172,8 @@ services:
       - APPFLOWY_INDEXER_ENVIRONMENT=production
       - APPFLOWY_INDEXER_DATABASE_URL=${APPFLOWY_INDEXER_DATABASE_URL}
       - APPFLOWY_INDEXER_OPENAI_API_KEY=${APPFLOWY_INDEXER_OPENAI_API_KEY}
+    profiles:
+      - donotstart

 volumes:
   postgres_data:
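To illustrate the SIGTERM point: `init: true` works around the problem from the outside, while a service can also fix it by reacting to SIGTERM itself. The following is only a minimal sketch in Python of that second option. The actual indexer is a Rust service whose shutdown logic is not shown in this thread, and all names here (`shutdown_requested`, `handle_sigterm`, `run_worker`) are hypothetical.

```python
import signal
import threading

# Hypothetical shutdown flag; not taken from the AppFlowy code base.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Signal handler: only record that a shutdown was requested."""
    shutdown_requested.set()


def install_sigterm_handler():
    """Register the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, handle_sigterm)


def run_worker(poll_seconds=0.1):
    """Loop until a shutdown is requested, then return cleanly."""
    while not shutdown_requested.wait(poll_seconds):
        pass  # placeholder for the real work of the service
    return "stopped cleanly"
```

With a handler like this installed at startup, the process can exit as soon as the stop signal arrives instead of waiting to be killed after the ten-second timeout.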

The vector indexer was recently introduced with:

When not using it, it's also not necessary to build a custom Postgres cluster with the pgvector extension.

pgvector is required by a migration:

https://github.com/AppFlowy-IO/AppFlowy-Cloud/blob/430e3e15c9a1dc6aba2a9599d17d946a61ac7cae/migrations/20240521092310_collab_embeddings.sql#L2

Maybe there is a way to build the application so that it doesn't depend on a vector database either.

Related:

Edit: Further it appears useful to consider disabling all AI features in the application, when a server (like in the #622 example) doesn't offer it:

[screenshot]

Especially when no API key is given:

[screenshot]

speed2exe commented 4 months ago

@almereyda The AI feature in the screenshot has nothing to do with the ai services and indexer (those are WIP and officially unreleased). The AI features in the screenshot require you to put OpenAI keys in the frontend settings.

For now, if you are running PostgreSQL without pgvector and not running the indexer, you can simply remove the file AppFlowy-Cloud/migrations/20240521092310_collab_embeddings.sql.

almereyda commented 4 months ago

There's so much AI around, without actually using it, I might have mixed things up.

Thanks for pointing out that the migrations are used for all three stateful services: appflowy_cloud, appflowy_history and appflowy_indexer. Maybe it's possible to separate them a bit more, so that we can more cleanly distinguish what belongs where.

henri9813 commented 2 months ago

Hello,

In my opinion, there are too many steps required to simply run AppFlowy without AI.

I know in 2024 no one is able to publish a product without AI even if many people don't care about it.

In my case, I deploy appflowy for documentation purposes, and project management.

But I can't accept that, because of AI (and OpenAI), we can't run a simple self-hosted project.

Maybe there is a design problem somewhere, but I can't imagine a world where not configuring an OpenAI token blocks the deployment of a self-hosted project that is not entirely dependent on AI.

Thanks @speed2exe for the workaround, which was not required a few months ago.

We need support for simply leaving OpenAI disabled.

The container should either not start, or gracefully stop by printing: "AI features will not be loaded because OPENAI_TOKEN environment variable is not configured."
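The graceful-stop behaviour could look roughly like the sketch below. It is only an illustration in Python, not the actual ai container code, whose sources are not available. The variable name `OPENAI_API_KEY` is taken from the error message quoted earlier in this thread; the functions `ai_enabled` and `main` are hypothetical.

```python
import os

# Variable name as it appears in the error message quoted in this thread.
API_KEY_VAR = "OPENAI_API_KEY"


def ai_enabled(env=None):
    """Return True only if a non-empty API key is configured."""
    env = os.environ if env is None else env
    return bool(env.get(API_KEY_VAR, "").strip())


def main(env=None):
    """Return the process exit code instead of raising on a missing key."""
    if not ai_enabled(env):
        print(
            f"AI features will not be loaded because the {API_KEY_VAR} "
            "environment variable is not configured."
        )
        return 0  # exit status 0: a clean stop, not a failure
    # Placeholder: the real service would start here.
    print("Starting AI service.")
    return 0
```

An entry point would call `sys.exit(main())`. Returning exit code 0 matters if the service uses `restart: on-failure` like the other services in the Compose file: Docker only restarts such a container on a non-zero exit code, so a clean exit would not cause a crash loop.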