amir20 / dozzle

Realtime log viewer for docker containers.
https://dozzle.dev/
MIT License

🙋🏻Feedback needed: Docker Swarm support #2862

Closed Censseo closed 5 months ago

Censseo commented 5 months ago

Hello, I just wanted to ask whether you have already thought about supporting Docker Swarm. A swarm can span multiple nodes, so it needs something like an "agent" on each node. I saw https://github.com/amir20/dozzle/issues/2495, but it doesn't meet my needs: I don't want to expose ports publicly just for that, and I would need to change the configuration each time a node joins or leaves the swarm, which is tedious. I have done that kind of resolution for replicated services before: for what it's worth, I just resolve the DNS name to get all the IPs and then connect to each replica. But I'm more of a Java dev than a Go dev, so it would be a little tricky for me to change your code for that.
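As a rough illustration of the DNS approach described above (not something Dozzle is known to do), the sketch below resolves the `tasks.<service>` name that Swarm's internal DNS exposes for a service and builds one address per replica. The service name `agent` and port `8080` are hypothetical, and the lookup is assumed to run inside a container attached to the same overlay network as the service.

```go
package main

import (
	"fmt"
	"net"
	"os"
	"strconv"
)

// lookupFunc has the same shape as net.LookupHost, so a fake resolver
// can be swapped in outside a swarm.
type lookupFunc func(host string) ([]string, error)

// replicaAddrs resolves tasks.<service> and returns one host:port per replica.
func replicaAddrs(lookup lookupFunc, service string, port int) ([]string, error) {
	ips, err := lookup("tasks." + service)
	if err != nil {
		return nil, fmt.Errorf("resolving replicas of %q: %w", service, err)
	}
	addrs := make([]string, 0, len(ips))
	for _, ip := range ips {
		addrs = append(addrs, net.JoinHostPort(ip, strconv.Itoa(port)))
	}
	return addrs, nil
}

func main() {
	// "agent" and 8080 are placeholders; outside an overlay network the
	// lookup simply fails and the error is printed.
	addrs, err := replicaAddrs(net.LookupHost, "agent", 8080)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	for _, a := range addrs {
		fmt.Println("would connect to", a)
	}
}
```

Because the name is resolved at call time, nodes joining or leaving the swarm would be picked up on the next lookup without a configuration change.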

amir20 commented 5 months ago

> docker swarm, as it can be multi nodes and needs something like a "agent" on each node

Would it? Services use a mesh, so I would just support docker service logs ... and the Docker API should handle the magic. The problem is that this API is very buggy and has not always worked for me. Sometimes logs are missing or inconsistent.

I haven't really spent too much time on Swarm because I am not really sure where Docker is planning to go with Docker Swarm and not a lot of people have asked for it. I feel like k8s is winning here.

If I was going to support anything it would be services and stacks.

Censseo commented 5 months ago

Well, Swarm has not been around for as many years as k8s and is quite new, so I guess there are still some bugs. I understand the dilemma of using the Docker service API to gather the logs instead of deploying an agent on each node. If you feel this would be better, I guess it would still be a plus. And Swarm is only going to get more users as it grows. It is far less complicated than k8s for small and mid-sized clusters, so it is the more logical choice in those cases.

amir20 commented 5 months ago

@Censseo are you currently using docker service logs -f ...? I just tested it and it does look like it is working a lot better than a year ago. So I think I am going to give that a try. However, Dozzle only uses the Docker API. So hopefully people will understand that I don't actually control the logs from each container.

amir20 commented 5 months ago

One thing I noticed is that with docker service there is no way to get stats for each service. The Docker API only allows mem/cpu to be fetched locally, per container, and not through the service API.
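To make the asymmetry concrete, here is a minimal sketch that talks to the Docker Engine API over the local Unix socket using only the Go standard library. It is not Dozzle's code. The service name `web`, the container ID `abc123`, and the socket path are assumptions. The service logs endpoint is answered by a swarm manager, while the stats endpoint only knows containers running on the daemon being queried.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"net/url"
	"os"
)

// dockerSocket is the usual default socket path (an assumption for this sketch).
const dockerSocket = "/var/run/docker.sock"

// serviceLogsPath builds the Engine API path for the last 20 log lines of a service.
func serviceLogsPath(service string) string {
	q := url.Values{}
	q.Set("stdout", "1")
	q.Set("stderr", "1")
	q.Set("tail", "20")
	return "/services/" + url.PathEscape(service) + "/logs?" + q.Encode()
}

// containerStatsPath builds the Engine API path for a one-shot stats sample.
func containerStatsPath(id string) string {
	return "/containers/" + url.PathEscape(id) + "/stats?stream=false"
}

// newSocketClient returns an HTTP client that always dials the given Unix socket.
func newSocketClient(path string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", path)
		},
	}}
}

func main() {
	client := newSocketClient(dockerSocket)
	// "web" and "abc123" are hypothetical names used only for illustration.
	for _, p := range []string{serviceLogsPath("web"), containerStatsPath("abc123")} {
		// The host part is ignored because the client dials the socket.
		resp, err := client.Get("http://docker" + p)
		if err != nil {
			fmt.Fprintln(os.Stderr, "request failed:", err)
			continue
		}
		fmt.Println(p, "->", resp.Status)
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
}
```

There is no per-service counterpart to the stats path, which is the gap described above.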

amir20 commented 5 months ago

I have thought about this a little and I think it's more complicated than I had thought. Through services, the Docker API only gives access to the logs; per-container stats are lost and can only be fetched when connected to the Docker host that runs the container. I think there are a few options:

  1. Only support logs with services and disable stats. In this mode, if Dozzle notices swarm is enabled, then it also shows services under containers, where it would use docker service logs. Stats wouldn't work.
  2. Support a hybrid approach where logs are fetched using the service API but stats are fetched remotely if available through the REMOTE_HOST config.
  3. Don't use the service API at all and just group containers by service label. Fetching logs for multiple containers would mean Dozzle would have to stream multiple containers in parallel. This would deviate from how someone might expect Swarm mode to work.

I am not sure which makes sense. Option 1 is the easiest, and the user would have to switch to the host and container to see more details.
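For a feel of what option 3 would involve, the sketch below groups containers by the `com.docker.swarm.service.name` label that Docker sets on service task containers, then merges several line streams into one channel with a goroutine per stream. It is an illustration under assumptions, not how Dozzle implements anything: the `container` type is a hypothetical trimmed-down record, and the in-memory readers stand in for real container log streams.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"sort"
	"strings"
	"sync"
)

// serviceLabel is the label Docker sets on containers created for a service task.
const serviceLabel = "com.docker.swarm.service.name"

// container is a hypothetical, trimmed-down view of a container record.
type container struct {
	ID     string
	Labels map[string]string
}

// groupByService buckets container IDs by service label.
// Containers without the label are skipped.
func groupByService(cs []container) map[string][]string {
	groups := make(map[string][]string)
	for _, c := range cs {
		if name, ok := c.Labels[serviceLabel]; ok {
			groups[name] = append(groups[name], c.ID)
		}
	}
	return groups
}

// mergeLines reads every source line by line in its own goroutine and
// forwards each line, prefixed with the source name, on a single channel.
// The channel is closed once all sources are drained.
func mergeLines(sources map[string]io.Reader) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	for name, r := range sources {
		wg.Add(1)
		go func(name string, r io.Reader) {
			defer wg.Done()
			sc := bufio.NewScanner(r)
			for sc.Scan() {
				out <- name + " | " + sc.Text()
			}
		}(name, r)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	cs := []container{
		{ID: "c1", Labels: map[string]string{serviceLabel: "web"}},
		{ID: "c2", Labels: map[string]string{serviceLabel: "web"}},
		{ID: "c3", Labels: map[string]string{}},
	}
	fmt.Println(groupByService(cs)["web"])

	// Stand-ins for the log streams of the two "web" containers.
	sources := map[string]io.Reader{
		"c1": strings.NewReader("started\nready\n"),
		"c2": strings.NewReader("started\n"),
	}
	var lines []string
	for l := range mergeLines(sources) {
		lines = append(lines, l)
	}
	sort.Strings(lines) // arrival order across containers is not deterministic
	for _, l := range lines {
		fmt.Println(l)
	}
}
```

The ordering caveat in `main` hints at the deviation mentioned in option 3: interleaving lines from several containers requires a decision about ordering that a single service log stream would otherwise hide.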

Please provide feedback. If I don't hear anything then I can close because this is a lot of work if there isn't a lot of interest.

amir20 commented 5 months ago

Hmm it's quiet. I won't be implementing this. Just not enough interest for a lot of work.

Censseo commented 4 months ago

Hello @amir20, sorry I didn't follow up on this; I was trying other solutions. Dozzle looks good, but the lack of Swarm support is a no-go for using it in my infra, so I gave up for now and moved on in my research. So far I haven't found anything else interesting aside from Portainer. I have also done a few tests on my side: the Docker service API is quite limited when it comes to extended details, and you would need an agent on each node of the swarm to fetch the data for each container. It's indeed more work than just using the service API, but it seems this is the way Portainer went.

I have tried playing around with dockerode, a node for executing Docker commands in Node-RED, and that approach works. But I didn't have the time to go further.

amir20 commented 4 months ago

Hi @Censseo,

That's right, I think the Swarm API is too unstable to rely on. The Swarm API doesn't provide stats either, which are an important part of debugging.

I think the right solution would be to provide an "agent" that runs on all nodes and acts as a "proxy" to stream all calls to the UI. As you can imagine, this would be pretty complicated.
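One very rough way to picture such an agent is a small HTTP handler on each node that forwards a read-only subset of requests to the local Docker socket. The sketch below uses the standard library's reverse proxy for that. It is only an assumption about what an agent could look like, not a description of any implementation in Dozzle; the allow-list in `readOnly` and the socket path are hypothetical choices.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// readOnly reports whether a request is a GET for the container list,
// container logs, or container stats. Everything else is rejected.
func readOnly(method, path string) bool {
	if method != http.MethodGet {
		return false
	}
	if path == "/containers/json" {
		return true
	}
	return strings.HasPrefix(path, "/containers/") &&
		(strings.HasSuffix(path, "/logs") || strings.HasSuffix(path, "/stats"))
}

// newAgent returns a handler that proxies allowed requests to the Docker
// daemon listening on the given Unix socket.
func newAgent(socket string) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(&url.URL{Scheme: "http", Host: "docker"})
	proxy.Transport = &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", socket)
		},
	}
	proxy.FlushInterval = -1 // flush after every write so streamed logs are not buffered
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !readOnly(r.Method, r.URL.Path) {
			http.Error(w, "forbidden", http.StatusForbidden)
			return
		}
		proxy.ServeHTTP(w, r)
	})
}

func main() {
	agent := newAgent("/var/run/docker.sock") // socket path is an assumption

	// A write request is rejected before it ever reaches the daemon.
	rec := httptest.NewRecorder()
	agent.ServeHTTP(rec, httptest.NewRequest(http.MethodPost, "/containers/abc/stop", nil))
	fmt.Println(rec.Code) // 403
}
```

On a real node the handler would be served with something like `http.ListenAndServe` on an address reachable only inside the swarm's overlay network. Authentication, TLS, and reconnect handling are left out, which is a large part of why this is more complicated than it looks.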

However, as noted, there doesn't seem to be a lot of interest in this from the community. I am not sure why that is. Maybe a lot of people are not using Swarm. K8s is more popular.

Given your use case, you might want to think about Datadog, Elasticsearch or other solutions. It would be too much support for an open source project like this to implement all the edge cases to handle Swarm.

Of course, if someone wants to spend the time, I am open to PRs. :)

amir20 commented 4 months ago

Some data points: in version 6.5.2 I added a metric to find out how many people have Swarm enabled. About 7% have Swarm enabled when using Dozzle. That's a pretty small number.

Censseo commented 4 months ago

Yes, I'm going with an ELK stack with Fluentd. Swarm may be a newer solution, but it's simpler for small infrastructures of a few dozen machines. I also think that many people who use Dozzle may not be the same people who actually build infrastructure. Using Portainer templates to install stacks is more convenient for home labs on a single machine, whereas Swarm setups generally don't rely on a tool like Portainer for deployments. This may be why you only see 7% with Swarm enabled. But it's just a guess.

amir20 commented 4 months ago

Hi @Censseo,

ELK stack sounds fine.

> This is maybe why u only see 7% of swarm activated. But it's just a guess

I think so too.

I did a little research and I think I found a pretty good solution to stream logs across a swarm cluster. However, it still wouldn't provide search across logs. For the swarm cluster support, I posted on reddit. It would be a lot of work, and I was wondering whether a lot of people want it, whether they would be willing to pay for it, and what other solutions they use.

Personally, I'd love to add support for swarm and some kind of search but that would be such a huge scope that I am reluctant to do so.

Take a look at quickwit btw. It beats Elasticsearch on many features.

amir20 commented 4 months ago

Also TLDR from reddit seems to be:

I'd love to support more business use cases away from self hosting, but I think as a free option that would be difficult with my time.

Censseo commented 2 months ago

> Take a look at quickwit btw. It beats Elasticsearch on many features.

Thanks for the advice, I'll take a look at that!

I haven't had the time to implement ELK yet, as I had a lot of other things to do first. I just saw your PR about swarm mode, amazing work! I'll try it on my infra :) Thanks for this amazing new feature!