Open Elethiomel opened 7 years ago
Have you tested it with only one running instance?
@localguru : Great idea!
Moving to one instance gives me file uploads in a second or two. Moving back to three instances breaks uploads again.
Any idea what might have changed between 0.56.0 and 0.57.2 to cause this behaviour?
EDIT: Shutting down just one of the instances ( so I'm running with 2 instances) also gives me fast uploads. I'll reprovision the third node tomorrow and see if that helps.
I spoke too soon. Two instances cause slow speeds too. Perhaps I didn't wait long enough for the second instance to join during my last test. After a minute or so it became as slow as three instances.
Seems to be a problem between the nodes then. I don't run multiple nodes, just one, because there are too many open issues regarding multi-node setups, e.g. notification problems.
may be similar to #7485
So this is caused by #7304. Basically, if the HTTP POST that uploads the file hits a different instance than the one you are connected to, it proxies the request to the instance you are connected to.
@geekgonecrazy I could have sworn that I had stickiness enabled on my load balancer, but it turns out that I didn't. I'm guessing that enabling it would fix this? Or is it the case that Rocket.Chat communicates through websockets for most things, so the plain HTTP POST is a different session and could still end up elsewhere?
Sticky sessions will prevent the proxying from being needed. Depending on your proxy, the websocket and the HTTP POST should be treated as the same session.
Uploads are slow or don't work at all with more than one instance.
Error from logs:

```
rocketchat_3 | Exception in callback of async function: Error: getaddrinfo ENOTFOUND undefined undefined:3000
rocketchat_3 |     at errnoException (dns.js:27:10)
rocketchat_3 |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:78:26)
```
@salamachinas it sounds like your instances are not able to talk to each other. Are you using INSTANCE_IP env variable with your containers?
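For reference, a minimal sketch of how `INSTANCE_IP` is typically set per container so instances can reach each other (the IP, port, and service name here are placeholders, not values from this thread):

```yaml
services:
  rocketchat1:
    image: rocketchat/rocket.chat
    environment:
      # INSTANCE_IP must be an address the OTHER instances can reach;
      # with the default bridge network that is this container's own IP.
      - INSTANCE_IP=172.17.0.2
      - PORT=3000
```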
Very slow uploads are also observed with just one instance, see also #7857
Chunks of 100k arrive in a temporary file in /tmp/ufs .
I have the same issue with version 0.59.3 when I use multiple instances: uploads from the mobile app are extremely slow, and I have to pick the file more than twice before the upload starts. With only one instance it works fine.
@salamachinas what about if we're running in Docker Swarm or another environment where the containers are eventually rescheduled on another host?
```yaml
version: '2'
services:
  rocketchat1:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=3333
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 3333:3333
  rocketchat2:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=4444
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 4444:4444
  rocketchat3:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=5555
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 5555:5555
```
We have 3000+ users, with roughly 300-500 online at a time. All the Rocket.Chat instances run on a single server (16 GB of RAM, 4 cores) inside an AWS VPC, with nginx proxying to the Rocket.Chat ports.
We have frequent problems with Rocket.Chat. We know a replica set would help, but is there anything else we might be missing, or any other points that could help in our case?
Much appreciated.
Hello @kajisaap I have a setup similar to yours.
One thing that might help you, at least for file uploads, is to replace nginx with HAProxy. As explained in this thread, when uploading a file, the file is proxied to the instance that first saw the user when connecting. As nginx can't support sticky sessions, your users are moved from one instance to another on each request.
Using HAProxy with settings like:

```
frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance source
  hash-type consistent # optional
  server s1 192.168.10.11:80 check
  server s2 192.168.10.21:80 check
```
The "balance source" setting ensures that a given user is always routed to the same backend server, while users as a whole are distributed evenly.
http://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4-balance
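If many of your clients share a source IP (e.g. behind a corporate NAT), source hashing puts them all on one backend. Cookie-based stickiness is a common HAProxy alternative; a sketch under that assumption (server names and IPs are placeholders, not from this thread):

```
backend bk_web
  balance roundrobin
  # insert a SERVERID cookie so each client sticks to one backend
  cookie SERVERID insert indirect nocache
  server s1 192.168.10.11:80 check cookie s1
  server s2 192.168.10.21:80 check cookie s2
```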
This is not my actual configuration, I don't have access to it right now. I could post it on monday if you're interested.
By the way, it's easier to manage haproxy / nginx directly in the docker-compose.yml. You can then refer to your instances by their names (rocketchat1) instead of IP and port, expose only a single port to the outside, and change the configuration or servers as you like without reconfiguring an external system every time.
Did you try migrating the files storage to GridFS? Then the files are not stored in little chunks in the MongoDB. @arminfelder has been working on a migration script.
I found a solution. Just set load balance strategy to ip_hash ( Let a client requests always be send to the same node server )
yeah basically right now sticky ip is the only way to do it :(
Hi @onigoetz ,

> This is not my actual configuration, I don't have access to it right now. I could post it on monday if you're interested.
> By the way, it's easier to manage haproxy / nginx directly in the docker-compose.yml, as well as you can refer to your instances by their names (rocketchat1) instead of IP and port, this way you can only expose a single port to the outside (and change the configuration/servers the way you like without re-configuring an external system every-time.

Can you send me your configuration? I'm very interested in the docker-compose part.
Thanks a lot!
Sure, here is a gist with the two configuration files : https://gist.github.com/onigoetz/9b49b4b3e713e35cd2e51f4b369ca2e7
The Rocket.Chat version is a bit outdated because we've since migrated our instance to a Kubernetes cluster, but this configuration worked perfectly fine.
Hi @onigoetz , is it working fine with kubernetes? Do you scale rocketchat automatically?
Thanks for the files!
You can use IP hashing in nginx for uploads with multiple instances; I've already tested this successfully. Put `hash $remote_addr;` in the upstream block as below:

```
upstream backend {
    hash $remote_addr;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}
```

Also put `client_max_body_size 200M;` in the http block to prevent errors with iOS devices.
> Hi @onigoetz , is it working fine with kubernetes? Do you scale rocketchat automatically?
> Thanks for the files!
For what it's worth, using the kube-router CNI we've set the service for both the nginx TLS-termination proxy and the Rocket.Chat application to use source hashing. This works well and allows the pods behind the service to scale based on load.
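For comparison, a plain Kubernetes Service can also pin each client to one pod via session affinity, without a CNI-specific feature. A minimal sketch with hypothetical names and ports (not taken from this thread):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rocketchat
spec:
  selector:
    app: rocketchat
  ports:
    - port: 3000
      targetPort: 3000
  # route all requests from a given client IP to the same pod
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800
```

Like any source-IP scheme, this degrades when many clients sit behind a shared NAT address.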
Having this issue too. Refreshing and uploading the same image sometimes uploads instantly (much faster).
Since upgrading today from 0.56 to 0.57.2 I'm experiencing extremely slow file uploads. Files are taking minutes to upload instead of being nearly instantaneous.
The file seems to upload very slowly in 16k chunks to /tmp/ufs. I can see things like this in /tmp/ufs:

```
-rw-r--r-- 1 root root 16384 Jul 18 17:51 72j4eMfBmzJFDeeCf
```

Then some time later:

```
-rw-r--r-- 1 root root 32768 Jul 18 17:55 72j4eMfBmzJFDeeCf
```
In order to make sure the backend being uploaded to wasn't affecting things, I moved from S3 to a local filestore (in reality a GlusterFS filesystem shared between the cluster nodes). It made no difference. Once files eventually arrive in full in /tmp/ufs, they are sent on to S3 or the local filestore in a second.