RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

File upload extremely slow with multiple instances #7524

Open Elethiomel opened 7 years ago

Elethiomel commented 7 years ago

Rocket.Chat Version: 0.57.2 Running Instances: 3 DB Replicaset OpLog: Enabled Node Version: v4.8.2

Since upgrading today from 0.56 to 0.57.2, I'm experiencing extremely slow file uploads. Files take minutes to upload instead of being nearly instantaneous.

The file seems to upload very slowly, in 16 KB chunks, to /tmp/ufs. I can see entries like this:

-rw-r--r-- 1 root root 16384 Jul 18 17:51 72j4eMfBmzJFDeeCf

Then some time later....

-rw-r--r-- 1 root root 32768 Jul 18 17:55 72j4eMfBmzJFDeeCf

To make sure the backend being uploaded to wasn't affecting things, I moved from S3 to a local filestore (in reality a GlusterFS filesystem shared between the cluster). It made no difference. Once a file eventually arrives in full in /tmp/ufs, it goes to S3 or the local filestore in a second.

localguru commented 7 years ago

Have you tested it with only one running instance?

Elethiomel commented 7 years ago

@localguru : Great idea!

Moving to one instance gives me file uploads in a second or two. Moving back to three instances breaks uploads again.

Any idea what might have changed between 0.56.0 and 0.57.2 to cause this behaviour?

EDIT: Shutting down just one of the instances ( so I'm running with 2 instances) also gives me fast uploads. I'll reprovision the third node tomorrow and see if that helps.

Elethiomel commented 7 years ago

I spoke too soon. Two instances cause slow uploads too. Perhaps I didn't wait long enough for the second instance to join during my last test. After a minute or so it became as slow as three instances.

localguru commented 7 years ago

Seems to be a problem between the nodes then. I run only one node, not multiple, because there are too many open issues regarding multi-node setups, e.g. notification problems.

localguru commented 7 years ago

may be similar to #7485

geekgonecrazy commented 7 years ago

So this is caused by #7304: basically, if the HTTP POST that uploads the file hits a different instance than the one you are connected to, it proxies the request to the instance you are connected to.

Elethiomel commented 7 years ago

@geekgonecrazy I could have sworn that I had stickiness enabled on my load balancer, but it turns out that I didn't. I'm guessing that enabling it would fix this? Or does Rocket.Chat communicate over websockets for most things, so the plain HTTP POST is a different session and could still end up elsewhere?

geekgonecrazy commented 7 years ago

Sticky sessions will prevent the proxying from needing to happen. Depending on your proxy, the websocket and the HTTP POST should be treated as the same session.

salamachinas commented 7 years ago

Uploads are slow, or don't work at all, with more than one instance.

Error from logs

rocketchat_3   | Exception in callback of async function: Error: getaddrinfo ENOTFOUND undefined undefined:3000
rocketchat_3   |   at errnoException (dns.js:27:10)
rocketchat_3   |   at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:78:26)
rocketchat_3   | 
geekgonecrazy commented 7 years ago

@salamachinas it sounds like your instances are not able to talk to each other. Are you using INSTANCE_IP env variable with your containers?
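
Each container needs to advertise its own address that the other instances can reach; a minimal sketch under that assumption (the addresses, image tag, and service names here are hypothetical, not taken from any setup in this thread):

```yaml
# Hypothetical two-instance compose fragment.
# INSTANCE_IP must be an address the OTHER instances can use to reach this one,
# and it must differ per instance -- a shared value breaks inter-instance calls.
services:
  rocketchat1:
    image: rocketchat/rocket.chat:0.64.2
    environment:
      - PORT=3000
      - INSTANCE_IP=10.0.0.11   # this instance's own reachable address
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
  rocketchat2:
    image: rocketchat/rocket.chat:0.64.2
    environment:
      - PORT=3000
      - INSTANCE_IP=10.0.0.12   # different address for the second instance
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
```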

rasos commented 6 years ago

Very slow uploads are also observed with just one instance, see also #7857
Chunks of 100k arrive in a temporary file in /tmp/ufs .

ledainam commented 6 years ago

I have the same issue with version 0.59.3 when I use multiple instances: uploads from the mobile app are extremely slow, and I must choose the file more than twice before the upload starts. With only one instance it works fine.

jukie commented 6 years ago

@salamachinas what about if we're running in Docker Swarm or another environment where the containers are eventually rescheduled on another host?

kaji-bikash commented 6 years ago

version: '2'

services:
  rocketchat1:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=3333
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 3333:3333

  rocketchat2:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=4444
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 4444:4444

  rocketchat3:
    image: rocketchat/rocket.chat:0.64.2
    restart: unless-stopped
    volumes:
      - ../shared/uploads:/var/snap/rocketchat-server/common/uploads
    environment:
      - INSTANCE_IP=172.17.43.194
      - PORT=5555
      - ROOT_URL=https://rocket.example.com
      - MONGO_URL=mongodb://mongo1:27017/parties
    ports:
      - 5555:5555

We have 3000+ users with roughly 300-500 users online. All the Rocket.Chat instances run on a single server, specced with 16 GB of RAM and 4 cores, inside an AWS VPC. We have nginx proxying to the Rocket.Chat ports.

We have frequent problems with Rocket.Chat.

We know a replica set would help, but is there anything besides that we might be missing, or other points that could help our case?

Much appreciated.

onigoetz commented 6 years ago

Hello @kajisaap I have a setup similar to yours.

One thing that might help you, at least for file uploads, is to replace nginx with HAProxy. As explained in this thread, when a file is uploaded, the request is proxied to the instance that first saw the user connect. As nginx doesn't support cookie-based sticky sessions in its open-source version, your users are moved from one instance to another on each request.

Using haproxy with settings like

frontend ft_web
  bind 0.0.0.0:80
  default_backend bk_web

backend bk_web
  balance source
  hash-type consistent # optional
  server s1 192.168.10.11:80 check
  server s2 192.168.10.21:80 check

The "balance source" setting here ensures that a single user is always directed to the same backend server, and that users are distributed roughly equally across the servers.

http://cbonte.github.io/haproxy-dconv/1.9/configuration.html#4-balance

This is not my actual configuration, I don't have access to it right now. I could post it on monday if you're interested.

By the way, it's easier to manage haproxy / nginx directly in the docker-compose.yml as well, since you can refer to your instances by their names (rocketchat1) instead of IP and port. That way you only expose a single port to the outside, and can change the configuration/servers as you like without reconfiguring an external system every time.
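
As a sketch of that approach (service names, image tags, and ports are illustrative, not an actual configuration from this thread), the proxy can live in the same docker-compose.yml and address the instances by service name:

```yaml
# Hypothetical compose fragment: HAProxy is the only service that publishes
# a port; the Rocket.Chat instances are reached by name on the compose network.
services:
  haproxy:
    image: haproxy:1.9
    ports:
      - "80:80"                 # single port exposed to the outside
    volumes:
      # haproxy.cfg can then use lines like:
      #   server s1 rocketchat1:3000 check
      #   server s2 rocketchat2:3000 check
      - ./haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro
  rocketchat1:
    image: rocketchat/rocket.chat:0.64.2
    # no published ports; reachable as rocketchat1:3000 internally
  rocketchat2:
    image: rocketchat/rocket.chat:0.64.2
```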

rasos commented 6 years ago

Did you try migrating the file storage to GridFS? Then the files are not stored in little chunks in the MongoDB. @arminfelder has been working on a migration script.

happy49sky commented 6 years ago

I found a solution: set the load-balancing strategy to ip_hash, so a client's requests are always sent to the same node server.

geekgonecrazy commented 6 years ago

yeah basically right now sticky ip is the only way to do it :(

beranPro commented 6 years ago

Hi @onigoetz ,

> This is not my actual configuration, I don't have access to it right now. I could post it on monday if you're interested.
>
> By the way, it's easier to manage haproxy / nginx directly in the docker-compose.yml as well as you can refer to your instances by their names (rocketchat1) instead of IP and port, this way you can only expose a single port to the outside (and change the configuration/servers the way you like without re-configuring an external system every-time.

can you send me your configuration? I'm very interested in the docker-compose part.

thanks a lot!

onigoetz commented 6 years ago

Sure, here is a gist with the two configuration files : https://gist.github.com/onigoetz/9b49b4b3e713e35cd2e51f4b369ca2e7

The Rocket Chat version is a bit outdated, that's because we've since migrated our instance to a Kubernetes Cluster, but this configuration worked perfectly fine.

beranPro commented 6 years ago

Hi @onigoetz , is it working fine with kubernetes? Do you scale rocketchat automatically?

Thanks for the files!

ledainam commented 6 years ago

You can use ip-hash-style balancing in nginx with multiple instances and uploads work. I already tested it successfully. You should put hash $remote_addr; under the upstream config, as below:

upstream backend {
    hash $remote_addr;
    server 127.0.0.1:3000;
    server 127.0.0.1:3001;
}

and put client_max_body_size 200M; under http to prevent errors with iOS devices.

sgowie commented 5 years ago

> Hi @onigoetz , is it working fine with kubernetes? Do you scale rocketchat automatically?
>
> Thanks for the files!

For what it's worth, using the kube-router CNI we've set the Services for both the nginx TLS-termination proxy and the Rocket.Chat application to use source hashing. This works well, and allows the pods behind the Service to scale based on load.
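
The kube-router source-hashing specifics aside, plain Kubernetes can approximate the same stickiness with session affinity on the Service. A hedged sketch (names and selector labels are hypothetical):

```yaml
# Hypothetical Service: clients from the same IP are routed to the same pod.
apiVersion: v1
kind: Service
metadata:
  name: rocketchat
spec:
  selector:
    app: rocketchat
  sessionAffinity: ClientIP        # sticky by client source IP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 10800        # default affinity timeout (3 hours)
  ports:
    - port: 3000
      targetPort: 3000
```

Note that ClientIP affinity keys on the source address the Service sees, so a proxy in front of it must preserve or pass through the client IP for this to have the intended effect.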

wreiske commented 5 years ago

[screen recording: SlowFileUpload]

Having this issue too. Refreshing and uploading the same image sometimes uploads instantly (much faster).