RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Multi-Instance file upload broken when using ufs(gridfs/filesystem) #6926

Closed geekgonecrazy closed 7 years ago

geekgonecrazy commented 7 years ago

Rocket.Chat Version: 0.55.1 up to current 0.56.0-rc.2
Running Instances: 2 or more
DB Replicaset OpLog: Yes

If you are running more than one instance and do a file upload, ufs makes a request over DDP to get the address to post to, and the browser then posts the file to that URL. But if the instance the post hits is not the same one you are talking to via DDP... the file upload fails.

This seems to have something to do with ufs creating a file in /tmp/ufs/{id}, the id being the same one that is in the file URL. If you post to a different instance, that file of course does not exist, so the upload fails with an error like: no such file /tmp/ufs/{id}

The workaround: if you happen to be running all instances on a single node, you can just mount the same folder in Docker (or, if running directly on the host, you're fine as-is). For multi-node/multi-instance setups, however, this obviously will not work.

oliver-bowman commented 7 years ago

Is there an ETA on a fix for this, as it's affecting quite a few of my colleagues? I am guessing that switching to uploading to an S3 bucket would also be broken when using a multi-instance setup?

geekgonecrazy commented 7 years ago

@oliver-bowman this only affects FileSystem and GridFS, as these are the only two storage types that use ufs for file upload.

Something like s3 or minio would get around this issue

freach commented 7 years ago

I am facing the same issue.

Setup is as follows: 2 NGINX load balancers with a floating IP, and 3 servers each running 1 Rocket.Chat instance on Docker. /tmp/ufs is mounted via a GlusterFS volume, so all instances across the 3 servers can access the same /tmp/ufs. Especially big uploads fail almost all the time with an exception like:

Exception while invoking method 'ufsComplete' Error: FileNotFound: file YhrK9iyvEAxPg27iA was not found
  at Object.Future.wait (/app/bundle/programs/server/node_modules/fibers/future.js:449:15)
  at [object Object].Meteor.methods.ufsComplete (/app/bundle/programs/server/packages/jalik_ufs.js:827:24)
  at [object Object].methodsMap.(anonymous function) (/app/bundle/programs/server/packages/rocketchat_lib.js:1071:26)
  at [object Object].methodMap.(anonymous function) (packages/rocketchat_monitoring.js:2731:30)
  at maybeAuditArgumentChecks (/app/bundle/programs/server/packages/ddp-server.js:1823:12)
  at /app/bundle/programs/server/packages/ddp-server.js:905:20
  at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
  at /app/bundle/programs/server/packages/ddp-server.js:904:41
  at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
  at /app/bundle/programs/server/packages/ddp-server.js:903:46
  at [object Object]._.extend.protocol_handlers.method (/app/bundle/programs/server/packages/ddp-server.js:876:21)
  at /app/bundle/programs/server/packages/ddp-server.js:755:85

    - - - - -
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/gridfs-stream/download.js:281:17
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:95:56)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/collection.js:1406:5
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:95:56)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/cursor.js:674:5
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:171:5)
  at setCursorNotified (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:515:3)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:590:16
  at queryCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:232:18)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:455:18
  at nextTickCallbackWith0Args (node.js:489:9)
  at process._tickDomainCallback (node.js:459:13)

I figure that due to the load balancing the upload completes on a different instance than the one it initially started on. Yet the missing file, in my example "YhrK9iyvEAxPg27iA", is not missing at all: I see it in my GlusterFS folder and all containers see the file in /tmp/ufs. So maybe it's trying to access the file somewhere other than /tmp/ufs/YhrK9iyvEAxPg27iA.

geekgonecrazy commented 7 years ago

It's very possible that it's not synced fast enough. If the sync is a millisecond too slow, that's all it'd take.

The solution to this is probably just going to be to have it create the file on the other instance if it doesn't exist, instead of erroring out.

freach commented 7 years ago

I introduced the cluster filesystem after experiencing the same issues @geekgonecrazy described. With small uploads of a few bytes, cluster latency could be an issue and the reason why a file is not available to all instances, but with big uploads of a few MB the latency shouldn't matter; the file should be found. What could still be an issue even with big files is that the last chunk is missing and the upload is therefore corrupted, but the file should be found anyway. So I don't fully support the theory that this is entirely down to cluster or filesystem latency.

freach commented 7 years ago

@oliver-bowman I don't think S3 is affected, because it doesn't use the UFS middleware; the client uploads directly to S3.

geekgonecrazy commented 7 years ago

@freach I'm not talking about the latency of the file itself.

If the client hits a different instance and the placeholder file in /tmp/ufs has not yet been created on the node it hits, then you get the error above. The whole process happens very quickly; if that tmp file has not been synced, it fails. By the time you check, it has likely finished syncing that file in /tmp/ufs/{id}.

Another solution to this is to add session stickiness so the client will simply talk to the same instance.
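
For illustration only, a minimal sketch of IP-based stickiness in NGINX using the ip_hash directive (the upstream addresses, port and server_name are placeholders, not taken from this thread):

    upstream rocketchat {
        ip_hash;                                # pin each client IP to one backend
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
    }

    server {
        listen 80;
        server_name chat.example.com;

        location / {
            proxy_pass http://rocketchat;
            proxy_http_version 1.1;             # required for websockets/DDP
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }

ip_hash keeps both the DDP connection and the subsequent upload POST on the same backend, which is the stickiness described above.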

freach commented 7 years ago

@geekgonecrazy Got it. Here is what I see when I reproduce the issue. To reproduce it, I simply start an upload, in my case an 8094732 byte zip file. After about 75% of the upload I restart 2 of the 3 instances; the upload continues, reaches 100%, and returns an "Internal server error" in the web client.

On the instance that was not restarted I get an exception:

Exception while invoking method 'ufsComplete' Error: FileNotFound: file 6pTDgQjm4WKh8hH5q was not found
  at Object.Future.wait (/app/bundle/programs/server/node_modules/fibers/future.js:449:15)
  at [object Object].Meteor.methods.ufsComplete (/app/bundle/programs/server/packages/jalik_ufs.js:818:24)
  at [object Object].methodsMap.(anonymous function) (/app/bundle/programs/server/packages/rocketchat_lib.js:1071:26)
  at [object Object].methodMap.(anonymous function) (packages/rocketchat_monitoring.js:2731:30)
  at maybeAuditArgumentChecks (/app/bundle/programs/server/packages/ddp-server.js:1823:12)
  at /app/bundle/programs/server/packages/ddp-server.js:905:20
  at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
  at /app/bundle/programs/server/packages/ddp-server.js:904:41
  at [object Object]._.extend.withValue (packages/meteor.js:1122:17)
  at /app/bundle/programs/server/packages/ddp-server.js:903:46
  at [object Object]._.extend.protocol_handlers.method (/app/bundle/programs/server/packages/ddp-server.js:876:21)
  at /app/bundle/programs/server/packages/ddp-server.js:755:85

    - - - - -
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/gridfs-stream/download.js:281:17
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:95:56)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/collection.js:1406:5
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:95:56)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/cursor.js:674:5
  at handleCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:171:5)
  at setCursorNotified (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:515:3)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:590:16
  at queryCallback (/app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/cursor.js:232:18)
  at /app/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:455:18
  at nextTickCallbackWith0Args (node.js:489:9)
  at process._tickDomainCallback (node.js:459:13)

During the upload I monitored the file system, simply by running ls every 500 milliseconds. Output:

-rw-r--r-- 1 root root 51308 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 124629 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 266359 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 407910 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 545056 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 613113 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 747998 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 895372 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1036059 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1105907 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1247092 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1391004 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1534097 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1607418 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1748305 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 1891814 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2026575 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2106042 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2245145 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2389057 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2509705 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2563873 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2701280 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2762720 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 2907089 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3050353 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3194222 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3319905 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3405707 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3551702 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3689777 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3753275 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 3889502 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4038142 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4183394 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4325941 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4401776 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4549150 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4692772 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4830345 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 4900562 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5042068 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5176454 May 20 15:17 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5321270 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5398878 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5441550 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5552534 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5689809 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5758216 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 5897195 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6041449 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6182265 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6239468 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6386432 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6531550 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6602141 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6716822 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6862593 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 6940201 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7058984 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7203854 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7348627 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7421545 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7559819 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7702912 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7842611 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 7916751 May 20 15:18 6pTDgQjm4WKh8hH5q
-rw-r--r-- 1 root root 8059517 May 20 15:18 6pTDgQjm4WKh8hH5q

So the final file size I see is a little smaller than the 8094732 bytes I uploaded, but I think the full file is uploaded and then things go sideways. So the cluster file system definitely works, but the write to GridFS seems to fail, because the file is gone or at a different location. I really think it's accessed at the wrong location and the file gets deleted during exception handling.

I thought about stickiness, but I personally think stickiness ruins the concept of load balancing.

oliver-bowman commented 7 years ago

@freach Yeah, I got around this issue by using S3, however it introduced another issue of tar.gz files failing to upload.

geekgonecrazy commented 7 years ago

After about 75% of the upload I restart 2 of the 3 instances; the upload continues, reaches 100%, and returns an "Internal server error" in the web client.

Wait... to reproduce this you have to restart 2 of the 3 instances? I'm not sure how this tests the particular issue. It seems like this would actually introduce new problems specifically related to interrupting the upload.

The upload is definitely not designed in any way to be interrupted. If it is interrupted, the upload would definitely have to be restarted.

I thought about stickiness, but I personally think stickiness ruins the concept of load balancing.

Agreed. It's not a solution I would recommend unless you're desperate for a quick fix.

freach commented 7 years ago

@geekgonecrazy This way I can reliably reproduce the issue almost every time, but it's optional: I get the exact same results by just uploading a couple of files and waiting until the issue appears, without restarting anything. Sometimes it's the first upload, sometimes I need to wait until the 4th or 5th. In the end it yields the exact same exception.

geekgonecrazy commented 7 years ago

Created upstream issue: https://github.com/jalik/jalik-ufs/issues/124

geekgonecrazy commented 7 years ago

@rocketchat/core it appears we may just need to add some notes to the documentation discouraging GridFS/filesystem uploads when using multi-instance and multi-node setups: https://github.com/jalik/jalik-ufs/issues/124#issuecomment-303530177

Or tell them they will need to use NFS or some shared volume for the upload path if using GridFS/filesystem?

ghost commented 7 years ago

I removed the GlusterFS setup and sized down to only one instance to avoid the upload issue for now, but I'm still occasionally getting the exact same issue with only one instance. If I access the instance directly, without going through any load balancer/HTTP frontend like NGINX or haproxy, I am not able to reproduce the issue.

Edit: Now I even got the exception while accessing the instance port directly. So even a single instance with no load balancer does not work around this issue.

freach commented 7 years ago

I debugged the issue on my side. The exception was not caused by UFS as stated before; it's caused by the Mongo client library. The client is unable to read the stored file from GridFS once the write finishes. I'm using readPreference=secondary with the Mongo client for performance reasons, but because reads from secondaries are only eventually consistent, the write sometimes hasn't propagated to all secondaries by the time of the read. So the read fails and throws a FileNotFound exception. Using the Mongo client with readPreference=primary (the default), the issue didn't occur anymore.

geekgonecrazy commented 7 years ago

@freach interesting... That must explain the issue happening on a single node. But for multi-instance, the issue as initially reported is for sure an issue with ufs, as I've been able to reproduce it in several setups, and according to the upstream ticket the only solution from them is to use something like NFS/Samba for /tmp/ufs.

I think at this point my recommendation would be to use block storage with multiple instances, especially since things like Minio are available as open source and dead simple to set up.
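
As a rough sketch only (the image, credentials, paths and port mapping are placeholders and not from any official example), running Minio next to Rocket.Chat with docker-compose could look something like this:

    minio:
      image: minio/minio
      command: server /data                # serve /data as S3-compatible object storage
      environment:
        - MINIO_ACCESS_KEY=<ACCESS_KEY>
        - MINIO_SECRET_KEY=<SECRET_KEY>
      volumes:
        - ./minio-data:/data
      ports:
        - "9000:9000"

Rocket.Chat would then be pointed at the Minio endpoint via its S3-compatible file upload settings instead of GridFS/FileSystem.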

freach commented 7 years ago

@geekgonecrazy yes, you are right. Though it's not really an issue with UFS, as UFS is working as designed; it's just not feasible in a multi-instance / multi-host scenario without shared file system resources.

I'll move to S3 FileUpload, as the limitations of GridFS and UFS interfere with my availability concept.

frdmn commented 7 years ago

As @geekgonecrazy suggested, a simple workaround for this is to mount a shared /tmp/ across all of your instances/nodes, so that it reflects the same content on all nodes.

If you use Docker (and docker-compose), a simple volume mapping like this can do the trick:

    [...]
    volumes:
     - /tmp:/tmp

(Extract from my production docker-compose.yml)

Melanpan commented 7 years ago

I've got multiple instances running and have shared /tmp/ across them using GlusterFS, but another problem I seem to run into is that I have to tell Nginx that /ufs/ will only be sent to one instance, or images end up being corrupted.
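
A hedged sketch of what that NGINX routing could look like (upstream names and addresses are placeholders, not taken from the actual config described above):

    upstream rocketchat_all {
        server 10.0.0.1:3000;
        server 10.0.0.2:3000;
    }

    server {
        listen 80;

        # send all ufs upload traffic to one fixed instance
        location /ufs/ {
            proxy_pass http://10.0.0.1:3000;
            proxy_http_version 1.1;
            proxy_set_header Host $host;
        }

        # everything else stays load balanced
        location / {
            proxy_pass http://rocketchat_all;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_set_header Host $host;
        }
    }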

geekgonecrazy commented 7 years ago

@AlexanderLJ that's also an interesting route.

@RocketChat/core anyone have ideas on how we can handle this given the history here?

It seems to me the answer here is going to be to build up a doc covering these limitations, with a suggestion to maybe check out Minio for easy block storage. Then maybe display a warning in the file upload settings if they are using more than one instance of Rocket.Chat?

freach commented 7 years ago

@AlexanderLJ A few learnings from my side:

JSzaszvari commented 7 years ago

You can mount the same data on all containers; just use filesystem instead of ufs, and make the target volume on the host system an NFS share.

geekgonecrazy commented 7 years ago

Yeah, I think in this case a replicated mount is just a tiny bit too slow. NFS / Samba or any other network mount that could be mounted on each host or into each container would do the trick.

If you have your own isolated network, NFS is dead simple to set up.

I tried NFS in a local setup and it worked great. Given time I'm going to try GlusterFS; there might be a way to configure it that would work. But if you're just waiting on replication... even the slightest delay would result in the upload failing.
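
For anyone wanting to try the NFS route with docker-compose, one possible way (a sketch only; the NFS server address and export path are placeholders, and it assumes the local volume driver's NFS support):

    services:
      rocketchat:
        image: rocketchat/rocket.chat
        volumes:
          - ufs-tmp:/tmp/ufs               # the same NFS-backed volume on every node

    volumes:
      ufs-tmp:
        driver: local
        driver_opts:
          type: nfs
          o: "addr=<NFS_SERVER_IP>,rw"
          device: ":/exports/rocketchat-ufs"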

I'll get a document started and link it here. Any other discoveries are welcome. 👍

JSzaszvari commented 7 years ago

@geekgonecrazy happy to provide input if needed. We run 8 instances across 4 servers (Rancher Docker setup, 4 nodes each running 2 instances/containers), each with mounted data containers for storage via NFS. Never had a single issue.

geekgonecrazy commented 7 years ago

With the new avatar storage system landing in 0.57.0, this will now need to be addressed for every multi-instance setup.

admin-crowd-iugo commented 7 years ago

Same issue here - 7 instances per server with over 100 GB of files.

geekgonecrazy commented 7 years ago

Alright guys... this issue should now be resolved with 0.57.0.

If you use sticky sessions, uploads will simply go to the same instance you are connected to and everything is fine.

If you are not using sticky sessions and the upload happens to hit a different instance than the one you are on, we now proxy the request over.

Thanks everyone for helping contribute information to this thread. Very helpful

joshuaspence commented 6 years ago

I'm still seeing this issue on 0.59.3. It was originally occurring when we were using AmazonS3 for file storage, but I can reproduce it using GridFS as well. Is this expected to work without stickiness and without mounting /tmp/ufs as a shared NFS mount?

Should I open a new issue or can we reopen this one?

sampaiodiego commented 6 years ago

hi @joshuaspence yes, it is expected to be working.. please feel free to open an issue providing as many details as you can. thx

joshuaspence commented 6 years ago

Sure, I have opened #9174.

raymondfeliz commented 6 years ago

The issue for me, at least, was 2 things.

One was GridFS being turned on. The second is that the default docker-compose file doesn't have a field for INSTANCE_IP, so the multiple instances are not aware of each other and end up showing as undefined:3000.

Just add the line below for each instance in the environment: section

  - INSTANCE_IP=<YOUR IP>

Hopefully this helps some people with their upload issue.
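
For reference, a hedged excerpt of what the per-instance environment block could look like (service name, image, URLs and addresses are placeholders, not from the official compose file):

    rocketchat-1:
      image: rocketchat/rocket.chat
      environment:
        - MONGO_URL=mongodb://mongo:27017/rocketchat
        - MONGO_OPLOG_URL=mongodb://mongo:27017/local
        - ROOT_URL=https://chat.example.com
        - PORT=3000
        - INSTANCE_IP=<IP OF THIS NODE, REACHABLE BY THE OTHER INSTANCES>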

geekgonecrazy commented 6 years ago

Yes, with multiple instances INSTANCE_IP is very important for a lot of things 🙂