RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/
Other
39.98k stars 10.31k forks source link

File uploads from Android RocketChat+ cause "unix error: Too many open files" on server #10263

Closed joshuaboniface closed 6 years ago

joshuaboniface commented 6 years ago

Description:

When uploading a mid-sized (~2-3MB) image file from RocketChat+ on Android, the server throws an error. The upload fails with some sort of "timeout"/"Your connection seems off..."/"Authenticating" loop on the client, and the server goes into a socket error loop.

Server Setup Information:

Steps to Reproduce:

  1. Upload image file from RocketChat+ mobile to channel on two-node cluster with separate MongoDB.
  2. Observe RocketChat log on server

Expected behavior:

The upload works.

Actual behavior:

Server throws errors; instance dies for some time (<60s usually) before returning to service, upload invariably fails.

Relevant logs:

Mar 29 01:37:02 rc1 RocketChat[30413]: Error: vips__file_open_read: unable to open file "/tmp/ufs/3HBbw2zmLASY5ZSHu" for reading                                                                                                               
Mar 29 01:37:02 rc1 RocketChat[30413]: unix error: Too many open files                                                           
Mar 29 01:37:02 rc1 RocketChat[30413]: Error: FileNotFound: no file with id 3HBbw2zmLASY5ZSHu found          
Mar 29 01:37:02 rc1 RocketChat[30413]:     at /var/lib/rocket.chat/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/gridfs-stream/index.js:190:25
Mar 29 01:37:02 rc1 RocketChat[30413]:     at /var/lib/rocket.chat/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/collection.js:1215:18
Mar 29 01:37:02 rc1 RocketChat[30413]:     at handleCallback (/var/lib/rocket.chat/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/utils.js:120:56)
Mar 29 01:37:02 rc1 RocketChat[30413]:     at /var/lib/rocket.chat/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb/lib/collection.js:1252:5
Mar 29 01:37:02 rc1 RocketChat[30413]:     at /var/lib/rocket.chat/bundle/programs/server/npm/node_modules/meteor/npm-mongo/node_modules/mongodb-core/lib/connection/pool.js:469:18
Mar 29 01:37:02 rc1 RocketChat[30413]:     at _combinedTickCallback (internal/process/next_tick.js:131:7)
Mar 29 01:37:02 rc1 RocketChat[30413]:     at process._tickDomainCallback (internal/process/next_tick.js:218:9)
Mar 29 01:37:02 rc1 RocketChat[30413]: ufs: cannot write file "3HBbw2zmLASY5ZSHu" (EMFILE: too many openfiles, open '/var/lib/rocket.chat/uploads/3HBbw2zmLASY5ZSHu.jpg') { Error: EMFILE: too many open files, open '/var/lib/ro
cket.chat/uploads/3HBbw2zmLASY5ZSHu.jpg'                                                                                                                 
Mar 29 01:37:02 rc1 RocketChat[30413]:   errno: -24,                                                                                              
Mar 29 01:37:02 rc1 RocketChat[30413]:   code: 'EMFILE',                                                                                     
Mar 29 01:37:02 rc1 RocketChat[30413]:   syscall: 'open',                                                                                                            
Mar 29 01:37:02 rc1 RocketChat[30413]:   path: '/var/lib/rocket.chat/uploads/3HBbw2zmLASY5ZSHu.jpg' }                                                            
Mar 29 01:37:02 rc1 RocketChat[30413]: Exception while invoking method 'ufsComplete' Error: EMFILE: too many open files, open '/var/lib/rocket.chat/uploads/3HBbw2zmLASY5ZSHu.jpg'
Mar 29 01:37:02 rc1 RocketChat[30413]: ufs: cannot delete temp file "/tmp/ufs/3HBbw2zmLASY5ZSHu" (ENOENT: no such file or directory, unlink '/tmp/ufs/3HBbw2zmLASY5ZSHu')
Mar 29 01:37:02 rc1 RocketChat[30413]: ufs: cannot write chunk of file "HM58eW6efrWq53sF4" (EMFILE: too many open files, open '/tmp/ufs/HM58eW6efrWq53sF4')
Mar 29 01:37:02 rc1 RocketChat[30413]: ufs: cannot delete temp file "/tmp/ufs/HM58eW6efrWq53sF4" (ENOENT: no such file or directory, unlink '/tmp/ufs/HM58eW6efrWq53sF4')
Mar 29 01:38:46 rc1 RocketChat[30413]: Exception in callback of async function: Error: socket hang up
Mar 29 01:38:46 rc1 RocketChat[30413]:     at createHangUpError (_http_client.js:331:15)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at Socket.socketOnEnd (_http_client.js:423:23)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at emitNone (events.js:111:20)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at Socket.emit (events.js:208:7)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at endReadableNT (_stream_readable.js:1055:12)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at _combinedTickCallback (internal/process/next_tick.js:138:11)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at process._tickDomainCallback (internal/process/next_tick.js:218:9)
Mar 29 01:38:46 rc1 RocketChat[30413]: [Thu Mar 29 2018 01:38:46 GMT-0400 (EDT)] ERROR Error: socket hang up
Mar 29 01:38:46 rc1 RocketChat[30413]:     at createHangUpError (_http_client.js:331:15)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at Socket.socketOnEnd (_http_client.js:423:23)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at emitNone (events.js:111:20)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at Socket.emit (events.js:208:7)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at endReadableNT (_stream_readable.js:1055:12)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at _combinedTickCallback (internal/process/next_tick.js:138:11)
Mar 29 01:38:46 rc1 RocketChat[30413]:     at process._tickDomainCallback (internal/process/next_tick.js:218:9)

The last ~16 lines repeat 12+ times in a couple seconds, then the instance eventually stabilizes with no further errors in the logs.

The "Too many open files" prompted me to try lsof, but I'm unable to see RocketChat using any excessive number of files. /tmp is a 1GB ramdisk. /var/lib/rocket.chat/uploads is a symbolic link to /srv/rocketchat/uploads, which is a CephFS filesystem (shared storage).

geekgonecrazy commented 6 years ago

Number of Running Instances: 2 (file uploads to path /ufs/FileSystem:Uploads directed to first node only)

This could be problematic.

Let's say Instance A is the one that gets this, and Instance B is the one you are connected to via a websocket. If it were to post a file... If that path goes to Instance A it will be proxied server side over to Instance B to handle since that's where you are connected.

On the new beta version of the Rocket.Chat+ app this is no longer the case and it'll upload all files via the rest api endpoint.

geekgonecrazy commented 6 years ago

Might be worth giving the beta a try, though you will loose your connection info when you upgrade to the beta version. You'll have to re-input it.

But at any rate this is an issue that is/will be fixed on the Android app side. Known issue to have it hang like this on the mobile app. As you can see above... The complexity of how it has to upload makes perfect sense why 😁

Going to go ahead and close this

joshuaboniface commented 6 years ago

Thanks @geekgonecrazy that does make sense. Just to clarify though, the problem seems to occur behind load balancing regardless of the force-to-A rule (I had it off for the last few weeks, and added it because of this issue - it works at least 25% of the time with the rule, and 0% of the time without), but I added it back as a test. I'll give the beta app a try though!

geekgonecrazy commented 6 years ago

I've had it happen with a single instance also. Uploading via the real time api is just painful. :). Please make sure to let the mobile team know what you think. They have been working hard on it 😁