RocketChat / Rocket.Chat

The communications platform that puts data protection first.
https://rocket.chat/

Upload don't working #7706

Closed salamachinas closed 4 years ago

salamachinas commented 7 years ago

Uploads stopped working after migrating to 0.57.3 when running more than one instance.

Error from logs

rocketchat_3   | Exception in callback of async function: Error: getaddrinfo ENOTFOUND undefined undefined:3000
rocketchat_3   |   at errnoException (dns.js:27:10)
rocketchat_3   |   at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:78:26)
rocketchat_3   | 
MartinSchoeler commented 7 years ago

@salamachinas This could be related to #7704. Could you double-check your file upload settings to make sure everything is in order?

salamachinas commented 7 years ago

All settings seem correct. I get the same error with both GridFS and FileSystem storage.


phutchins commented 7 years ago

I'm seeing what appears to be the same issue.

Version of Rocket.Chat Server: 0.58.2
Deployment Method: docker
Number of Running Instances: 6
DB Replicaset Oplog: Yes
Node Version: v4.8.4
Storage Type: GridFS

When uploading, the percentage usually goes above 0% (sometimes 50+%) but then never goes any further. I am seeing the same exception as in the initial post. Is there anything more we can do to help test a fix for this issue?

phutchins commented 7 years ago

Any word on this? This is kinda a big deal. :)

juancho088 commented 7 years ago

I had the same issue. It looks like some of the recent updates cause problems with request sessions behind a load balancer. In my case I'm using NGINX 1.12, and the upstream used round robin (the default routing method). With just one server in the upstream, uploads worked (try that case yourself); as soon as I added more servers, everything failed (I have 3 backend servers behind the LB).

My solution, in the case of NGINX, was to use sticky cookies, so that each user session is assigned to a specific backend server. The open source NGINX doesn't include the sticky module (it's only part of the commercial version), so I used a really good open source sticky module and recompiled NGINX. Here are some useful links:

Download your NGINX version: http://nginx.org/download

How to compile it again (I'm using Amazon Linux, but it's basically the same for any Linux-based OS): http://www.augustkleimo.com/build-and-install-nginx-from-source-on-amazon-ec2-linux/

Sticky module: https://bitbucket.org/nginx-goodies/nginx-sticky-module-ng/src
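
To see why round-robin balancing can break an upload that spans several requests while sticky routing does not, here is a minimal JavaScript sketch. It only illustrates the routing idea and is not how NGINX or the sticky module is implemented; the backend names, `makeRoundRobin`, `sticky`, and the string hash are all made up for this example.

```javascript
// Illustrative only: hypothetical backends and routing functions.
const backends = ['rocketchat_1', 'rocketchat_2', 'rocketchat_3'];

// Round robin: each request goes to the next backend in turn,
// regardless of which client sent it.
function makeRoundRobin(servers) {
  let next = 0;
  return () => servers[next++ % servers.length];
}

// Sticky routing: the same session key always maps to the same backend.
// A real sticky module uses a cookie; a simple string hash stands in here.
function sticky(servers, sessionKey) {
  let hash = 0;
  for (const ch of sessionKey) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  }
  return servers[hash % servers.length];
}

const roundRobin = makeRoundRobin(backends);

// Three requests that belong to one upload from one user.
const rrTargets = [roundRobin(), roundRobin(), roundRobin()];
const stickyTargets = [1, 2, 3].map(() => sticky(backends, 'session-abc'));

console.log('round robin:', rrTargets);
console.log('sticky:     ', stickyTargets);
```

With round robin, the three requests land on three different instances; with the sticky function, they all land on the same one, so the instance that started the upload also receives the rest of it.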

dariosusman commented 6 years ago

I changed File Upload to a specific directory, and after a while it attempted to upload files to /tmp/ufs. I had to set it back to GridFS, then set it to FileSystem again, delete the directory entry, save the changes, add the directory back, and save the changes again. Then it worked.

# pm2 logs main 
[TAILING] Tailing last 10 lines for [main] process (change the value with --lines option)
/root/.pm2/logs/main-out-0.log last 10 lines:
0|main     | ➔ |             Platform: linux                    |
0|main     | ➔ |         Process Port: 3000                     |
0|main     | ➔ |             Site URL: https://chat.fgxint.net  |
0|main     | ➔ |     ReplicaSet OpLog: Disabled                 |
0|main     | ➔ |          Commit Hash: 24e2d2c805               |
0|main     | ➔ |        Commit Branch: heads/0.57.3             |
0|main     | ➔ |                                                |
0|main     | ➔ +------------------------------------------------+
0|main     | Setting default file store to GridFS
0|main     | Setting default file store to FileSystem

/root/.pm2/logs/main-error-0.log last 10 lines:
0|main     | ufs: cannot delete temp file "/tmp/ufs/cwHMokRRBbFjKvJRD" (ENOENT: no such file or directory, unlink '/tmp/ufs/cwHMokRRBbFjKvJRD')
0|main     | { [MissingRequiredParameter: Missing required key 'Key' in params]
0|main     |   message: 'Missing required key \'Key\' in params',
0|main     |   code: 'MissingRequiredParameter',
0|main     |   time: Mon Oct 30 2017 18:53:47 GMT+0000 (UTC) }
0|main     | [Error: FileNotFound: no file with id cwHMokRRBbFjKvJRD found]
0|main     | { [MissingRequiredParameter: Missing required key 'Key' in params]
0|main     |   message: 'Missing required key \'Key\' in params',
0|main     |   code: 'MissingRequiredParameter',
0|main     |   time: Mon Oct 30 2017 18:53:47 GMT+0000 (UTC) }

[STREAMING] Now streaming realtime logs for [main] process
acinader commented 6 years ago

Unfortunately, I have not solved this, but I have gathered some data.

Summary: on AWS Elastic Beanstalk, using the latest Docker image and testing without an NGINX proxy, one instance works fine. Two instances result in the following stack dump. The user just sees a stuck upload.

I attempted to switch the storage driver to S3 and got the exact same behavior: it works with one instance and fails with two. The only difference is that the debug error lists the expected URL /ufs/AmazonS3:Uploads...:

UploadProxy ➔ debug Upload URL: /ufs/GridFS:Uploads/Hs3ASzs4QPciWnek6?token=xxx&progress=0.275176782303991
UploadProxy ➔ debug Wrong instance, proxing to: undefined:3000
Exception in callback of async function: Error: getaddrinfo ENOTFOUND undefined undefined:3000
  at errnoException (dns.js:50:10)
  at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:92:26)

Exception in callback of async function: Error: socket hang up
  at createHangUpError (_http_client.js:331:15)
  at Socket.socketCloseListener (_http_client.js:363:23)
  at emitOne (events.js:121:20)
  at Socket.emit (events.js:211:7)
  at TCP._handle.close [as _onclose] (net.js:554:12)


acinader commented 6 years ago

@MartinSchoeler I checked #7704. It doesn't appear to be relevant to this log line:

debug Wrong instance, proxing to: undefined:3000

Please let me know if I can supply any additional info.

acinader commented 6 years ago

In case you're following along at home, you can see that the error comes from: https://github.com/RocketChat/Rocket.Chat/blob/73d4b09ec4546b2aeb08ed0fb17c1b46b081bf54/packages/rocketchat-file-upload/server/lib/proxy.js#L72

acinader commented 6 years ago

Pretty sure I know what the problem is now:

at: https://github.com/RocketChat/Rocket.Chat/blob/73d4b09ec4546b2aeb08ed0fb17c1b46b081bf54/packages/rocketchat-file-upload/server/lib/proxy.js#L72

instance.extraInformation.host is undefined. For the proxy call to work, that value needs to be the hostname or IP of the host that holds the in-progress upload. The value is set here:

https://github.com/RocketChat/Rocket.Chat/blob/1a589686b38798cfde18e615018baa71ed8062ce/server/startup/presence.js#L4

From reading the code, I would have expected the hostname to be localhost (which would still be incorrect) rather than undefined. It is encouraging that the port is correct.
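
The `undefined:3000` target in the log is what you get when a missing host is interpolated into a string. The sketch below is not the actual proxy.js code; `buildProxyTarget`, `buildProxyTargetChecked`, and the shape of the `instance` object are assumptions modeled on the `instance.extraInformation.host` field mentioned above, with a guard added so the failure shows up early.

```javascript
// Hypothetical helper; the real proxy.js may build its target differently.
function buildProxyTarget(instance) {
  const { host, port } = instance.extraInformation;
  return `${host}:${port}`;
}

// Same idea, but refusing to proxy when the host was never registered.
function buildProxyTargetChecked(instance) {
  const { host, port } = instance.extraInformation;
  if (!host) {
    throw new Error('Instance record has no host; cannot proxy the upload');
  }
  return `${host}:${port}`;
}

// An instance record with a port but no host, as in the failing setup.
const broken = { extraInformation: { port: 3000 } };
console.log(buildProxyTarget(broken)); // undefined:3000

// An instance record with a routable (made-up) address.
const healthy = { extraInformation: { host: '10.0.0.12', port: 3000 } };
console.log(buildProxyTargetChecked(healthy)); // 10.0.0.12:3000
```

A DNS lookup for the literal name "undefined" then fails, which would match the getaddrinfo ENOTFOUND error in the logs.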

I found this related issue: https://github.com/RocketChat/docs/issues/316

  1. Hosts need to be routable. I need to look at our VPC network config. Hosts may be in any of several (4?) availability zones, and I'll need to see if they can route between them. I don't think they can now.

  2. If they could route (solvable), I would need to set the INSTANCE_IP env var in the Docker container to the public domain name of the host, which I can get with: curl -s http://169.254.169.254/latest/meta-data/public-hostname and of course could also get through the Node AWS SDK. No idea how to do this...yet.

dmeier86 commented 6 years ago

This is a real pain in the ass... I tried to build an HA scenario via Docker, and finally getting to the point where this turns out not to be possible at the moment is frustrating. Here's my forum topic, in which I described my process up to the point where this issue got in my way.

So I'm bumping this one and hoping there's someone out there... ;)

acinader commented 6 years ago

@dmeier86 I solved this for our deployment on Amazon Web Services Elastic Beanstalk with a truly nasty hack.

Here's how I did it.

  1. Create your own Dockerfile and copy in the contents of the existing Rocket.Chat Dockerfile.

  2. Edit the file so that the end of it looks like this:

....
# These two lines are all that are different from the
# rocket.chat:latest.  Obvs this is total crap.
ADD run.js .
RUN sed -i.old "1s;^;require\('./run.js'\);" main.js

EXPOSE 3000
CMD ["node", "main.js"]
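  3. Create a file ./run.js:

// Look up this host's private IPv4 address from the EC2 instance metadata
// service and expose it to Rocket.Chat as INSTANCE_IP.
const request = require('/app/bundle/programs/server/npm/node_modules/request');
request('http://169.254.169.254/latest/meta-data/local-ipv4', (error, response, body) => {
  console.log('SETTING INSTANCE_IP:', error, body);
  // Only set the variable on success; assigning undefined to process.env
  // would store the string "undefined".
  if (!error && body) {
    process.env.INSTANCE_IP = body.trim();
  }
});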

So what does this nasty, nasty hack do? When the image is built, the Dockerfile uses sed to insert a require at the top of the default main.js that pulls in ./run.js. So when Rocket.Chat starts by running main.js, it first runs run.js, which looks up the IP address of the host and sets the environment variable that Rocket.Chat requires (INSTANCE_IP).

If you are not using AWS you'll need to look at my run.js code and adjust it for your situation.
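
For hosts outside AWS, one possible adaptation is to read the address from the local network interfaces instead of the EC2 metadata service. This is a sketch under assumptions, not a tested Rocket.Chat setup: `pickInstanceIp` is a hypothetical helper, and taking the first non-internal IPv4 address may be the wrong choice on hosts with several interfaces.

```javascript
const os = require('os');

// Hypothetical helper: return the first non-internal IPv4 address, or null.
function pickInstanceIp(interfaces) {
  for (const addresses of Object.values(interfaces)) {
    for (const addr of addresses || []) {
      // Node reports family as 'IPv4' (a few releases used the number 4).
      const isIPv4 = addr.family === 'IPv4' || addr.family === 4;
      if (isIPv4 && !addr.internal) {
        return addr.address;
      }
    }
  }
  return null;
}

// Respect a value that is already set; otherwise fill it in if one was found.
const detected = pickInstanceIp(os.networkInterfaces());
if (!process.env.INSTANCE_IP && detected) {
  process.env.INSTANCE_IP = detected;
}
console.log('INSTANCE_IP:', process.env.INSTANCE_IP);
```

Because `os.networkInterfaces()` is synchronous, the variable is in place as soon as the file has been required; whether the chosen address is routable from the other instances still has to be checked for your network.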

Needless to say, this is not a good solution, but it works.

Good Luck!

mrsimpson commented 6 years ago

@acinader Thanks for figuring this out. @RocketChat/core Is this a workaround that could be integrated into some startup code in a less hacky way?

geekgonecrazy commented 6 years ago

Personally, any time I run into this I add something like INSTANCE_IP=$(hostname -I) before the startup command.

Most platforms offer some trick for grabbing the IP so that you can set it as the environment variable.

vinogradovia commented 5 years ago

Hi, I had a file upload error when INSTANCE_IP was not configured. I was using docker-compose up -d --scale rocketchat=2.

I configured static addresses for containers and added INSTANCE_IP to environment variables. The problem is gone.

When using multiple instances, the information page should have the item "Broadcast Connected Instances".


Good Luck!

thomas81528262 commented 4 years ago


Thank you for the solution.

Here is my YAML for Kubernetes:

spec:
  containers:

ggazzo commented 4 years ago

We changed how uploads work, so this issue should not happen again.