Closed salamachinas closed 4 years ago
@salamachinas This could be related to #7704; could you double-check your file upload settings to see if everything is in order?
All settings seem correct; I'm getting the same error with both GridFS and FileSystem storage.
I'm having what seems to be this same issue as well.
Version of Rocket.Chat Server: 0.58.2
Deployment Method: docker
Number of Running Instances: 6
DB Replicaset Oplog: Yes
Node Version: v4.8.4
Storage Type: GridFS
When uploading, the percentage usually goes above 0% (sometimes 50+%) but then never goes any further. I am seeing the same exception as in the initial post. Is there anything more we can do to help test a fix for this issue?
Any word on this? This is kinda a big deal. :)
I had the same issue. It looks like, with some of the recent updates, the request session can have problems with the load balancer. In my case I'm using NGINX 1.12 and the upstream used round robin (the default routing method). When I left just one server, the upload worked; but the moment I added more servers, everything failed (I have 3 backend servers behind the LB).

My solution, in the case of NGINX, was to use sticky cookies, so the load balancer assigns a given user session to a specific backend server. Open source NGINX doesn't ship a sticky module (it's part of the enterprise product), so I used a really good open source sticky module and recompiled NGINX. Here are some useful links for that:
Download your NGINX version http://nginx.org/download
How to compile again (I'm using Amazon linux but basically it's the same stuff for any linux based OS) http://www.augustkleimo.com/build-and-install-nginx-from-source-on-amazon-ec2-linux/
Sticky module https://bitbucket.org/nginx-goodies/nginx-sticky-module-ng/src
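For reference, a minimal sketch of what such an upstream might look like. The server addresses and names here are placeholders; `sticky` comes from the third-party nginx-sticky-module-ng linked above, while `ip_hash` is a built-in directive that pins clients by source IP and needs no recompile:

```nginx
# Hypothetical upstream for 3 Rocket.Chat backends behind one NGINX.
upstream rocketchat {
    # With nginx-sticky-module-ng compiled in, cookie-based stickiness:
    sticky;
    # Or, without extra modules, pin each client by source IP instead:
    # ip_hash;
    server 10.0.0.1:3000;
    server 10.0.0.2:3000;
    server 10.0.0.3:3000;
}

server {
    listen 80;
    location / {
        proxy_pass http://rocketchat;
        # WebSocket upgrade headers, which Rocket.Chat needs:
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
```

Note that `sticky` and `ip_hash` are mutually exclusive; pick one per upstream.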
I changed File Upload to a specific directory, and after a while it attempted to upload files to /tmp/ufs. I had to set it back to GridFS and then to FileSystem again: I deleted the directory entry, saved the changes, re-added the directory, and saved the changes again. Then it worked.
```
# pm2 logs main
[TAILING] Tailing last 10 lines for [main] process (change the value with --lines option)
/root/.pm2/logs/main-out-0.log last 10 lines:
0|main | ➔ | Platform: linux |
0|main | ➔ | Process Port: 3000 |
0|main | ➔ | Site URL: https://chat.fgxint.net |
0|main | ➔ | ReplicaSet OpLog: Disabled |
0|main | ➔ | Commit Hash: 24e2d2c805 |
0|main | ➔ | Commit Branch: heads/0.57.3 |
0|main | ➔ | |
0|main | ➔ +------------------------------------------------+
0|main | Setting default file store to GridFS
0|main | Setting default file store to FileSystem
/root/.pm2/logs/main-error-0.log last 10 lines:
0|main | ufs: cannot delete temp file "/tmp/ufs/cwHMokRRBbFjKvJRD" (ENOENT: no such file or directory, unlink '/tmp/ufs/cwHMokRRBbFjKvJRD')
0|main | { [MissingRequiredParameter: Missing required key 'Key' in params]
0|main | message: 'Missing required key \'Key\' in params',
0|main | code: 'MissingRequiredParameter',
0|main | time: Mon Oct 30 2017 18:53:47 GMT+0000 (UTC) }
0|main | [Error: FileNotFound: no file with id cwHMokRRBbFjKvJRD found]
0|main | { [MissingRequiredParameter: Missing required key 'Key' in params]
0|main | message: 'Missing required key \'Key\' in params',
0|main | code: 'MissingRequiredParameter',
0|main | time: Mon Oct 30 2017 18:53:47 GMT+0000 (UTC) }
[STREAMING] Now streaming realtime logs for [main] process
```
Unfortunately, I have not solved it, but I have gathered some data.
Summary: AWS Elastic Beanstalk, using the latest docker image, testing without an nginx proxy. One instance works fine; two instances result in the following stack dump. The user just sees a stuck upload.
I attempted to switch the storage driver to S3: exact same behavior, works with one instance, fails with two. The only difference is that the debug error lists the expected URL /ufs/AmazonS3:Uploads...
```
UploadProxy ➔ debug Upload URL: /ufs/GridFS:Uploads/Hs3ASzs4QPciWnek6?token=xxx&progress=0.275176782303991
UploadProxy ➔ debug Wrong instance, proxing to: undefined:3000
Exception in callback of async function: Error: getaddrinfo ENOTFOUND undefined undefined:3000
    at errnoException (dns.js:50:10)
    at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:92:26)
Exception in callback of async function: Error: socket hang up
    at createHangUpError (_http_client.js:331:15)
    at Socket.socketCloseListener (_http_client.js:363:23)
    at emitOne (events.js:121:20)
    at Socket.emit (events.js:211:7)
    at TCP._handle.close [as _onclose] (net.js:554:12)
```
@MartinSchoeler I checked #7704. It doesn't appear to be relevant to the `debug Wrong instance, proxing to: undefined:3000` error.
Please let me know if I can supply any additional info.
In case you're following along at home, you can see that the error comes from: https://github.com/RocketChat/Rocket.Chat/blob/73d4b09ec4546b2aeb08ed0fb17c1b46b081bf54/packages/rocketchat-file-upload/server/lib/proxy.js#L72
Pretty sure I know what the problem is now: `instance.extraInformation.host` is `undefined`. In order for the proxy call to work, that value needs to be the hostname or IP of the host that has the in-process upload, which is set here:
From reading the code, I would expect the hostname to be (the still incorrect) `localhost` instead of `undefined`. Encouraging that the port is correct.
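A minimal reproduction of why the log shows the literal target `undefined:3000` (the object shape below is a stand-in for the real instance record, not Rocket.Chat's actual code):

```javascript
// When extraInformation.host was never populated, template interpolation
// stringifies undefined into the literal text "undefined", producing a
// bogus proxy target that DNS then fails to resolve
// ("getaddrinfo ENOTFOUND undefined").
const instance = { extraInformation: { port: 3000 } }; // host missing
const target = `${instance.extraInformation.host}:${instance.extraInformation.port}`;
console.log(target); // "undefined:3000"
```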
Found this issue, which is related: https://github.com/RocketChat/docs/issues/316
Hosts need to be routable. I need to look at our VPC network config. Hosts may be in any of several (4?) availability zones, and I'll need to see if they can route between them. I don't think they can now.
If they could route (solvable), I'd need to be able to set the `INSTANCE_IP` env var in the docker container to the public domain name for the host, which I can get with: curl -s http://169.254.169.254/latest/meta-data/public-hostname (and of course could get through the node aws sdk too). No idea how to do this... yet.
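The lookup just described could be wrapped in a small startup script. A sketch, assuming EC2 (the metadata address only answers from inside an EC2 instance, so elsewhere the variable may end up empty):

```shell
#!/bin/sh
# Query the EC2 instance metadata service for this host's public hostname
# and export it for Rocket.Chat. 169.254.169.254 is link-local and only
# reachable from inside EC2; the short timeout keeps this from hanging
# elsewhere, in which case INSTANCE_IP is left empty.
INSTANCE_IP=$(curl -s --max-time 2 http://169.254.169.254/latest/meta-data/public-hostname || true)
export INSTANCE_IP
echo "INSTANCE_IP is now: '${INSTANCE_IP}'"
```

The same metadata service also exposes `local-ipv4` (used by the `run.js` hack later in this thread) for private addressing.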
This is a real pain... I tried to build an HA scenario via docker, and finally getting to the point that this is not possible at the moment is frustrating. Here's my forum topic in which I've described my process up until this issue got in my way.
So: I'm bumping this one and hoping there's someone out there... ;)
@dmeier86 I solved this for our deployment on Amazon Web Services Elastic Beanstalk with a truly nasty hack.
Here's how I did it.
Create your own Dockerfile and copy in the existing Rocket.Chat Dockerfile.
Edit the file so that the end of it looks like this:
```
....
# These two lines are all that are different from
# rocket.chat:latest. Obvs this is total crap.
ADD run.js .
RUN sed -i.old "1s;^;require\('./run.js'\);" main.js
EXPOSE 3000
CMD ["node", "main.js"]
```
./run.js:

```javascript
const request = require('/app/bundle/programs/server/npm/node_modules/request');

request('http://169.254.169.254/latest/meta-data/local-ipv4', (error, response, body) => {
  console.log('SETTING INSTANCE_IP:', error, body);
  process.env.INSTANCE_IP = body;
});
```
So what does this nasty, nasty hack do? When the container is built, the Dockerfile uses sed to insert a require at the top of the default `main.js` to include the file `./run.js`. So when Rocket.Chat starts by running `main.js`, it'll first run `run.js`, which looks up the IP address of the host and sets the required environment variable (`INSTANCE_IP`) for Rocket.Chat.
If you are not using AWS you'll need to look at my run.js code and adjust it for your situation.
Needless to say, this is not a good solution, but it works.
Good Luck!
@acinader thanks for figuring this out.
@RocketChat/core is this a workaround which could, in a less hacky way, be integrated into some startup code?
Personally, any time I hit this I add something like `INSTANCE_IP=$(hostname -I)` before the startup.
Platforms have tricks to help grab the IP and set it as the environment variable.
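One caveat with that one-liner: `hostname -I` can print several space-separated addresses plus a trailing space, so a slightly more defensive variant (still just a sketch) keeps only the first token:

```shell
#!/bin/sh
# hostname -I lists all configured addresses (multi-homed hosts return
# several); take only the first token so INSTANCE_IP is a single address.
INSTANCE_IP=$(hostname -I | awk '{print $1}')
export INSTANCE_IP
echo "INSTANCE_IP=${INSTANCE_IP}"
```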
Hi, I had a file upload error when `INSTANCE_IP` was not configured; I was using `docker-compose up -d --scale rocketchat=2`.
I configured static addresses for the containers and added `INSTANCE_IP` to the environment variables. The problem is gone.
When using multiple instances, the information page should have the item "Broadcast Connected Instances".
Good Luck!
Thank you for the solution.
Here is my yaml in Kubernetes:
spec: containers:
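The manifest above is truncated; for anyone reconstructing it, the usual Kubernetes way to inject the pod's own IP is the Downward API. A sketch with placeholder names and image (not the commenter's actual file):

```yaml
spec:
  containers:
    - name: rocketchat            # placeholder container name
      image: rocket.chat:latest   # placeholder image
      ports:
        - containerPort: 3000
      env:
        - name: INSTANCE_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP   # Downward API: this pod's IP
```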
We changed how the upload works, so this issue should not happen again.
Uploads stopped working after migrating to 0.57.3 with more than one instance.
Error from logs: