RocketChat / feature-requests

This repository is used to track Rocket.Chat feature requests and discussions.

Allow direct upload to S3 #653

Open jhmk opened 5 years ago

jhmk commented 5 years ago

Is your feature request related to a problem? Please describe. When using a multi-zone deployment of Rocket.Chat with a single S3 bucket, uploading a file can be very slow because the upload is routed through the instance with the primary MongoDB rather than the nearest RC instance.

Example: user in the UK, primary DB in Sydney, S3 in the EU, RC instances in the UK/EU/Sydney. The user is connected to the RC instance in the UK because it is the nearest one, but the file upload is routed through Sydney, which makes it very slow: user (UK) -> RC instance (UK) -> DB (Sydney) -> S3 bucket (EU)

From my servers and Docker containers I can upload to S3 at about 15-60 MB/s. Between my servers I can transfer files at 40-600 Mbit/s. For my test I used the same library RC uses (aws-sdk/clients/s3). When I upload a file through Rocket.Chat, the upload speed is about 100-300 KB/s, roughly 0.5% of the normal speed. Even when I remove the instance in AU it doesn't get faster.
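To sanity-check that "0.5%" figure, here is a quick back-of-the-envelope calculation in JavaScript (a sketch; the helper name is made up, and the 300 KB/s and 60 MB/s inputs are the best-case numbers quoted above):

```javascript
// Compare the observed Rocket.Chat upload speed with the raw S3 speed
// measured with the same client library (figures from the report above).
function percentOfBaseline(observedKBps, baselineMBps) {
  // 1 MB = 1024 KB, matching the binary units most transfer tools report.
  return ((observedKBps / 1024) / baselineMBps) * 100;
}

// Best case through RC (300 KB/s) vs. best raw S3 upload speed (60 MB/s):
console.log(percentOfBaseline(300, 60).toFixed(2) + '%'); // ≈ "0.49%"
```

So even comparing the best observed RC speed with the best raw S3 speed, the claim of roughly half a percent holds up.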

Describe the solution you'd like Upload the file directly through the RC instance the user is connected to, not the one with the primary DB, or make S3 uploading faster.

(attached diagram: rc-uploading)

Other tickets about this issue: https://github.com/RocketChat/Rocket.Chat/issues/7857

https://github.com/RocketChat/Rocket.Chat/issues/7524

https://github.com/RocketChat/Rocket.Chat/issues/1519

jhmk commented 5 years ago

For those who are interested, here are some more stats. I also tested the speed from the Docker container that runs Rocket.Chat, but there was only a 1-2% difference.

Upload to a normal S3 bucket in Frankfurt:

AWS server in Germany:
- 5.00M: --.-KB/s in 0.05s
- 10.00M: --.-KB/s in 0.1s
- 100.00M: 59.5MB/s in 1.7s
- 500.00M: 49.1MB/s in 10s

AWS server in UK:
- 5.00M: 7.15MB/s in 0.7s
- 10.00M: 7.27MB/s in 1.4s
- 100.00M: 8.04MB/s in 12s
- 500.00M: 7.41MB/s in 68s

AWS server in Australia:
- 5.00M: 833KB/s in 12s
- 10.00M: 1.92MB/s in 12s
- 100.00M: 5.56MB/s in 30s
- 500.00M: 5.54MB/s in 2m 9s

To an S3 bucket in Frankfurt with Transfer Acceleration:

AWS server in Germany:
- 5.00M: --.-KB/s in 0.08s
- 10.00M: 53.7MB/s in 0.2s
- 100.00M: 67.8MB/s in 1.5s
- 500.00M: 53.1MB/s in 9.1s

AWS server in UK:
- 5.00M: --.-KB/s in 0.09s
- 10.00M: 53.5MB/s in 0.2s
- 100.00M: 46.9MB/s in 2.1s
- 500.00M: 66.4MB/s in 9.3s

AWS server in Australia:
- 5.00M: 3.50MB/s in 1.4s
- 10.00M: 4.42MB/s in 2.3s
- 100.00M: 17.0MB/s in 7.6s
- 500.00M: 14.8MB/s in 33s

iPerf 3 traffic test between t3.medium servers in AWS (I also tested the speed from the Docker container that runs Rocket.Chat, but there was only a 1-5% difference):

- Germany <-> UK: 613 Mbit/s
- Germany <-> Australia: 37.6 Mbit/s
- UK <-> Australia: 40.0 Mbit/s
- Germany <-> Germany: 962 Mbit/s
- Hetzner server Germany -> AWS UK: 819 Mbit/s
- Hetzner server Germany -> AWS Germany: 1.24 Gbit/s
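The S3 figures are in MB/s while the iperf figures are in Mbit/s; a tiny conversion helper makes them directly comparable (a sketch, assuming decimal megabytes, i.e. 1 MB/s = 8 Mbit/s):

```javascript
// Convert the MB/s figures from the S3 upload tests to Mbit/s, the unit
// iperf reports. Assumes decimal units (1 MB = 10^6 bytes).
function mbPerSecToMbit(mbPerSec) {
  return mbPerSec * 8;
}

// The 500M upload from Australia to Frankfurt ran at 5.54 MB/s:
console.log(mbPerSecToMbit(5.54).toFixed(1) + ' Mbit/s'); // "44.3 Mbit/s"
```

That is in the same range as the ~37.6-40 Mbit/s iperf results to Australia, so the raw S3 uploads roughly saturate the available path.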

jhmk commented 5 years ago

It turns out that uploading is faster when you upload multiple files at the same time.

geekgonecrazy commented 5 years ago

Meant to reply to this a bit sooner.

We actually used to use a library called slingshot that would allow clients to upload directly to s3.

We changed this when we unified storage, so that all clients, especially our mobile apps, could upload easily.

Just to correct a bit here: file uploads are definitely not routed to the primary only.

What happens is: you connect via a websocket, and when you go to upload, a message with the intent is sent to the server over the websocket. You get an id back, and then your client POSTs the file via HTTP to a URL that includes that id.

Now, if you happen to hit a different instance than the one you hit with your websocket, that request is proxied through to that host.

Nothing proxies specifically to the primary or cares about the primary, other than the write to generate the id and store the metadata.
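The flow described above can be sketched roughly like this (a sketch only: the `/ufs/upload/...` path and names are illustrative assumptions, not Rocket.Chat's actual endpoints):

```javascript
// Illustrative sketch of the described upload flow:
// 1. the client asks for an upload id over the websocket,
// 2. the server writes metadata (the only step touching the primary DB)
//    and returns the id,
// 3. the client POSTs the file bytes over HTTP to a URL containing the id;
//    any instance can accept that POST and, if it is not the instance
//    holding the websocket, proxies the request through to it.
function buildUploadUrl(baseUrl, fileId) {
  return `${baseUrl}/ufs/upload/${encodeURIComponent(fileId)}`;
}

console.log(buildUploadUrl('https://chat.example.com', 'abc123'));
// → 'https://chat.example.com/ufs/upload/abc123'
```

With sticky sessions, step 3 lands on the same instance as the websocket and the proxy hop disappears.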

I see you're on AWS. For one, make sure websockets are going through your load balancer.

Second, many people turn on sticky sessions, just to make sure you hit the same instance so there is no proxying.

Along those lines, take a look at your routing. It seems a bit odd that traffic would hit the primary instance through the LB at any point if you are routing to the nearest region.

jhmk commented 5 years ago

Thanks @geekgonecrazy for the reply. But how do you explain that it gets far faster when I upload multiple files? And even if it is routed through all my servers, it is still using only 0.01% of the possible bandwidth. I would also suggest replacing the upload library, because it is no longer actively developed: https://github.com/jalik/jalik-ufs

geekgonecrazy commented 5 years ago

@jhmk the only thing I can think of is that somehow you're getting lucky and they are hitting the same server your websocket is connected to.

You can peek at the network log in the browser console and look at the instance-id header to see if the websocket is hitting the same instance as the file uploads.

Regarding ufs, thanks for pointing that out. I think whatever we do would take a bit of time to switch: mobile has to support ufs plus whatever we switch to, since mobile has to support multiple versions of Rocket.Chat, not just the current one :)

jhmk commented 5 years ago

The IP and instance-id header is always the same.

I also noticed that when I upload a single file (slow upload), the content-length is about 36,553. When I upload multiple files, the content-length increases to up to 200,000, and uploading is much faster. Is there a way to configure a higher content-length in RC? I tried it via nginx, but it breaks RC.
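These numbers are consistent with small sequential chunks being latency-bound rather than bandwidth-bound. A back-of-the-envelope model (a sketch; the 0.3 s round-trip time is an assumed UK-Sydney figure, not measured in this thread):

```javascript
// If each ~36 KB chunk is sent sequentially and costs at least one
// round trip, throughput is capped near chunkSize / RTT no matter how
// much bandwidth is available. Parallel uploads put more chunks in
// flight at once, which raises the effective cap.
function sequentialChunkCapBps(chunkBytes, rttSeconds) {
  return chunkBytes / rttSeconds; // bytes per second, rough upper bound
}

// 36,553-byte chunks over an assumed ~0.3 s round trip:
const cap = sequentialChunkCapBps(36553, 0.3);
console.log(Math.round(cap / 1024) + ' KB/s'); // "119 KB/s"
```

That lands squarely in the observed 100-300 KB/s range, and would also explain why two parallel uploads roughly double the speed.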

geekgonecrazy commented 5 years ago

@rocketchat/core anyone know why multiple would upload faster than one?

This seems very counter-intuitive

jhmk commented 5 years ago

@geekgonecrazy Any news about this issue? If you need a beta tester, just tell me the branch.

geekgonecrazy commented 5 years ago

i'm afraid not.

Best bet at this point is to make sure you have sticky sessions on so all traffic hits the same instance.

jhmk commented 5 years ago

> i'm afraid not.
>
> Best bet at this point is to make sure you have sticky sessions on so all traffic hits the same instance.

The x-instance-id when I'm uploading a file is always the same. If you know another method to test it, let me know.

geekgonecrazy commented 5 years ago

@rodrigok any idea why the content length would get higher with more concurrent uploads? Is this something we can configure?

Seems pretty strange that an upload would go faster just by doing 2 separate uploads in parallel

ghost commented 5 years ago

Just wondering, are there possibly any updates on this? I've noticed that direct uploads to the server don't seem to go at more than 3-4Mbps due to the way it seems to break the file up into chunks and send multiple POST requests. Could it possibly have the option to send it all in a single large POST request, speeding it up for faster connections?

Direct upload to S3 is also an option :)
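A rough model of the difference a single large POST could make (a sketch; the 50 ms per-request overhead and 100 Mbit/s bandwidth are assumptions for illustration, not measurements from this thread):

```javascript
// Total time = serialization time + per-request overhead x request count.
// Splitting a file into many small POSTs multiplies the overhead term.
function transferSeconds(fileBytes, chunkBytes, bandwidthBps, overheadSec) {
  const requests = Math.ceil(fileBytes / chunkBytes);
  return fileBytes / bandwidthBps + requests * overheadSec;
}

const bw = 12.5e6; // 100 Mbit/s ≈ 12.5 MB/s
// 100 MB in ~36 KB chunks vs. one single POST:
console.log(transferSeconds(100e6, 36553, bw, 0.05).toFixed(0) + ' s'); // "145 s"
console.log(transferSeconds(100e6, 100e6, bw, 0.05).toFixed(0) + ' s'); // "8 s"
```

Under these assumptions the chunked transfer spends almost all of its time on per-request overhead, which matches the behavior described above.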

jhmk commented 5 years ago

I just saw that you removed the old Slingshot upload engine. Maybe this will fix the problem. I'm waiting for the final 1.0 release and will then give you an update.

@ThePigsMud I also tested it on WebDav, Google Cloud etc. and had the same problem everywhere

rodrigok commented 5 years ago

We are working on replacing the upload mechanism with REST. This change may not be finished by the time of the 1.0 release, but it will be for 1.1.

Thanks