element-hq / element-web

A glossy Matrix collaboration client for the web.
https://element.io
Apache License 2.0

Multiple uploads resulting in CANCELED #25756

Open Cyberes opened 1 year ago

Cyberes commented 1 year ago

Steps to reproduce

I am attempting to upload two video files (each about 30 MB) to a room at the same time. This always fails with a generic error popup (The file '[filename]' failed to upload.). There is no error in the console, and the Network tab of the debug window shows the fetch request as (canceled). Synapse's logs don't show any errors and neither do Openresty's. Uploading the files one at a time works, but uploading them together always fails.

We've investigated the Synapse side of things and everything seems to be working -> https://github.com/matrix-org/synapse/issues/15841

media_store_path: "/mnt/matrix-storage/media"

media_storage_providers:
  - module: s3_storage_provider.S3StorageProviderBackend
    store_local: True
    store_remote: True
    store_synchronous: True
    config:
      bucket: matrix-synapse
      endpoint_url: 'https://xxx.r2.cloudflarestorage.com/matrix-synapse'
      access_key_id: 'xxx'
      secret_access_key: 'xxx'

Outcome

What did you expect?

Files are uploaded.

What happened instead?

Element said the upload failed and the request in the network console showed (CANCELED).

Operating system

Ubuntu

Application version

1.11.34

How did you install the app?

Repository

Homeserver

matrix.evulid.cc

Will you send logs?

Yes, but I didn't find any relevant logs during my investigation.

t3chguy commented 1 year ago

Please try again with the Network tab open and inspect the underlying failure: Ctrl+Shift+I > Network > then try.

Cyberes commented 1 year ago

I checked the debugger and saw that I am triggering the timeout at https://github.com/matrix-org/matrix-js-sdk/blob/acbcb4658a5d5903dfd557e3e115241d0a6f38bb/src/http-api/index.ts#L102-L111. The upload only takes a few seconds but the bar sits at 100% for a bit before the timeout kills it.

Element Desktop HAR. https://github.com/matrix-org/synapse/files/11996114/vector.fixed.har.txt

You can read through the linked Synapse issue to get up to speed.

t3chguy commented 1 year ago

That implies that there was no XHR progress for 30 seconds; https://github.com/matrix-org/matrix-js-sdk/blob/acbcb4658a5d5903dfd557e3e115241d0a6f38bb/src/http-api/index.ts#L102C24-L102C24 resets the cancellation timeout on any progress. I would argue that if there is no progress on the upload for 30s, then something is up with the connection.
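
For readers following along, the mechanism being described is roughly the following pattern (a simplified sketch, not the actual matrix-js-sdk implementation): an abort timer is armed when the upload starts and re-armed on every progress event, so the request is only cancelled after ~30s with no progress at all.

```typescript
// Simplified sketch of a progress-reset cancellation timer, assuming a 30s
// idle window like the linked matrix-js-sdk code. Illustrative only.
function uploadWithIdleTimeout(url: string, body: Blob, idleMs = 30_000): Promise<string> {
    return new Promise((resolve, reject) => {
        const xhr = new XMLHttpRequest();
        let timer: number | undefined = undefined;

        const resetTimer = () => {
            clearTimeout(timer);
            // Abort the request if nothing happens for `idleMs`.
            timer = window.setTimeout(() => xhr.abort(), idleMs);
        };

        // Any upload progress pushes the deadline back.
        xhr.upload.onprogress = resetTimer;

        xhr.onload = () => { clearTimeout(timer); resolve(xhr.responseText); };
        xhr.onerror = () => { clearTimeout(timer); reject(new Error("network error")); };
        xhr.onabort = () => { clearTimeout(timer); reject(new Error(`no progress for ${idleMs} ms`)); };

        xhr.open("POST", url);
        resetTimer();
        xhr.send(body);
    });
}
```

If the timer is only reset by upload progress events, then once the browser reports 100% sent it effectively counts down while waiting for the server's response, which would match the "bar sits at 100% for a bit before the timeout kills it" symptom described above.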

Cyberes commented 1 year ago

I saw that the uploaded file appears in the media store, which means Synapse received the file correctly. Is there something in the communication between Synapse and Element around upload completion that would lead to this (canceled) issue?

t3chguy commented 1 year ago

Yes, it seems like either Synapse, your reverse proxy, or any proxies in between hang for >30s at the end of the upload, neither sending any progress nor closing the connection and sending a status code.
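
One way to narrow that down from a browser console is to time the gap between the browser finishing the upload and the server answering (a rough diagnostic sketch; the access token and file are placeholders, and the homeserver and /_matrix/media/r0/upload path are the ones appearing in this report):

```typescript
// Rough browser-console diagnostic: how long does the homeserver take to
// answer after the last upload-progress event? `file` can come from an
// <input type="file">; `accessToken` is a placeholder.
function timeUpload(file: File, accessToken: string): void {
    const url = "https://matrix.evulid.cc/_matrix/media/r0/upload"
        + `?filename=${encodeURIComponent(file.name)}`;
    const xhr = new XMLHttpRequest();
    let lastProgress = performance.now();

    xhr.upload.onprogress = (e) => {
        lastProgress = performance.now();
        console.log(`sent ${e.loaded} of ${e.total} bytes`);
    };
    xhr.onload = () => {
        const gapSeconds = (performance.now() - lastProgress) / 1000;
        console.log(`HTTP ${xhr.status} arrived ${gapSeconds.toFixed(1)}s after the last progress event`);
    };
    xhr.onerror = () => console.error("network error");

    xhr.open("POST", url);
    xhr.setRequestHeader("Authorization", `Bearer ${accessToken}`);
    xhr.setRequestHeader("Content-Type", file.type || "application/octet-stream");
    xhr.send(file);
}
```

If that gap regularly exceeds 30s for ~30 MB files, the js-sdk timeout will always fire, and whether the time is being spent in the proxy layer, Openresty, Synapse, or the S3 storage provider should show up in the corresponding logs.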

Cyberes commented 1 year ago

Okay, I'll do some more digging into my setup.

Cyberes commented 1 year ago

Okay, I have some new logs to support my position that this is an Element issue. Here's the final part of the log where the client cancels the upload because it hits its internal timeout:

Log ``` http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2078802 SSL_read: 7168 SSL_read: 3072 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2089042 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2099282 SSL_read: 3072 SSL_read: 7168 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2109522 SSL_read: 9216 SSL_read: 1024 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2119762 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2130002 SSL_read: 5120 SSL_read: 5120 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2140242 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2150482 SSL_read: 1024 SSL_read: 9216 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2160722 SSL_read: 7168 SSL_read: 3072 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2170962 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2181202 SSL_read: 3072 SSL_read: 7168 delete posted event 00005645A6C2E150 post event 
00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2191442 SSL_read: 9216 SSL_read: 1024 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2201682 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2211922 SSL_read: 5120 SSL_read: 5120 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2222162 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2232402 SSL_read: 1024 SSL_read: 9216 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2242642 SSL_read: 7168 SSL_read: 3072 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2252882 SSL_read: 10240 delete posted event 00005645A6C2E150 post event 00005645A6C2E150 SSL_read: avail:0 http client request body recv 10240 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 10240 file: 0, size: 0 http write client request body, bufs 00005645A6C71848 write: 43, 00005645A6DDBD90, 10240, 2263122 SSL_read: 3072 SSL_read: 0 SSL_get_error: 6 peer shutdown SSL cleanly http client request body recv 3072 http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 3072 file: 0, size: 0 http client request body rest 30108589 http client request body recv 0 client prematurely closed connection, client: 10.0.0.9, server: , request: "POST /_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Video.mp4 HTTP/1.1", host: "matrix.evulid.cc" http finalize request: 400, "/_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Video.mp4" a:1, c:1 http terminate request count:1 http terminate cleanup count:1 blk:0 http posted request: "/_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Video.mp4" http terminate handler 
count:1 http request count:1 blk:0 http close request http log handler run cleanup: 00005645A6C71858 file cleanup: fd:43 free: 00005645A6DDBD90 free: 00005645A6D706B0, unused: 1 free: 00005645A6C70870, unused: 0 free: 00005645A6E3F6F0, unused: 2808 close http connection: 42 SSL_shutdown: 1 delete posted event 00005645A6C2E150 reusable connection: 0 free: 00005645A6C702B0 free: 00005645A6DE0D90, unused: 40 ```

Here are the important bits:

The log shows that the client prematurely closed the connection:

client prematurely closed connection, client: 10.0.0.9, server: , request: "POST /_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Video.mp4 HTTP/1.1", host: "matrix.evulid.cc"

We already know this is happening, so nothing new here.

But take a look at these lines right before the 'client prematurely closed connection' message:

http client request body rest 30108589
http client request body recv 0

The server is still in the process of receiving the request body when the client closes the connection. If I understand Nginx's debug logging correctly, recv 0 means the server expected to receive more data but didn't get any.
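
Taking that reading at face value, the rest value also indicates roughly how much of the body nginx had actually received at that point (back-of-the-envelope arithmetic, assuming the ~35 MB size given later in this thread for the failed video):

```typescript
// Back-of-the-envelope arithmetic, assuming the failed video is ~35 MB
// (the figure quoted later in this thread for this file).
const approxBodyBytes = 35 * 1000 * 1000;  // ~36.7 million bytes if "35 MB" means MiB
const restBytes = 30_108_589;              // "http client request body rest 30108589"
const receivedSoFar = approxBodyBytes - restBytes;
console.log(`nginx had received only about ${(receivedSoFar / 1e6).toFixed(1)} MB`
    + " of the request body when the connection closed");
// => roughly 5-7 MB of a ~35 MB body, even though the browser-side progress
//    bar had already reached 100%.
```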

t3chguy commented 1 year ago

@Cyberes while you have the logging enabled, could you compare it to a working single-file upload?

Cyberes commented 1 year ago

Here you go: debug.txt. It's the complete process, all 3000 lines. (Hopefully I removed all my personal information.) The file was a 940 kB image. The main POST starts on line 1984.

Cyberes commented 1 year ago

There is one line that shows:

recv() failed (104: Connection reset by peer) while sending to client, client: 10.0.0.9, server: , request: "POST /_matrix/media/r0/upload?filename=IMG_6512.jpg HTTP/1.1", upstream: "http://127.0.0.1:8114/_matrix/media/r0/upload?filename=IMG_6512.jpg", host: "matrix.evulid.cc"

I don't think that matters, since Nginx sent a 200 OK moments before: http finalize request: 0, "/_matrix/media/r0/upload?filename=IMG_6512.jpg" a:1, c:1

Could there be something weird going on where the client isn't cleanly closing the connection, and it's only causing issues with large file uploads?


Here is the end of the failed 35 MB video upload:

http body new buf t:1 f:0 00005645A6DDBD90, pos 00005645A6DDBD90, size: 3072 file: 0, size: 0
http client request body rest 30108589
http client request body recv 0
client prematurely closed connection, client: 10.0.0.9, server: , request: "POST /_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Porn.mp4 HTTP/1.1", host: "matrix.evulid.cc"
http finalize request: 400, "/_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Porn.mp4" a:1, c:1
http terminate request count:1
http terminate cleanup count:1 blk:0
http posted request: "/_matrix/media/r0/upload?filename=DCS%2520Fan%2520Boys%2520Rejoice%2520as%2520Ukraine%2520Releases%2520Fresh%2520Jet%2520Porn.mp4"
http terminate handler count:1
http request count:1 blk:0
http close request
http log handler
run cleanup: 00005645A6C71858
file cleanup: fd:43
free: 00005645A6DDBD90
free: 00005645A6D706B0, unused: 1
free: 00005645A6C70870, unused: 0
free: 00005645A6E3F6F0, unused: 2808
close http connection: 42
SSL_shutdown: 1
delete posted event 00005645A6C2E150
reusable connection: 0
free: 00005645A6C702B0
free: 00005645A6DE0D90, unused: 40

Vs. the end of the small image file:

http copy filter: 0 "/_matrix/media/r0/upload?filename=IMG_6512.jpg"
http finalize request: 0, "/_matrix/media/r0/upload?filename=IMG_6512.jpg" a:1, c:1
SSL_shutdown: 1
reusable connection: 1
event timer add: 44: 5000:1212692033
http lingering close handler
recv: eof:0, avail:-1
recv: fd:44 -1 of 4096
recv() not ready (11: Resource temporarily unavailable)
lingering read: -2
event timer: 44, old: 1212692033, new: 1212692033
http lingering close handler
recv: eof:1, avail:-1
recv: fd:44 31 of 4096
lingering read: 31
recv: eof:1, avail:0
recv: fd:44 -1 of 4096
recv() failed (104: Connection reset by peer) while sending to client, client: 10.0.0.9, server: , request: "POST /_matrix/media/r0/upload?filename=IMG_6512.jpg HTTP/1.1", upstream: "http://127.0.0.1:8114/_matrix/media/r0/upload?filename=IMG_6512.jpg", host: "matrix.evulid.cc"
lingering read: -1
http request count:1 blk:0
http close request
http log handler
free: 00005645A6DF56D0
free: 0000000000000000
free: 00005645A6D94710
free: 00005645A6DDBD90
free: 00005645A6D9D070, unused: 1
free: 00005645A6E1DA80, unused: 0
free: 00005645A6E3F6F0, unused: 0
free: 00005645A6C772F0, unused: 2579
close http connection: 44
SSL_shutdown: 1
event timer del: 44: 1212692033
reusable connection: 0
free: 00005645A6EEF640
free: 00005645A6E4EAB0
free: 00005645A6C65FA0, unused: 24
free: 00005645A6D79D60, unused: 400

Cyberes commented 10 months ago

@t3chguy I know it's been a few months, but I'm still having this issue on version 1.94.0.

t3chguy commented 10 months ago

> @t3chguy I know it's been a few months, but I'm still having this issue on version 1.94.0.

That isn't an Element Web/Desktop version.

Cyberes commented 10 months ago

I'm referring to the Synapse version. Anyway, I disabled Cloudflare for matrix.evulid.cc and I was able to upload a few dozen files without any (CANCELED) error. I then switched CF back on and got the error again.

I've never encountered an issue like this with CF. Do you know of any conflicts that may be occurring between Element -> Cloudflare -> Synapse?