Open matrixbot opened 11 months ago
Hi there, just chiming in to follow progress here. Thanks!
I don't want the client to time-out, this should be a client-side issue.
Does element on android and ios have these timeouts as well, or just element web?
@towards-a-new-leftypol Thank you for the note. I am not sure, as I do not use the Android or iOS implementations, but I do know that the issue definitely manifests with the web and desktop app, yes. Perhaps someone else knowledgeable on those platforms can chime in to assist?
Hi there, just chiming in to follow progress here. Thanks!
I am the earliest initiator of this issue. I recently replaced the hard disk (SSD, connected via SATA2 interface), which improved this problem to a certain extent. According to my long-term observation, the following reasons may cause large file upload failures:
homeserver.yaml
. The reason is that the data type of the column named media_length
in the table local_media_repository
in the database has a limitation (integer). The maximum value allowed by this type is 2147483647. This column records the size of each uploaded media, which is approximately equal to 2 GB. The first version I used was 1.49.0, and then it was slowly upgraded to 1.116.0. As of now, the database is still like this;client_max_body_size
must be equal to the allowed upload size you configured, otherwise the upload will fail (HTTP status code 413).In order to test whether the database is the highest indicator for limiting the upload of large files, I bypassed the client limit and used Shell scripts and curl to test upload large files to Synapse. Finally, I found that the size it can accept is 2 GB anyway.
File info:
en-us_windows_10_iot_enterprise_ltsc_2021_x64_dvd_257ad90f.iso
4.52 GB
Config: homeserver.yaml
max_upload_size: 8192M
Method:
Follow Synapse's API specification
curl -X POST \
-H "Authorization: Bearer $access_token" \
-H "Content-Type: application/octet-stream" \
-F "file_name=@$filepath/$filename" \
$synapse_url/_matrix/media/v3/upload?filename=$filename
Response:
{"errcode":"M_UNKNOWN","error":"Internal server error"}
Log:
synapse.media.media_repository - 347 - INFO - POST-0 - Stored local media in file '/data/media_store/local_content/uE/fT/xblFXOfAefodsIjZLwcJ'
synapse.storage.txn - 760 - DEBUG - POST-0 - [TXN START] {store_local_media-13f}
synapse.storage.txn - 856 - DEBUG - POST-0 - [TXN FAIL] {store_local_media-13f} integer out of range
synapse.storage.txn - 864 - DEBUG - POST-0 - [TXN END] {store_local_media-13f} 0.013079 sec
synapse.http.server - 146 - ERROR - POST-0 - Failed handle request via 'UploadServlet': <XForwardedForRequest at 0x7fbdccdc35d0 method='POST' uri='/_matrix/media/v3/upload?filename=en-us_windows_10_iot_enterprise_ltsc_2021_x64_dvd_257ad90f.iso' clientproto='HTTP/1.1' site='8008'>
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/synapse/http/server.py", line 332, in _async_render_wrapper
callback_return = await self._async_render(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/http/server.py", line 544, in _async_render
callback_return = await raw_callback_return
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/rest/media/upload_resource.py", line 111, in on_POST
content_uri = await self.media_repo.create_content(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/logging/opentracing.py", line 922, in _wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/media/media_repository.py", line 349, in create_content
await self.store.store_local_media(
File "/usr/local/lib/python3.11/site-packages/synapse/logging/opentracing.py", line 922, in _wrapper
return await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/databases/main/media_repository.py", line 458, in store_local_media
await self.db_pool.simple_insert(
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 1084, in simple_insert
await self.runInteraction(desc, self.simple_insert_txn, table, values)
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 952, in runInteraction
return await delay_cancellation(_runInteraction())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/internet/defer.py", line 2010, in _inlineCallbacks
result = context.run(
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/python/failure.py", line 549, in throwExceptionIntoGenerator
return g.throw(self.value.with_traceback(self.tb))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 918, in _runInteraction
result: R = await self.runWithConnection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 1047, in runWithConnection
return await make_deferred_yieldable(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/python/threadpool.py", line 269, in inContext
result = inContext.theWork() # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/python/threadpool.py", line 285, in <lambda>
inContext.theWork = lambda: context.call( # type: ignore[attr-defined]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/python/context.py", line 117, in callWithContext
return self.currentContext().callWithContext(ctx, func, *args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/python/context.py", line 82, in callWithContext
return func(*args, **kw)
^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/twisted/enterprise/adbapi.py", line 282, in _runWithConnection
result = func(conn, *args, **kw)
^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 1040, in inner_func
return func(db_conn, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 780, in new_transaction
r = func(cursor, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 1098, in simple_insert_txn
txn.execute(sql, vals)
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 426, in execute
self._do_execute(self.txn.execute, sql, parameters)
File "/usr/local/lib/python3.11/site-packages/synapse/storage/database.py", line 488, in _do_execute
return func(sql, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.NumericValueOutOfRange: integer out of range
synapse.access.http.8008 - 473 - INFO - POST-0 - 192.168.3.1 - 8008 - {@**********************************} Processed request: 63.217sec/0.000sec (0.810sec, 43.827sec) (0.000sec/0.013sec/1) 55B 500 "POST /_matrix/media/v3/upload?filename=en-us_windows_10_iot_enterprise_ltsc_2021_x64_dvd_257ad90f.iso HTTP/1.1" "curl/7.54.0" [0 dbevts]
This issue has been migrated from #13009.
This issue is a distilled version of #12937.
When a file is uploaded to Synapse's media store, Twisted buffers it into a temporary file and gives Synapse an
IO
handle to it once it's been fully received. The issue is that, with sufficiently large files and slow storage, it's possible that it will take more than 30 seconds to copy the bytes out of that temporary file into the correct location in the media store.Clients such as Element Web will time out 30 seconds after having sent the last byte. (It also wouldn't seem unusual for a reverse proxy to do something similar.) When this happens, the disconnection causes Twisted to close the temporary file and so Synapse's copy is interrupted.
The error produced in the logs looks as follows:
We would benefit overall by writing the bytes straight into a file on the correct filesystem as they're received from the client (either to the correct location, or to a temporary location with a rename operation upon completion). This would also mean that the client doesn't have to 'wait around' for the file copy to happen, since it's been happening during the upload (and in most cases, disk I/O is probably faster than the client's upload speed anyway!). In short: it will reduce upload latency a bit for large files.
(An uglier workaround would be to prevent the file copy from blocking the response, returning the MXC URL before it's even ready and finishing the copy in the background. I don't think it's great to give an MXC URL that's not ready — it could lead to races.)
Either solution probably involves some amount of spelunking into how Twisted manages these temporary files.