element-hq / matrix-content-scanner-python

A web service for scanning media hosted by a Matrix media repository
Apache License 2.0
12 stars 8 forks source link

Use Futures to deduplicate incoming requests #37

Closed babolivier closed 1 year ago

babolivier commented 1 year ago

Fixes #23

As of #19, we cache (most) scan results in memory to avoid having to download and scan the same file twice (with the caveat that we don't cache the contents of big files, but it still saves scanning them). However, it does not prevent duplicate work if multiple requests for the same file happen simultaneously, since we only update this cache when a scan completes. This is a legitimate concern, since we can easily imagine a situation where a user posts a large file into a room and multiple users try to download it at the same time.

As of this change,

babolivier commented 1 year ago

So even though the test is passing, it's currently not working when used in manual testing due to Twisted messing up with things, resulting in:

Traceback (most recent call last):
  File "/home/babolivier/Documents/matrix/mcs-python-final/matrix_content_scanner/servlets/__init__.py", line 78, in _async_render
    code, response = await method_handler(request)
  File "/home/babolivier/Documents/matrix/mcs-python-final/matrix_content_scanner/servlets/scan.py", line 44, in on_GET
    await self._scanner.scan_file(media_path)
  File "/home/babolivier/Documents/matrix/mcs-python-final/matrix_content_scanner/scanner/scanner.py", line 164, in scan_file
    return await self._current_scans[cache_key]
RuntimeError: await wasn't used with future

After discussing in #synapse-dev, we came to the conclusion that it would probably be best to just get rid of Twisted altogether and use a more asyncio-friendly web framework (since we exclusively use Twisted as such) like aiohttp instead. This PR is therefore on hold till this change has happened.

babolivier commented 1 year ago

I've run some manual testing of this branch with the changes from #39, and can confirm it works now that Twisted is out of the equation!