element-hq / matrix-content-scanner-python

A web service for scanning media hosted by a Matrix media repository
Apache License 2.0

Cache scan results #19

Closed: babolivier closed this 2 years ago

babolivier commented 2 years ago

Builds on top of https://github.com/matrix-org/matrix-content-scanner-python/pull/17 to cache scan results and file contents in a time-based LRU cache, so we don't keep fetching the same media from the homeserver.
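To illustrate what "time-based LRU cache" means here, the following is a minimal sketch in plain Python, not the code from this PR. It assumes a cache bounded by entry count, where each entry also expires after a fixed TTL. The class name `TTLLRUCache`, its parameters, the injectable `clock`, and the `mxc://` keys are all hypothetical and chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Callable, Generic, Hashable, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLLRUCache(Generic[V]):
    """Hypothetical LRU cache whose entries also expire after a fixed TTL."""

    def __init__(self, max_entries: int, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_entries = max_entries
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry timestamp, value); insertion order tracks recency.
        self._data: "OrderedDict[Hashable, Tuple[float, V]]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[V]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._data[key]
            return None
        # Mark the entry as most recently used.
        self._data.move_to_end(key)
        return value

    def set(self, key: Hashable, value: V) -> None:
        self._data[key] = (self._clock() + self._ttl, value)
        self._data.move_to_end(key)
        # Evict least recently used entries until we are back within capacity.
        while len(self._data) > self._max_entries:
            self._data.popitem(last=False)

    def __len__(self) -> int:
        return len(self._data)


# Usage with a fake clock so expiry can be demonstrated deterministically.
now = [0.0]
cache = TTLLRUCache(max_entries=2, ttl_seconds=60, clock=lambda: now[0])
cache.set("mxc://example.org/a", "clean")
cache.set("mxc://example.org/b", "infected")
cache.get("mxc://example.org/a")               # "a" becomes most recently used
cache.set("mxc://example.org/c", "clean")      # over capacity: evicts "b"

evicted = cache.get("mxc://example.org/b")     # None: evicted as least recently used
kept = cache.get("mxc://example.org/a")        # "clean"
now[0] = 61.0                                  # advance the fake clock past the TTL
expired = cache.get("mxc://example.org/a")     # None: the entry has expired
```

The two bounds play different roles: the LRU limit caps how many entries are held, while the TTL ensures a cached result is eventually dropped and the file is fetched again.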

To keep large files from inflating the cache beyond reason, I've also added a size limit: for files above it, only the scan result is cached, not the file contents. If such a file is requested again, we download it again but neither write it to disk nor scan it again, since its result is already cached.
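The size-limit behaviour described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the PR's implementation: a plain `dict` stands in for the time-based cache, and `fetch_and_scan`, `CachedScan`, `max_cached_file_size`, and the `download`/`scan` callbacks are hypothetical names. In particular, `scan` stands in for the whole "write to disk and run the scanner" step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class CachedScan:
    is_clean: bool            # the cached scan result
    content: Optional[bytes]  # None if the file was over the size limit


def fetch_and_scan(
    media_path: str,
    cache: Dict[str, CachedScan],
    download: Callable[[str], bytes],
    scan: Callable[[bytes], bool],
    max_cached_file_size: int,
) -> Tuple[bool, bytes]:
    entry = cache.get(media_path)
    if entry is not None:
        if entry.content is not None:
            # Full hit: neither download nor scan.
            return entry.is_clean, entry.content
        # Result cached but content was too large to keep:
        # download again, but reuse the cached result instead of rescanning.
        return entry.is_clean, download(media_path)

    # Miss: download, scan, and cache the result (plus content if small enough).
    content = download(media_path)
    is_clean = scan(content)
    cached_content = content if len(content) <= max_cached_file_size else None
    cache[media_path] = CachedScan(is_clean, cached_content)
    return is_clean, content


# Usage: count downloads and scans for a small and a large file, each requested twice.
calls = {"download": 0, "scan": 0}


def fake_download(path: str) -> bytes:
    calls["download"] += 1
    return b"x" * (10 if path.endswith("small") else 1000)


def fake_scan(content: bytes) -> bool:
    calls["scan"] += 1
    return True


cache: Dict[str, CachedScan] = {}
for path in ["mxc://example.org/small", "mxc://example.org/small",
             "mxc://example.org/big", "mxc://example.org/big"]:
    fetch_and_scan(path, cache, fake_download, fake_scan, max_cached_file_size=100)
```

With a 100-byte limit, the small file is downloaded and scanned once, while the large file is scanned once but downloaded twice, giving three downloads and two scans in total.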