Open mpiraux opened 6 years ago
Before we discussed about this case, I wrote this in the cache manager
def cache_file_at_url(self, url):
""" Caches the remote file or retrieves its cached version if one exists. Returns None if an error occurs. """
filename, extension = os.path.splitext(url)
cache_asset_hash = hash(str(self.channel_id) + filename)
asset = Asset.selectBy(plugin_channel=self.channel_id, filename=filename, is_cached=True).getOne(None)
if asset is not None:
# image comparaison on
import urllib.request,web,math
original = str(url)
cache_file = ((str(web.ctx.homedomain)+ '/' + asset.path) if asset.path is not None else str(web.ctx.homedomain)+ '/cache/' + str(
asset.id))
from PIL import Image
original_file = io.BytesIO(urllib.request.urlopen(original).read())
cache_file_open = io.BytesIO(urllib.request.urlopen(cache_file).read())
image1 = Image.open(original_file)
image2 = Image.open(cache_file_open)
h1 = image1.histogram()
h2 = image2.histogram()
rms = math.sqrt(sum(map(lambda a, b: (a - b) ** 2, h1, h2)) / len(h1))
print(rms)
if not rms == 0:
asset = None
if asset is None:
if CacheManager._get_lock(cache_asset_hash, blocking=False):
try:
asset = Asset(plugin_channel=self.channel_id, filename=filename, user=None,
extension=extension if extension is not None and len(extension) > 0 else None,
is_cached=True)
self.download_manager.enqueue_asset(asset)
except (URLError, OSError):
logger.warning('Exception encountered when attempting to cache file at url %s', url, exc_info=True)
CacheManager._release_lock(cache_asset_hash)
return None
CacheManager._release_lock(cache_asset_hash)
else:
CacheManager._get_lock(cache_asset_hash, blocking=True)
CacheManager._release_lock(cache_asset_hash)
asset = Asset.selectBy(plugin_channel=self.channel_id, filename=filename, is_cached=True).getOne(None)
return asset
_name_to_lock = {}
_master_lock = Lock() # The lock of all locks
It's probably awful and works for image only. I'll change this code and work with your option!
For above example, it's not fully functional as it doesn't dereference/rereference the asset.
For above example, it's not fully functional as it doesn't dereference/rereference the asset.
It has several flaws actually. First, it assumes that assets are images, this is not true as it can be videos too. Second, it downloads the whole file at the url and then performs a comparison. There is no gain in spending CPU time to compare the files if it has already been downloaded. The real cost we want to avoid is the download of the remote file. I would expect that using HTTP caching mechanism we can avoid downloading files entirely if not needed.
Regarding #12, this piece of code will be executed each time the PluginManager
looks at a slide containing an URL. Opening the remote file and the local file (!) via HTTP(S) there will impact the slides generation latency quite heavily. I think the DownloadManager
is more suited to handle the task. It could be extended to refresh an asset using the HTTP If-Modified-Since header.
I implemented this inside the client cache_daemon
actually, have a look at these:
https://github.com/UCL-INGI/ICTV/blob/master/ictv/client/cache_daemon.py#L134
https://github.com/UCL-INGI/ICTV/blob/master/ictv/client/cache_daemon.py#L288
First part was a first inspiration and can be cancelled. For #12 , i'll adapt it with your remarks.
Currently, the caching mechanism is not updating assets in cache at any time. It should use the information exposed in HTTP headers (Last-Modified, Cache-Control, etc) to update assets in cache.
We have to define a policy for this update mechanism to trigger. Sending an HTTP request every time the
PluginManager
sees a slide referencing an external asset seems a bit heavy, but it is the policy that ensures the most consistent cache.