Open yarikoptic opened 2 years ago
@yarikoptic I'm not entirely clear on the behavior you're describing. Do you mean that a thus-decorated function should detect copies (How?) and memoize them as though they were the original path?
I was thinking about something like this: if we have
@cache.memoize_path
def decorated_func(path, ...):
    .... whatever ...
and decided to copy a file from src_path to dest_path (well, it could also be "move" instead of "copy"), we could do
copy(src_path, dest_path)
cache.memoized_path_copy(decorated_func, src_path, dest_path)
which would then copy all memoized/cached invocations of decorated_func for src_path so that they would also be known for dest_path.
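To make the proposed semantics concrete, here is a minimal in-memory sketch. It is not fscacher's or joblib's implementation: `ToyPathCache` is a hypothetical stand-in, and unlike the real `memoize_path` it keys entries only on the path string and call arguments, ignoring any file fingerprint.

```python
import os
from functools import wraps


class ToyPathCache:
    """Hypothetical stand-in for a path-aware cache (not fscacher's API)."""

    def __init__(self):
        # Layout: {function qualname: {absolute path: {argument key: result}}}
        self._store = {}

    def memoize_path(self, func):
        entries = self._store.setdefault(func.__qualname__, {})

        @wraps(func)
        def wrapper(path, *args, **kwargs):
            path = os.path.abspath(path)
            key = (args, tuple(sorted(kwargs.items())))
            per_path = entries.setdefault(path, {})
            if key not in per_path:
                # Cache miss: run the real function once and remember the result.
                per_path[key] = func(path, *args, **kwargs)
            return per_path[key]

        return wrapper

    def memoized_path_copy(self, func, src_path, dest_path):
        """Make every result cached for src_path also known for dest_path.

        Returns the number of cached invocations that were copied.
        """
        # functools.wraps preserves __qualname__, so the decorated function
        # can be passed in directly.
        entries = self._store.get(func.__qualname__, {})
        src = entries.get(os.path.abspath(src_path), {})
        entries.setdefault(os.path.abspath(dest_path), {}).update(src)
        return len(src)
```

The point of the sketch is that the copy is only straightforward because the toy cache tracks which parametrizations were used per path; that is exactly the bookkeeping the real cache delegates to joblib.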
@yarikoptic This might be possible depending on the underlying functionalities of joblib; I've brought this possibility up in a related issue there.
Not sure we would see the desired development in joblib done/accepted in the near future... maybe only if we send a PR for some alternative backend (probably based on FileSystemStoreBackend) which would provide the desired interfaces/functionality. Meanwhile, I tried the already existing interface to get information about all entries in the cache:
(dandisets) dandi@drogon:/mnt/backup/dandi/dandisets$ time python3 -c 'from dandi.support.digests import checksums; c = checksums._memory.store_backend.get_items(); print(len(c)); print(c[0]);'
55341
CacheItemInfo(path='/home/dandi/.cache/fscacher/dandi-checksums/joblib/dandi/support/digests/get_dandietag/75ce6b526d6e61faac02b4164ac645c5', size=641, last_access=datetime.datetime(2021, 6, 29, 21, 55, 31, 925287))
real 0m3.325s
user 0m2.399s
sys 0m1.141s
and that was a "warm" run; the original one probably took twice as long. But that is on drogon, which "saw too much" (over 50k entries); for a typical user, and with mv probably not being that common, this should be ok. So we can identify cache entries associated with a path easily and through an existing interface. The question is whether it would be possible to copy them into a new entry (with adjusted path and last_access)?
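As a rough illustration of the lookup step, the sketch below walks a cache directory and reports the entries whose recorded arguments mention a given path. It rests on an assumption about joblib's on-disk layout that is not a documented interface: that each entry directory (like the `CacheItemInfo.path` shown above) contains a `metadata.json` with an `input_args` mapping of argument names to `repr()` strings. The helper name `find_entries_for_path` is hypothetical, and how fscacher's wrapper records the path argument has not been verified here.

```python
import json
import os


def find_entries_for_path(cache_root, path):
    """Return entry directories whose recorded arguments mention `path`.

    Assumes (unverified, undocumented layout) that every entry directory
    holds a metadata.json with an "input_args" dict of repr() strings.
    """
    # Match the quoted form so "/data/x" does not also match "/data/xy".
    needle = repr(str(path))
    matches = []
    for dirpath, _dirnames, filenames in os.walk(cache_root):
        if "metadata.json" not in filenames:
            continue
        try:
            with open(os.path.join(dirpath, "metadata.json")) as fh:
                metadata = json.load(fh)
        except (OSError, ValueError):
            continue  # unreadable or partially written entry: skip it
        if not isinstance(metadata, dict):
            continue
        input_args = metadata.get("input_args") or {}
        if not isinstance(input_args, dict):
            continue
        if any(isinstance(v, str) and needle in v for v in input_args.values()):
            matches.append(dirpath)
    return sorted(matches)
```

This only covers finding the entries. Writing a copy under a new key would still require reproducing how joblib derives the entry's hash from the arguments, which is the part that depends on private implementation details.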
@yarikoptic Copying modified entries depends on too many implementation details of joblib.Memory which, at best, are managed via functions with no public documentation whose names start with underscores. If we want to be able to do this reliably, we need cooperation with joblib; see https://github.com/joblib/joblib/issues/1237 or start a new issue.
In light of the discussion in https://github.com/dandi/dandi-cli/issues/848 about allowing more efficient caching of digests, I wondered if it would be feasible to provide something like memoized_path_copy, which would copy all (?) memoized invocations for a specific decorated function as if they had been invoked for another "new" path. ATM, looking at the code, and since we rely on joblib memoization and otherwise do not track what specific parametrizations of the function were used, I really do not see how we could even do that. But maybe you, @jwodder, see some way to provide such functionality?