con / fscacher

Caching results of operations on heavy file trees
MIT License
0 stars 2 forks source link

Support decorating instance methods of classes with path attributes #58

Open jwodder opened 2 years ago

jwodder commented 2 years ago

Prompted by https://github.com/dandi/dandi-cli/issues/852#issuecomment-996210566

yarikoptic commented 2 years ago

answering question to https://github.com/con/fscacher/issues/44#issuecomment-996833473 (which is now closed) here

@yarikoptic So what should #58 do about resolving path attributes when decorating methods?

I am thinking about something along the lines of abstracting away "fingerprinters" so we could generalize to not only path and not only the first arg of the function, but also to attributes etc. Inspection of signature to figure out which arguments we need to deal with is to be only once so it should not have notable performance hit IMHO.

  1. Define base class Fingerprinter which will be subclassed for File and Directory and where to _get_file_fingerprint and _get_dir_fingerprint would fold into along with the modified_in_window to be called at the end of them to return None if "too early"
  2. PathFingerprinter adapter will switch between file and dir fingerprinter depending on what the value (path) is (directory or not)
  3. generalize memoize_path into memoize_params (or any other name you like) which would also take an option (so more of functions nesting, uff) e.g. params: dict[str, Fingerprinter] to specify which fingerprinter to use for the args needing special handling. And add ability to specify access to class instance attributes
    params={
      "path": PathFingerprinter,
      "self.path": PathFingerprinter,  # here due to `self.` take corresponding attribute of the bound instance 
    }

    edit 1: or make it more explicit by separate two options: args and params of the same kind.

  4. arguments (not self. attributes) which went through "transformation" will be ignored in the calls to joblib like we did for the path. Their Fingerprinter should provide desired value as part of the returned fingerprint (e.g. resolved path as you @jwodder did in #45 IIRC)
jwodder commented 2 years ago

@yarikoptic

yarikoptic commented 2 years ago
jwodder commented 2 years ago

@yarikoptic

no, we will ignore original path as we do now

How? The instance needs to be passed to the instance method, and if we tell joblib to ignore "self", then all non-path attributes will be ignored by joblib as well.

yarikoptic commented 2 years ago

Ah, good point. Let me think about it.

yarikoptic commented 2 years ago

Aha, I think it is manageable although fragile: we can also add argument to list attributes to be added to the fingerprint (if needed). Sure we wouldn't capture full state but I expect it to be applied cautiously ;-)