dchackett / taxi

Lightweight portable workflow management system for MCMC applications
MIT License

Modularize file naming conventions #21

Closed dchackett closed 6 years ago

dchackett commented 7 years ago

Currently, file naming conventions are hard-coded into taxi in tasks.py and dispatch_tools.py (although in such a way that they can be modified fairly easily, if you know which functions in which objects to override). This is obviously undesirable. We need some sort of modularized file-naming convention mechanism. I am imagining some sort of object with parse_filename(filename) and generate_filename(...) functions.

If parse_filename has some sort of well-defined failure mode (like returning None), multiple filename conventions can be plugged in simultaneously (in an input context, like when FileSpectroJob guesses parameters from gauge filenames).
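A rough sketch of what such a convention object could look like, with None as the failure mode so that several conventions can be tried in turn. Everything here (RegexConvention, parse_with_any, the two example conventions) is hypothetical and only illustrates the idea; it is not what taxi currently does.

```python
import re


class RegexConvention:
    """Hypothetical convention object: a template for writing, a regex for reading."""

    def __init__(self, template, pattern):
        self.template = template
        self.pattern = re.compile(pattern)

    def generate_filename(self, **params):
        return self.template.format(**params)

    def parse_filename(self, filename):
        # Well-defined failure mode: return None instead of raising.
        match = self.pattern.search(filename)
        if match is None:
            return None
        return match.groupdict()


def parse_with_any(filename, conventions):
    """Try each plugged-in convention in order; the first successful parse wins."""
    for convention in conventions:
        params = convention.parse_filename(filename)
        if params is not None:
            return params
    return None


# Two made-up conventions plugged in simultaneously for input.
old_style = RegexConvention("cfg_{beta}_{traj}", r"cfg_(?P<beta>[\d.]+)_(?P<traj>\d+)$")
new_style = RegexConvention("gauge-b{beta}-t{traj}", r"gauge-b(?P<beta>[\d.]+)-t(?P<traj>\d+)$")
```

Note that the parsed values come back as strings in this sketch; casting to the right datatypes would be a separate step.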

The arguments that generate_filename(...) takes are obviously application-dependent. This is similar to the issue encountered when trying to think of what parameters a general GaugeGenerator class should be aware of. Might be best to leave as **kwargs, maybe with some kind of inheritable "Warning: argument ... was ignored" functionality.
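One possible shape for the **kwargs idea, as a sketch only: a base class inspects its own template to find out which arguments it uses and warns about the rest. The class names and the template are made up for illustration.

```python
import string
import warnings


class FilenameConvention:
    """Hypothetical base class; subclasses only need to set `template`."""

    template = ""

    def _field_names(self):
        # string.Formatter().parse yields (literal, field_name, format_spec, conversion).
        return {name for _, name, _, _ in string.Formatter().parse(self.template) if name}

    def generate_filename(self, **kwargs):
        used = self._field_names()
        # Inherited "argument ... was ignored" behavior for anything the template lacks.
        for name in sorted(set(kwargs) - used):
            warnings.warn("argument %r was ignored" % name)
        return self.template.format(**{k: v for k, v in kwargs.items() if k in used})


class GaugeFileConvention(FilenameConvention):
    # Made-up template; a real application would choose its own fields.
    template = "cfg_{Ns:d}_{Nt:d}_{beta:g}_{traj:d}"
```

A missing argument still raises KeyError from str.format here, which seems like the right default: silently writing an incomplete filename would be worse than failing.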

dchackett commented 6 years ago

Modularized file naming conventions are now implemented in the modularize branch, but the solution is ungainly. Leaving this issue open for further iteration on this idea.

dchackett commented 6 years ago

Current implementation is horrible. Instead:

  1. For each file, provide a (list of) Python format strings including datatype, e.g.
    HMCTask.gaugefile_conventions = ["cfg_{Ns:d}_{Nt:d}_{beta:f}_{kappa:f}_{label:s}_{traj:d}"]
  2. When generating a filename, simply plug in using format, like saveg = self.gaugefile_conventions[0].format(**self.to_dict()). The method(s) that implement this should be separately overridable for each filename. If a list of convention strings is provided, always use the first in the list for writing a filename (the first item is for output, all items are for input).
  3. When reading a filename, use the format strings as regexes (there has to be a module that does this...) to try to parse it. Try each convention in the sequence provided, like
    for fnc in self.gaugefile_conventions:
        try:
            params = parse_with_format_str_as_regex(filename, regex=fnc)
            break
        except RegexError:
            continue

    The function parse_with_format_str_as_regex casts the parsed-out strings to the appropriate datatypes. To handle directory structure, anchor the regex to the end of the string and allow slashes in the convention, like "/{Ns:d}x{Nt:d}/{beta:f}/{k4:f}_{k6:f}/cfg_{traj:d}".
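The three steps above can be sketched with only the standard library. This is an illustration under assumptions, not the eventual implementation: compile_convention, parse_filename and generate_filename are hypothetical helpers, only the d, f and s type codes are handled, and failure is signalled by returning None rather than raising.

```python
import re
import string

# Regex fragment and cast for each supported type code (assumed set: d, f, s).
_TYPES = {
    "d": (r"[-+]?\d+", int),
    "f": (r"[-+]?(?:\d+\.?\d*|\.\d+)", float),
    "s": (r"[^/]+?", str),
}


def compile_convention(fmt):
    """Turn a format string such as 'cfg_{Ns:d}_{traj:d}' into (regex, casts)."""
    pieces, casts = [], {}
    for literal, name, spec, _ in string.Formatter().parse(fmt):
        pieces.append(re.escape(literal))
        if name is not None:
            pattern, cast = _TYPES[spec or "s"]
            pieces.append("(?P<%s>%s)" % (name, pattern))
            casts[name] = cast
    # Anchor at the end only, so leading directories are allowed.
    return re.compile("".join(pieces) + "$"), casts


def parse_filename(filename, conventions):
    """Try each convention in order; return typed params, or None if none match."""
    for fmt in conventions:
        regex, casts = compile_convention(fmt)
        match = regex.search(filename)
        if match is not None:
            return {k: casts[k](v) for k, v in match.groupdict().items()}
    return None


def generate_filename(conventions, **params):
    """Always write with the first convention in the list."""
    return conventions[0].format(**params)


conventions = [
    "cfg_{Ns:d}_{Nt:d}_{beta:f}_{traj:d}",
    "/{Ns:d}x{Nt:d}/{beta:f}/cfg_{traj:d}",
]
```

Because str.format ignores unused keyword arguments, passing a full parameter dict (as with self.to_dict()) works even when a convention only uses some of the fields.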

Remaining issue: it would be most graceful to use the same gaugefile_conventions for both loadg and saveg. Is there some graceful way to associate different filename attributes with the same convention? On the other hand, providing conventions separately for loadg and saveg is not terrible.

etneil commented 6 years ago

Going to leave this link here:

https://pypi.python.org/pypi/parse

dchackett commented 6 years ago

Implemented the proposed new file naming convention scheme using parse. Closing this issue; will reopen if the new scheme ends up not working out.