Open CJ-Wright opened 7 years ago
Potential strategy:
1. Use the `SafeDict` class (see here) to allow us to format with impunity (just keep trying to squish data in; it's fine if we fail, since it will just give back the unformatted chunk).
2. Use a `defaultdict` of only `str`, then call `format_map`. This will format everything else as `''`.
3. Remove the `__` runs which were created in the removal process.
4. Use `pathlib` to normalize the resulting path.
example:
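from collections import defaultdict
import re
from pathlib import Path


class SafeDict(dict):
    # Leave unknown fields in place so a later pass can fill them in.
    def __missing__(self, key):
        return '{' + key + '}'


a = '{sample_name}/{folder_tag}/{analysis_stage}/{human_timestamp}_' \
    '{auxiliary}_{ext}_{ext2}.txt'
# Fill in whatever is known at each stage; unknown fields survive untouched.
b = a.format_map(SafeDict(sample_name='hi'))
c = b.format_map(SafeDict(analysis_stage='raw', human_timestamp='time',
                          auxiliary='diff_x=.1'))
# Final pass: every field that is still unfilled renders as ''.
d = c.format_map(defaultdict(str))
print(d)
# Collapse runs of underscores left behind by the empty fields.
pattern_after_substitution = re.sub(r"__+", "_", d)
# Drop a dangling underscore right before the extension.
e = pattern_after_substitution.replace('_.', '.')
print(e)
# pathlib collapses the empty '//' path segment.
f = Path(e).as_posix()
print(f)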
hi//raw/time_diff_x=.1__.txt
hi//raw/time_diff_x=.1.txt
hi/raw/time_diff_x=.1.txt
Edit: better regex (this is what happens when you don't go with the highest-voted result on Stack Overflow).
Thanks for finding all these links. I think the custom formatter in PEP 3101 looks like the most promising solution.
I think the plan going forward (which is somewhat agnostic to how we actually do the subs/cleaning) is:
- `map` and `lambda s, **x: s.format_map(SafeDict(**x))`
- Now per analyzed data output type (tiff, dark_sub tiff, mask, iq, gr, etc.):
  - `map` and `lambda s, **x: s.format_map(SafeDict(**x))`
  - `map` and a cleaning function
  - `zip` and `map` to write to file with standard saving functions (save_output, fit2d_save, imsave, etc.)

Side note: on the cleaning side we may want to replace unrendered string chunks with `zonk` or some other default string. This way we can look for anything which fits the pattern `_bla=zonk_` and replace it with `_`. We can then replace any remaining `zonk` (like in the filename) with `''` and then continue on with our normal `__` and path-based cleaning.
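The `zonk` idea could be sketched roughly as follows. This is only an illustration: `SentinelDict` and `clean_rendered` are hypothetical names, the template is made up, and the sketch assumes that keys contain no `_`, `/` or `=` and that the sentinel never occurs in real metadata. A lookahead is used instead of literally replacing `_bla=zonk_` with `_`, so that two unrendered chunks in a row are both removed.

```python
import re
from pathlib import PurePosixPath

SENTINEL = "zonk"  # assumed never to occur in real metadata values


class SentinelDict(dict):
    """Hypothetical helper: a missing key renders as the sentinel string."""

    def __missing__(self, key):
        return SENTINEL


def clean_rendered(rendered, sentinel=SENTINEL):
    # 1. Drop every `_key=zonk` chunk that is followed by `_`. The lookahead
    #    leaves that `_` in place, so back-to-back chunks are each removed.
    #    Assumes the key itself contains no `_`, `/` or `=`.
    out = re.sub(r"_[^_/=]+=" + re.escape(sentinel) + r"(?=_)", "", rendered)
    # 2. Blank out any sentinel that is left (e.g. a bare `{ext}` field).
    out = out.replace(sentinel, "")
    # 3. The usual separator cleanup: collapse `__` runs, drop `_` before `.`.
    out = re.sub(r"__+", "_", out).replace("_.", ".")
    # 4. Let pathlib collapse the empty `//` path segments.
    return PurePosixPath(out).as_posix()


# Made-up template: folder_tag, temperature and ext are never supplied.
template = "{sample_name}/{folder_tag}/{human_timestamp}_temp={temperature}_{ext}.txt"
rendered = template.format_map(SentinelDict(sample_name="hi", human_timestamp="time"))
cleaned = clean_rendered(rendered)
print(rendered)
print(cleaned)
```

`PurePosixPath` is used here rather than `Path` only so that the result does not depend on the platform the sketch runs on.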
Here are two classes of note: the first partially renders the template, and the other replaces all un-renderable sections with `''`.
import string


class PartialFormatter(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        # Handle a key not found
        try:
            val = super(PartialFormatter, self).get_field(field_name, args,
                                                          kwargs)
            # Python 3, 'super().get_field(field_name, args, kwargs)' works
        except (KeyError, AttributeError):
            # Put the field back so that a later pass can render it
            val = '{' + field_name + '}', field_name
        return val

    def format_field(self, value, spec):
        # handle an invalid format
        if value is None:
            return spec
        try:
            return super(PartialFormatter, self).format_field(value, spec)
        except ValueError:
            # Re-attach the format spec to the preserved field
            return value[:-1] + ':' + spec + value[-1]


class PartialFormatterCleaner(string.Formatter):
    def get_field(self, field_name, args, kwargs):
        # Handle a key not found
        try:
            val = super(PartialFormatterCleaner, self).get_field(field_name,
                                                                 args,
                                                                 kwargs)
            # Python 3, 'super().get_field(field_name, args, kwargs)' works
        except (KeyError, AttributeError):
            # Render the unknown field as an empty string
            val = '', field_name
        return val

    def format_field(self, value, spec):
        # handle an invalid format
        if value is None:
            return spec
        try:
            return super(PartialFormatterCleaner, self).format_field(value,
                                                                     spec)
        except ValueError:
            return ''
The basic idea is to do a bunch of formatting, then use the second class with `defaultdict(str)` to clear out the un-rendered sections. Then we just need to clean up any other parts and we're done.
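As a rough usage sketch of that flow, here are condensed Python 3 variants of the two classes (the `None` branch is left out for brevity) applied to a made-up template. The template, the `temp` field and the values are assumptions for illustration only.

```python
import string


class PartialFormatter(string.Formatter):
    """Condensed variant of the first class: unknown fields are kept."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, AttributeError):
            return "{" + field_name + "}", field_name

    def format_field(self, value, spec):
        try:
            return super().format_field(value, spec)
        except ValueError:
            # Re-attach the spec so that a later pass can still apply it.
            return value[:-1] + ":" + spec + value[-1]


class PartialFormatterCleaner(string.Formatter):
    """Condensed variant of the second class: unknown fields become ''."""

    def get_field(self, field_name, args, kwargs):
        try:
            return super().get_field(field_name, args, kwargs)
        except (KeyError, AttributeError):
            return "", field_name

    def format_field(self, value, spec):
        try:
            return super().format_field(value, spec)
        except ValueError:
            return ""


template = "{sample_name}/{folder_tag}/{human_timestamp}_{temp:.1f}K_{ext}.txt"
partial = PartialFormatter()
# Each pass fills in what is known and leaves the rest as template fields.
step1 = partial.format(template, sample_name="hi")
step2 = partial.format(step1, human_timestamp="time", temp=300.0)
# No data passed: every remaining field is unknown and renders as ''.
final = PartialFormatterCleaner().format(step2)
print(step1)
print(step2)
print(final)
```

The final string still contains a `//` and a `_.`, which is where the separate separator and path cleanup would come in.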
Nice. I was literally just working on this to try to squeeze it into this release for you. I have some opinions (bounced off of Tom). I'll work off of this and put up a PR for you to review.
Cool
You may find https://github.com/xpdAcq/xpdAn/pull/107/files#diff-3bdf813d65d2df127ea85da3dc3c3364 helpful, specifically the template and its janky cleanup.
What is the best method for including keys into export templates when those keys might not exist? eg
I presume that the second RE call will cause a crash, as `format` won't know what to do with the missing `folder_tag`.
I can imagine users wanting this kind of functionality, where certain tags are used to denote deeper files, but the tags aren't always there.
Potential solution: scrape the template for the templated areas, extract the needed keys, then run `doc.get(key, '')` to get the values. Finally, launder everything through pathlib to remove any `//` issues. Maybe we should move to a more expressive template rendering library (jinja2?). This way we could also have more expressive for loops, e.g. for unknown amounts of auxiliary data.
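A minimal sketch of the scrape-then-`get` idea, using `string.Formatter.parse` from the standard library to find the field names. The function name `render_with_defaults`, the template and the `doc` contents are made up for illustration. The sketch assumes plain field names (no attribute or index access such as `{a.b}` or `{a[0]}`) and no format specs that would fail on an empty string.

```python
from pathlib import PurePosixPath
from string import Formatter


def render_with_defaults(template, doc, default=""):
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion);
    # field_name is None for a trailing chunk of literal text.
    keys = {name for _, name, _, _ in Formatter().parse(template) if name}
    # Look every key up in the document, falling back to the default.
    values = {key: doc.get(key, default) for key in keys}
    rendered = template.format_map(values)
    # pathlib collapses the empty `//` segments left by missing keys.
    return PurePosixPath(rendered).as_posix()


doc = {"sample_name": "hi", "analysis_stage": "raw"}  # made-up document
path = render_with_defaults(
    "{sample_name}/{folder_tag}/{analysis_stage}/data.txt", doc
)
print(path)
```

Because every key is resolved before formatting, a missing `folder_tag` no longer raises a `KeyError`; it simply becomes an empty path segment that pathlib then removes.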