Closed 7yl4r closed 5 years ago
another example:
'filepath': '/srv/imars-objects/airflow_tmp/processing_modis_aqua_pass_gom_20180803T190000_l2_file'
'load_format': '{dag_id}_%Y%m%dT%H%M%S_{tag}'
parsed from filename:
{'dt_d': 3, 'dt_H': 1900, 'dt_Y': 201808, 'dt_m': 0, 'dag_id': 'processing_modis_aqua_pass_gom', 'dt_S': 0, 'tag': 'l2_file', 'dt_M': 0}
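For comparison, here is a rough sketch of the values I would expect from that filename. It uses only the standard library (`re` plus `datetime.strptime`) rather than the project's own parsing code, and the hand-written regex and the name `stamp` are assumptions made up for this illustration.

```python
import re
from datetime import datetime

filepath = (
    "/srv/imars-objects/airflow_tmp/"
    "processing_modis_aqua_pass_gom_20180803T190000_l2_file"
)
basename = filepath.rsplit("/", 1)[-1]

# Hand-written stand-in for '{dag_id}_%Y%m%dT%H%M%S_{tag}' (an assumption,
# not the project's pattern): the timestamp is pinned to 8 digits, 'T', 6 digits.
pattern = re.compile(r"^(?P<dag_id>.+)_(?P<stamp>\d{8}T\d{6})_(?P<tag>.+)$")
match = pattern.match(basename)

# Let strptime split the timestamp using the same directives as load_format.
when = datetime.strptime(match.group("stamp"), "%Y%m%dT%H%M%S")

expected = {
    "dag_id": match.group("dag_id"),
    "tag": match.group("tag"),
    "dt_Y": when.year,
    "dt_m": when.month,
    "dt_d": when.day,
    "dt_H": when.hour,
    "dt_M": when.minute,
    "dt_S": when.second,
}
print(expected)
```

Compared with the parsed dict above, `dt_Y`, `dt_m`, and `dt_H` all differ, which is consistent with the year field swallowing the month digits and the hour field swallowing the minutes.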
Debug output that looks suspicious:
parse: DEBUG: format
'wv2_{dt_Y:4d}_{dt_m:2d}_{dt_d:2d}T{dt_H:2d}{dt_M:2d}{dt_S:2d}_{area_short_name}_{order_id:9d}_10_0.zip'
->
'wv2_ *(?P<dt_Y>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+)_ *(?P<dt_m>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+)_ *(?P<dt_d>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+)T *(?P<dt_H>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+) *(?P<dt_M>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+) *(?P<dt_S>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+)_(?P<area_short_name>.+?)_ *(?P<order_id>[-+ ]?\\d+|0[xX][0-9a-fA-F]+|\\d+|0[bB][01]+|0[oO][0-7]+)_10_0\\.zip'
Here's a minimal, more easily reproducible example:
from parse import parse
fmt_str = "W_{var_Y:4d}{var_m:2d}_000.xml"
in_str = "W_201301_000.xml"
parse(fmt_str, in_str)
<Result () {'var_Y': 20130, 'var_m': 1}>
Fixed by updating to parse 1.9+
I am trying to rework the metadata merging so it all happens at once, and I have identified several cases of metadata being improperly parsed from filepaths. In some cases I might be able to fix this by simply modifying _STRFTIME_MAP, but in others (see first example below) it looks like the parse package may not be paying much attention to the width part of the format string, i.e., it is reading 6 digits when explicitly told to look for 4.
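To make the expected behaviour concrete, below is a minimal sketch of width-respecting parsing using only the standard `re` module. It is not how the parse package works internally. The helpers `fixed_width_regex` and `parse_fixed_width` are hypothetical names invented here, and they only understand fields of the form `{name:Nd}`.

```python
import re

# Hypothetical helpers (not part of the parse package): each "{name:Nd}"
# field becomes a regex group that consumes exactly N digits.
_FIELD = re.compile(r"\{(\w+):(\d+)d\}")


def fixed_width_regex(fmt):
    """Build an anchored regex from a format string of {name:Nd} fields."""
    pattern = ""
    pos = 0
    for field in _FIELD.finditer(fmt):
        # Literal text between fields is escaped and matched verbatim.
        pattern += re.escape(fmt[pos:field.start()])
        pattern += "(?P<%s>\\d{%s})" % (field.group(1), field.group(2))
        pos = field.end()
    pattern += re.escape(fmt[pos:])
    return re.compile("^" + pattern + "$")


def parse_fixed_width(fmt, text):
    """Return a dict of int fields, or None if text does not fit fmt."""
    match = fixed_width_regex(fmt).match(text)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


result = parse_fixed_width("W_{var_Y:4d}{var_m:2d}_000.xml", "W_201301_000.xml")
print(result)
```

With exact widths the year field stops after four digits, so the month keeps both of its digits. Untyped fields such as `{area_short_name}` are not handled by this sketch and would need their own rule.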