Open LukeJones123 opened 2 years ago
I have the same problem...
For example, my VNAMEs look like "Ozone (O3); volume mixing ratio; [ppb]"
- so the pattern would be
name; unit description; unit, or r'(?P<name>.*)\;\ (?P<description>.*)\;\ (?P<unit>.*)'
as a regex in Python.
However the default pattern is
myfile.var_and_units_pattern
Out[32]: re.compile(r'^\s*(.*)\((.+?)\)(.*)\s*$', re.UNICODE)
function _attemptVarAndUnitsMatch seems to do the pattern matching (name, unit, description) - but how to set the pattern correctly? Could we extend the NACore class by a setter method for the var_and_units_pattern
attribute?
@FObersteiner, I have a branch that allows you to supply your own Python function to separate the names from the units: https://github.com/LukeJones123/nappy/tree/feature-var_units_parsing. Hopefully Ag will release a new version with this merged in soon.
@LukeJones123 out of curiosity, why did you implement it as a callback? Wouldn't it be more straight-forward to add a setter method for the class attribute, so the user can provide a custom regex before actually reading the file? Also, your callback func resolves to two variables instead of the default triplet v1, unit, v2. What if I want to resolve vname to three (or more) components?
A custom regex was the obvious first choice but unfortunately I find the strings are so heterogeneously formatted that I need to implement some logic to catch all cases, hence a callback function. A custom regex would be a good additional option to have, for those with simpler needs. Of course, the best option would be to simply stop using Nasa Ames - I hate this format for exactly these reasons!
@LukeJones123 let me propose a way that does not require to modify the API but satisfies your needs nevertheless!
1- initialize var_and_units_pattern as an underscored class attribute:
def __init__(self):
# ...
self._var_and_units_pattern = re.compile(r"^\s*(.*)\((.+?)\)(.*)\s*$")
2- make it a property with a setter:
@property
def var_and_units_pattern(self):
"""
a regex or function to parse vnames to (name & description, unit)
"""
return self._var_and_units_pattern
@var_and_units_pattern.setter
def var_and_units_pattern(self, new_pattern):
if not isinstance(new_pattern, (re.Pattern, types.FunctionType)):
raise TypeError(f"new pattern must be re.Pattern or function, "
f"not {type(new_pattern)}")
self._var_and_units_pattern = new_pattern
3- include the callback function option in _attemptVarAndUnitsMatch:
def _attemptVarAndUnitsMatch(self, item):
if isinstance(self._var_and_units_pattern, types.FunctionType):
return self._var_and_units_pattern(item)
match = self._var_and_units_pattern.match(item)
if match:
# ...
---> now you can easily set your custom parser or regex after loading the file:
f.var_and_units_pattern = custom_parser
--> A working implementation in my fork
Hi Florian,
A problem with your suggestion is that a callback function is not the same thing as a regular expression. Having an attribute called var_and_units_pattern that is sometimes set to a regular expression and sometimes set to a function is a bit confusing. Since they are such different things I think a callback function should have a separate attribute.
Your suggestion would be good as a way to set the pattern to a non-default value, and being able to provide a non-default pattern would also be a good thing, but that should be a separate development I think.
Luke.
@LukeJones123 you're right, the attribute name would be misleading if it can be a function.
custom_vname_parser
a settable attribute as well, and the preferred option to parse variables in _attemptVarAndUnitsMatch
, if it is not None
?Am I correct that what you're suggesting is, instead of passing the custom parser function as an input to the NAFile init() method, you first create the NAFile object and then use a setter to set the attribute? The trouble with that is it means the init() method will never be able to make any attempt to read the file, because it must allow for the caller to change attributes relevant to file parsing even after object initialisation. I think, at the moment, the file header is actually read by init(). Even if it wasn't, under the current API it could be changed to do so in future. Allowing the setting of attributes required for file parsing after initialisation will prevent that forever (unless you implement a backwards incompatible API change). In that respect, this attribute is similar to the ignore_header_lines argument, which is also passed to init.
I know the way I've done it looks a bit clunky but I think it's the best way.
@LukeJones123 why shouldn't you be able to set an attribute (with a setter) upon initialization? Simple example:
class myClass:
def __init__(self, **kwargs):
self.a_string = kwargs['the_string'] if "the_string" in kwargs else None
@property
def a_string(self):
return self._a_string
@a_string.setter
def a_string(self, value):
self._a_string = value
c = myClass()
print(c.a_string)
# None
c_with_string = myClass(the_string="hello")
print(c_with_string.a_string)
# hello
I think the **kwargs
approach matches well with what you need, no?
The advantage would be that we could also add deleter; meaning that you can change or remove the custom parser func after initialization of the na file class instance.
cheers, Florian
In your example the setter serves no purpose if modifying self._a_string after initialisation is too late to have the desired effect, which is the case for this callback function, so this does not match with the use case in question. There is also no sensible use case for deleting the callback function at a later time. Can we please just agree on the current solution?
Again, I don't think this is true. Anyways, what about Ag merges your PR and I'll have a look if I can make it look a bit less chunky afterwards - without breaking the functionality you need of course.
@LukeJones123 @agstephens I think this can be closed, no?
For me the issue will be closed when the changes are released in the next version - 2.0.2.
Description
Some of the files in the NDACC database do not have variable name+units strings that conform to the expectations of Nappy. For example, some providers separate the two with a semicolon, e.g.
This issue could be addressed by a more complicate regular expression, but others are more convoluted and regular expressions aren't enough. Thus would be good if a custom user callback function could be allowed to do the parsing.
What I Did