Closed jferard closed 4 years ago
This is fundamental question, how to structure the data. I used dictionaries because in my opinion it offers the best balance between 'script interoperability' (we can export a dictionary to a file using native functions) and flexibility. And also it is easy and quick to code.
I full agree that the use of class is 'cleaner' but it requires a little more time to implement. You can propose a prototype if you feel like! :) Concerning the structure of the class of your last example:
evolution = data.evolution_single(area, field)
I think it is a good idea to have the method "evolution" that outputs a list or a np.array for a given area. However, I think it might be tricky to define a method for "smoothing" for example : the smoothing operator should be transparent to whatever it is applied to: a single region (Guadeloupe, France), and single country (France, France) or a group of countries (EU). Therefore i think the easiest is to have a smoothing function taking a list (or np.array) and as input. Hence, I envision the tools to process the data to be outside of the data class. What do you think?
Maybe the real issue is not dict vs class, but there is definitely a code smell here. One expects values of a dict to be homogeneous, but that's not the case here: xParam are catch-all. An example: dataParam['Confirmed']
and dataParam['EvolutionType']
. The first is a dataframe, that is real data taken from the source (John Hopkins) while the second is a parameter. They are syntactically different (dataframe vs string) and, more annoying, semantically different (data vs parameter). (Note: using a class forces you to think of semantics from the beginning, that why it is generally safer than a dict.)
To narrow the issue, I think we have to:
evolution_type
does not need to be in a dict).dataParam
, fitParam
, displayParam
the right decomposition?)I can only agree all what you say.
To your:
dataParam
itself is not consistent! evolution_type
does not need to be in a dict, but this is convenient to encapsulate everything into a dict to give as a parameter or to save in a file.I agree with 3.: it seems the more consistent design. But I think we may use an empiric strategy: first create independent variables, and then see if we need one, two, or three classes/dicts. That's a way to stick to the KISS principle. But the other way is possible: start with a class/dict and split it if necessary.
About 2. This is the classic refactoring: https://refactoring.com/catalog/introduceParameterObject.html. More details here: https://blog.novoda.com/refactoring-introduce-parameter-object/. The principle of all refactorings is: don't do it until you need it. In Python, you can limit the number of arguments by setting default values and it's often sufficient:
>>> def f(a, b, c=1, d=10, e=100, f=1000, z=10000):
... pass
>>> f(0, 0, z=-1)
Saving seems a legitimate concern to me, but not a real issue:
>>> class A:
... def __init__(self, a, b): self.a = a; self.b = b
...
>>> x = A(1, "a")
>>> x.a, x.b
(1, 'a')
>>> x.__dict__
{'a': 1, 'b': 'a'}
>>> import json
>>> d = json.dumps(x.__dict__)
>>> d
'{"a": 1, "b": "a"}'
>>> y = A(**json.loads(d))
>>> y.a, y.b
(1, 'a')
I'm working on it.
I would have used a
Data
(Display
,Fit
, ...) class (or a dataclass with Python 3.7) to avoid typos and give a precise semantic to each parameter. In my mind, this is less flexible, but cleaner.A design example:
And elsewhere:
Or better, with the evolution logic into
Data
: