geoffroychaussonnet / script_to_monitor_Covid19

Python scripts to monitor Covid-19
BSD 3-Clause "New" or "Revised" License
1 stars 1 forks source link

Why are xParam dicts? #9

Closed jferard closed 4 years ago

jferard commented 4 years ago

I would have used a Data (Display, Fit, ...) class (or a dataclass with Python 3.7) to avoid typos and give a precise semantic to each parameter. In my mind, this is less flexible, but cleaner.

A design example:

 data_factory = DataFactory(path)
 data = data_factory.create(field, evolution_type, smoothing, start_date)
 ...

And elsewhere:

 def evolution_country(area, data, display):
      ...
     if field=="Confirmed":
          evolution = evolution_single(area, data.confirmed)

Or better, with the evolution logic into Data:

 def evolution_country(area, data, display):
      ...
      evolution = data.evolution_single(area, field)
geoffroychaussonnet commented 4 years ago

This is fundamental question, how to structure the data. I used dictionaries because in my opinion it offers the best balance between 'script interoperability' (we can export a dictionary to a file using native functions) and flexibility. And also it is easy and quick to code.

I full agree that the use of class is 'cleaner' but it requires a little more time to implement. You can propose a prototype if you feel like! :) Concerning the structure of the class of your last example:

evolution = data.evolution_single(area, field)

I think it is a good idea to have the method "evolution" that outputs a list or a np.array for a given area. However, I think it might be tricky to define a method for "smoothing" for example : the smoothing operator should be transparent to whatever it is applied to: a single region (Guadeloupe, France), and single country (France, France) or a group of countries (EU). Therefore i think the easiest is to have a smoothing function taking a list (or np.array) and as input. Hence, I envision the tools to process the data to be outside of the data class. What do you think?

jferard commented 4 years ago

Maybe the real issue is not dict vs class, but there is definitely a code smell here. One expects values of a dict to be homogeneous, but that's not the case here: xParam are catch-all. An example: dataParam['Confirmed'] and dataParam['EvolutionType']. The first is a dataframe, that is real data taken from the source (John Hopkins) while the second is a parameter. They are syntactically different (dataframe vs string) and, more annoying, semantically different (data vs parameter). (Note: using a class forces you to think of semantics from the beginning, that why it is generally safer than a dict.) To narrow the issue, I think we have to:

  1. separate data from parameters
  2. make parameters plain vars when possible (evolution_type does not need to be in a dict).
  3. group some parameters (is dataParam, fitParam, displayParam the right decomposition?)
geoffroychaussonnet commented 4 years ago

I can only agree all what you say.
To your:

  1. that makes sense. Actually the variable name dataParam itself is not consistent!
  2. indeed evolution_type does not need to be in a dict, but this is convenient to encapsulate everything into a dict to give as a parameter or to save in a file.
  3. We can think about something else. Maybe, one class (or dict) for the data and one class (or dict) for all parameters together?
jferard commented 4 years ago

I agree with 3.: it seems the more consistent design. But I think we may use an empiric strategy: first create independent variables, and then see if we need one, two, or three classes/dicts. That's a way to stick to the KISS principle. But the other way is possible: start with a class/dict and split it if necessary.

About 2. This is the classic refactoring: https://refactoring.com/catalog/introduceParameterObject.html. More details here: https://blog.novoda.com/refactoring-introduce-parameter-object/. The principle of all refactorings is: don't do it until you need it. In Python, you can limit the number of arguments by setting default values and it's often sufficient:

>>> def f(a, b, c=1, d=10, e=100, f=1000, z=10000):
...         pass
>>> f(0, 0, z=-1)

Saving seems a legitimate concern to me, but not a real issue:

>>> class A:
...     def __init__(self, a, b): self.a = a; self.b = b
...
>>> x = A(1, "a")
>>> x.a, x.b
(1, 'a')
>>> x.__dict__
{'a': 1, 'b': 'a'}
>>> import json
>>> d = json.dumps(x.__dict__)
>>> d
'{"a": 1, "b": "a"}'
>>> y = A(**json.loads(d))
>>> y.a, y.b
(1, 'a')
jferard commented 4 years ago

I'm working on it.

jferard commented 4 years ago

See https://github.com/geoffroychaussonnet/script_to_monitor_Covid19/pull/15