catalystneuro / nwb-conversion-tools

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://nwb-conversion-tools.readthedocs.io/en/main/
BSD 3-Clause "New" or "Revised" License

Refactor `dict_deep_update` function. #482

Open h-mayorquin opened 2 years ago

h-mayorquin commented 2 years ago

While working on PR #481, @CodyCBakerPhD and I were discussing the poor variable names in this very important function. Right now, we have:

https://github.com/catalystneuro/nwb-conversion-tools/blob/5e66722723420856de70e80b8e465f3bb20ebfbd/src/nwb_conversion_tools/utils/dict.py#L121-L124

These go against the design principle that variable names should be as meaningful as possible, so that they serve as documentation themselves.

Right now, PR #481 has modified the internal names of the function, but we are leaving the keyword arguments untouched, as changing them might break scripts that rely on them. I am writing this issue as a record of intent to modify this in the future, to propose a breakdown of the tasks to do, and to note some other concerns for general discussion.

Proposal for the to-do list:

I am also generally concerned about all the functionality we keep adding to this function. We have many parameters that change its behavior: 1) `list_dict_deep_update` (a boolean), for example, controls whether lists are appended/extended or simply overwritten. 2) `compare_key` (a string, `"name"` by default) controls which property dicts in our metadata have to match on before they are updated.
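The interaction between these two parameters can be hard to picture from a description alone. Below is a minimal sketch of the semantics as I understand them; it is not the real implementation, the argument name `append_list` stands in for `list_dict_deep_update`, and all behavior shown is an assumption based on the description above:

```python
from copy import deepcopy


def dict_deep_update_sketch(d, u, append_list=True, compare_key="name"):
    """Hypothetical sketch of a recursive dict update.

    When ``append_list`` is True, lists are merged; when False, the updating
    list simply overwrites the original. Dicts inside lists are matched on
    ``compare_key`` and updated in place instead of being appended as
    duplicates.
    """
    d = deepcopy(d)
    for key, value in u.items():
        if isinstance(value, dict) and isinstance(d.get(key), dict):
            # Nested dicts are merged recursively
            d[key] = dict_deep_update_sketch(d[key], value, append_list, compare_key)
        elif isinstance(value, list) and isinstance(d.get(key), list) and append_list:
            merged = deepcopy(d[key])
            for item in value:
                if isinstance(item, dict) and compare_key in item:
                    # Replace an existing dict whose compare_key matches
                    matches = [
                        i for i, x in enumerate(merged)
                        if isinstance(x, dict) and x.get(compare_key) == item[compare_key]
                    ]
                    if matches:
                        merged[matches[0]] = dict_deep_update_sketch(
                            merged[matches[0]], item, append_list, compare_key
                        )
                    else:
                        merged.append(item)
                elif item not in merged:
                    merged.append(item)
            d[key] = merged
        else:
            # Scalars (and lists, when append_list is False) are overwritten
            d[key] = value
    return d
```

For instance, with metadata-shaped inputs, an update whose `ElectrodeGroup` entry shares a `"name"` with an existing entry would overwrite that entry's fields rather than append a duplicate.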

I checked our library and, as far as I could find, we don't really use the function with anything but the defaults. My impression is that we are keeping this for backwards compatibility. Given that these parameters account for a large chunk of the function's complexity and make a central part of our code base more difficult to understand and modify, I think we should also start thinking about dropping them (with the usual warning period, of course).
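For reference, the "usual warning period" could be signaled with Python's standard `DeprecationWarning` mechanism. This is only a hypothetical sketch of the pattern, not the function's real signature or body; the sentinel trick distinguishes "argument not passed" from "passed with the default value":

```python
import warnings
from copy import deepcopy

# Sentinel to detect whether a caller explicitly passed a legacy argument
_UNSET = object()


def dict_deep_update(d, u, list_dict_deep_update=_UNSET, compare_key=_UNSET):
    """Hypothetical sketch of the deprecation path for the legacy kwargs."""
    if list_dict_deep_update is not _UNSET or compare_key is not _UNSET:
        warnings.warn(
            "'list_dict_deep_update' and 'compare_key' are deprecated and will "
            "be removed in a future release; the default behavior is unchanged.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
    # Simplified default behavior: recursive merge of nested dicts
    updated = deepcopy(d)
    for key, value in u.items():
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            updated[key] = dict_deep_update(updated[key], value)
        else:
            updated[key] = value
    return updated
```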

Am I missing something?

CodyCBakerPhD commented 2 years ago

Task list sounds fine. Deprecation to removal should probably be on a 6-month schedule from time of merge.

> Moreover, `append_replace_dict_in_list` seems to actually be doing more than one thing, so it should be broken down into the different actions it performs.

I'd do that in a separate PR

> Modify the unit tests so they reflect some use cases with metadata or source schemas, so they work better as functional documentation.

Never a bad thing to have more explicit tests.

> I checked our library and, as far as I could find, we don't really use the function with anything but the defaults. My impression is that we are keeping this for backwards compatibility. Given that these parameters account for a large chunk of the function's complexity and make a central part of our code base more difficult to understand and modify, I think we should also start thinking about dropping them (with the usual warning period, of course).
>
> Am I missing something?

I know it's a complicated function, but it's also a very important one that has undergone a lot of work over the past few years to handle all that extra functionality. I don't think we should reduce what it's capable of, but we can certainly adjust the internal code to be more understandable.

The current state of the function does all the things that have been needed in the past, which is not specific to our local usage within NWBCT. In fact, most of the changes have been needed by several past conversion projects in order to properly update their specific, highly unique structure. Unit tests in https://github.com/catalystneuro/nwb-conversion-tools/blob/main/tests/test_internals/test_json_schema_utils.py were usually updated with minimal examples showcasing the changes.