catalystneuro / nwb-conversion-tools

Create NWB files by converting and combining neural data in proprietary formats and adding essential metadata.
https://nwb-conversion-tools.readthedocs.io/en/main/
BSD 3-Clause "New" or "Revised" License

Refactor `dict_deep_update` function. #482

Open h-mayorquin opened 2 years ago

h-mayorquin commented 2 years ago

While working on PR #481, @CodyCBakerPhD and I were discussing the poor variable names in this very important function. Right now, we have:

https://github.com/catalystneuro/nwb-conversion-tools/blob/5e66722723420856de70e80b8e465f3bb20ebfbd/src/nwb_conversion_tools/utils/dict.py#L121-L124

These go against the design principle that variable names should be as meaningful as possible, so that they serve as documentation themselves.

Right now, PR #481 has modified the internal names of the function, but we are leaving the keyword arguments untouched, as changing them might break scripts that rely on them. I am writing this issue as a record of intent to modify this in the future, to propose a breakdown of the tasks to do, and to note some other concerns for general discussion.

Proposal for the to-do list:

I am also generally concerned about all the functionality we keep adding to this function. We have many parameters that change its behavior: 1) `list_dict_deep_update` (a boolean), for example, controls whether lists are appended/extended or simply overwritten. 2) `compare_key` (a string, `"name"` by default) controls which property dicts in our metadata have to match on before they are updated.
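The interaction between these two parameters can be hard to picture from a description alone. Below is a minimal sketch of the semantics as I understand them; it is not the real implementation, the argument name `append_list` stands in for `list_dict_deep_update`, and all behavior shown is an assumption based on the description above:

```python
from copy import deepcopy


def dict_deep_update_sketch(d, u, append_list=True, compare_key="name"):
    """Hypothetical sketch of a recursive dict update.

    When ``append_list`` is True, lists are merged; when False, the updating
    list simply overwrites the original. Dicts inside lists are matched on
    ``compare_key`` and updated in place instead of being appended as
    duplicates.
    """
    d = deepcopy(d)
    for key, value in u.items():
        if isinstance(value, dict) and isinstance(d.get(key), dict):
            # Nested dicts are merged recursively
            d[key] = dict_deep_update_sketch(d[key], value, append_list, compare_key)
        elif isinstance(value, list) and isinstance(d.get(key), list) and append_list:
            merged = deepcopy(d[key])
            for item in value:
                if isinstance(item, dict) and compare_key in item:
                    # Replace an existing dict whose compare_key matches
                    matches = [
                        i for i, x in enumerate(merged)
                        if isinstance(x, dict) and x.get(compare_key) == item[compare_key]
                    ]
                    if matches:
                        merged[matches[0]] = dict_deep_update_sketch(
                            merged[matches[0]], item, append_list, compare_key
                        )
                    else:
                        merged.append(item)
                elif item not in merged:
                    merged.append(item)
            d[key] = merged
        else:
            # Scalars (and lists, when append_list is False) are overwritten
            d[key] = value
    return d
```

For instance, with metadata-shaped inputs, an update whose `ElectrodeGroup` entry shares a `"name"` with an existing entry would overwrite that entry's fields rather than append a duplicate.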

I checked our library and, as far as I could find, we don't really use the function with anything but the defaults. My impression is that we are keeping this for backwards compatibility. Given that these parameters account for a large chunk of the function's complexity and make a central part of our code base more difficult to understand and modify, I think we should also start thinking about dropping them (with the usual warning period, of course).
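For reference, the "usual warning period" could be signaled with Python's standard `DeprecationWarning` mechanism. This is only a hypothetical sketch of the pattern, not the function's real signature or body; the sentinel trick distinguishes "argument not passed" from "passed with the default value":

```python
import warnings
from copy import deepcopy

# Sentinel to detect whether a caller explicitly passed a legacy argument
_UNSET = object()


def dict_deep_update(d, u, list_dict_deep_update=_UNSET, compare_key=_UNSET):
    """Hypothetical sketch of the deprecation path for the legacy kwargs."""
    if list_dict_deep_update is not _UNSET or compare_key is not _UNSET:
        warnings.warn(
            "'list_dict_deep_update' and 'compare_key' are deprecated and will "
            "be removed in a future release; the default behavior is unchanged.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this function
        )
    # Simplified default behavior: recursive merge of nested dicts
    updated = deepcopy(d)
    for key, value in u.items():
        if isinstance(value, dict) and isinstance(updated.get(key), dict):
            updated[key] = dict_deep_update(updated[key], value)
        else:
            updated[key] = value
    return updated
```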

Am I missing something?

CodyCBakerPhD commented 2 years ago

Task list sounds fine. Deprecation to removal should probably be on a 6-month schedule from time of merge.

> Moreover, `append_replace_dict_in_list` seems to actually be doing more than one thing, so it should be broken down into the different actions it performs.

I'd do that in a separate PR

> Modify the unit tests so they reflect some use cases with metadata or source schemas, so they work better as functional documentation.

Never a bad thing to have more explicit tests.

> I checked our library and, as far as I could find, we don't really use the function with anything but the defaults. My impression is that we are keeping this for backwards compatibility. Given that these parameters account for a large chunk of the function's complexity and make a central part of our code base more difficult to understand and modify, I think we should also start thinking about dropping them (with the usual warning period, of course).
>
> Am I missing something?

I know it's a complicated function, but it's also a very important one that has undergone a lot of work over the past few years to handle all that extra functionality. I don't think we should reduce what it's capable of, but we can certainly adjust the internal code to be more understandable.

The current state of the function does all the things that have been needed in the past, which is not specific to our local usage within NWBCT. In fact, most of the changes have been needed by several past conversion projects in order to properly update their specific, highly unique structure. Unit tests in https://github.com/catalystneuro/nwb-conversion-tools/blob/main/tests/test_internals/test_json_schema_utils.py were usually updated with minimal examples showcasing the changes.