cern-sis / issues-inspire

0 stars 0 forks source link

merger is buggy when list is present in `root` and (`head` xor `update`) #440

Open michamos opened 8 months ago

michamos commented 8 months ago

The merger seems to behave incorrectly when a list field such as authors is present in root and only one of head or update but not both.

I've looked in detail at the case where authors is present in root and update but not head (and discussed it with @PascalEgn IRL). What happens in that case is that the field doesn't get treated as a list but as a dict with integer keys (à la javascript) due to https://github.com/inveniosoftware-contrib/json-merger/blob/97f1d195887dc473c26a5fae3f7477759bc34e7f/json_merger/merger.py#L231-L232 (which gets triggered for lists inside lists) and https://github.com/inveniosoftware-contrib/json-merger/blob/97f1d195887dc473c26a5fae3f7477759bc34e7f/json_merger/dict_merger.py#L106 (for lists inside dicts). Note that considering that a missing field is equivalent to an empty list would be incorrect, as the unifier strategies behave differently for empty lists and for missing keys: KEEP_ONLY_HEAD should result in taking the update in case the field is not present on head but is there in the update, but would result in an empty list if we treated a missing field on head as an empty list.

The opposite case where authors is present in root and head but not update seems buggy too: https://inspirehep.net/holdingpen/6705454, but I have not investigated further.

drjova commented 8 months ago

@michamos we need a bit more details

michamos commented 8 months ago

@drjova here you go, happy to discuss it further.

drjova commented 5 months ago

@michamos could you please give us example data to replicate it in the tests?