avian2 / jsonmerge

Merge a series of JSON documents.
MIT License

jsonmerge performance #57

Open dnj12345 opened 2 years ago

dnj12345 commented 2 years ago

Hi, I have been using jsonmerge in my projects for a while now, and it has been working great. I recently started using it in a new project where I update/merge multiple JSON documents every few seconds. Since then, my app's CPU utilization has gone up considerably. I profiled my code and narrowed it down: the JsonMerge call seems to be causing most of the spike. I have tried different merge strategies, but I have not managed to reduce the CPU consumption. Any suggestions on how I could reduce jsonmerge's CPU usage? Any pointers are greatly appreciated. Thanks.

PS: here is my default merge schema and the wrapper I call it through:

import json

from jsonmerge import Merger

merge_schema = """
{
  "oneOf": [
    { "type": "string" },
    { "type": "number" },
    { "type": "boolean" },
    {
      "type": "array",
      "mergeStrategy": "arrayMergeById",
      "mergeOptions": {"idRef": "/"}
    },
    {
      "type": "object",
      "additionalProperties": { "$ref": "#" }
    }
  ]
}
"""

def JsonMerge(base, new_obj):
    # Parses the schema and builds a new Merger on every call.
    schema = json.loads(merge_schema)
    merger = Merger(schema)
    # The Merger already holds the schema, so only base and head are passed.
    return merger.merge(base, new_obj)

avian2 commented 2 years ago

Hi. I don't have any concrete pointers. I suggest using your profiler to check which part of jsonmerge is the bottleneck in your particular use case. If you manage to improve jsonmerge's performance, consider making a pull request. I would be interested in merging it, unless it significantly increases the complexity of the code.
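Following that suggestion, here is a minimal sketch for profiling a single merge call with the standard library's `cProfile` and `pstats`. `profile_call` is a hypothetical helper name; sorting by cumulative time gives a report in the same layout as the profile output posted in this thread.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, sort_key="cumulative", limit=25, **kwargs):
    """Run func(*args, **kwargs) once under cProfile.

    Returns a (result, report) tuple, where report is the pstats text
    sorted by `sort_key` and truncated to the top `limit` entries.
    """
    profiler = cProfile.Profile()
    # runcall() profiles exactly one call and returns its result.
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return result, buffer.getvalue()
```

For example, `merged, report = profile_call(JsonMerge, base, new_obj)` followed by `print(report)`.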

dnj12345 commented 2 years ago

Here's the profile output. I hope it gives some insight. I will continue to look on my end as well.

         30100 function calls (28763 primitive calls) in 0.016 seconds

   Ordered by: cumulative time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.016    0.016 /home/ubuntu/git/myapplication/bin/json_merger.py:49(JsonMerge)
        1    0.000    0.000    0.016    0.016 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:298(merge)
     57/1    0.000    0.000    0.016    0.016 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:43(descend)
     89/2    0.000    0.000    0.016    0.008 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:108(call_descender)
     30/1    0.000    0.000    0.016    0.016 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:68(descend_instance)
     28/1    0.000    0.000    0.012    0.012 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:111(work)
      8/1    0.000    0.000    0.012    0.012 /usr/local/lib/python3.5/dist-packages/jsonmerge/strategies.py:265(merge)
     44/3    0.000    0.000    0.012    0.004 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:21(descend_instance)
      150    0.000    0.000    0.010    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:76(is_valid)
  930/270    0.002    0.000    0.010    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:296(iter_errors)
    40/30    0.000    0.000    0.007    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:41(additionalProperties)
   384/28    0.000    0.000    0.007    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:343(descend)
    60/28    0.000    0.000    0.007    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:252(ref)
    60/28    0.000    0.000    0.006    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:333(oneOf)
      810    0.001    0.000    0.005    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:269(type)
      340    0.000    0.000    0.005    0.000 {built-in method builtins.next}
      161    0.000    0.000    0.003    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:761(resolve)
       60    0.000    0.000    0.003    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:348(<listcomp>)
      202    0.000    0.000    0.003    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:740(resolving)
      101    0.000    0.000    0.003    0.000 /usr/lib/python3.5/contextlib.py:57(__enter__)
      138    0.000    0.000    0.002    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:361(is_valid)
      161    0.000    0.000    0.002    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:768(resolve_from_url)
      454    0.000    0.000    0.002    0.000 {built-in method builtins.any}
      557    0.000    0.000    0.002    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:355(is_type)
      900    0.000    0.000    0.002    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_validators.py:272(<genexpr>)
      557    0.000    0.000    0.002    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:66(is_type)
      360    0.001    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:115(types_msg)
        3    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/strategies.py:175(merge)
      173    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:21(__getitem__)
      557    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/pyrsistent/_pmap.py:70(__getitem__)
      360    0.001    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/exceptions.py:121(_set)
      176    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:14(normalize)
      266    0.000    0.000    0.001    0.000 /usr/lib/python3.5/urllib/parse.py:339(urlsplit)
      172    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:783(resolve_fragment)
     2377    0.000    0.000    0.001    0.000 {built-in method builtins.isinstance}
      557    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/pyrsistent/_pmap.py:60(_getitem)
      360    0.001    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/exceptions.py:22(__init__)
      161    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/compat.py:45(urldefrag)
      159    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:45(get)
      209    0.000    0.000    0.001    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:22(_subval)
      557    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/pyrsistent/_pmap.py:54(_get_bucket)
      259    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:408(urlunsplit)
      553    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:100(_coerce_args)
      270    0.000    0.000    0.000    0.000 /usr/lib/python3.5/abc.py:178(__instancecheck__)
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/resolver.py:14(__init__)
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:636(__init__)
      176    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:247(geturl)
     2671    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
       90    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:29(is_number)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:228(__init__)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:666(from_schema)
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:17(__init__)
       52    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:34(is_type)
       30    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:56(do_descend)
      571    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:507(<lambda>)
      101    0.000    0.000    0.000    0.000 /usr/lib/python3.5/contextlib.py:63(__exit__)
      161    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:684(push_scope)
      101    0.000    0.000    0.000    0.000 /usr/lib/python3.5/contextlib.py:131(helper)
     1622    0.000    0.000    0.000    0.000 {built-in method builtins.getattr}
      450    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:156(ensure_list)
       54    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:427(urljoin)
     1455    0.000    0.000    0.000    0.000 {built-in method builtins.setattr}
       10    0.000    0.000    0.000    0.000 {method 'update' of 'dict' objects}
       15    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:657(<genexpr>)
      130    0.000    0.000    0.000    0.000 /usr/lib/python3.5/logging/__init__.py:1257(debug)
       15    0.000    0.000    0.000    0.000 /usr/lib/python3.5/_collections_abc.py:675(__iter__)
      480    0.000    0.000    0.000    0.000 /usr/lib/python3.5/_weakrefset.py:70(__contains__)
      264    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:10(__init__)
      209    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:19(_ref_escape)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:96(__init__)
       17    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:101(default_strategy)
       15    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:123(descend_instance)
       15    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:110(descend)
      101    0.000    0.000    0.000    0.000 /usr/lib/python3.5/contextlib.py:37(__init__)
      130    0.000    0.000    0.000    0.000 /usr/lib/python3.5/logging/__init__.py:1515(isEnabledFor)
       32    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:58(items)
      784    0.000    0.000    0.000    0.000 {method 'replace' of 'str' objects}
       14    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:288(urlparse)
      161    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:695(pop_scope)
      191    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:36(is_object)
       23    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/strategies.py:166(iter_index_key_item)
      557    0.000    0.000    0.000    0.000 {built-in method builtins.hash}
      579    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:16(is_undef)
       15    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:36(__getitem__)
      646    0.000    0.000    0.000    0.000 {built-in method builtins.len}
      378    0.000    0.000    0.000    0.000 {method 'join' of 'str' objects}
      539    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
      322    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:714(resolution_scope)
      553    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:89(_noop)
      360    0.000    0.000    0.000    0.000 {built-in method builtins.repr}
       78    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:84(find_additional_properties)
      109    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
      172    0.000    0.000    0.000    0.000 {method 'lstrip' of 'str' objects}
       90    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:40(is_string)
       23    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:62(__iter__)
       90    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:14(is_bool)
      130    0.000    0.000    0.000    0.000 /usr/lib/python3.5/logging/__init__.py:1501(getEffectiveLevel)
      462    0.000    0.000    0.000    0.000 {method 'appendleft' of 'collections.deque' objects}
      130    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:31(_indent)
       96    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_types.py:10(is_array)
       45    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:321(_checknetloc)
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:24(__setitem__)
        7    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:397(urlunparse)
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/functools.py:422(decorating_function)
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/functools.py:43(update_wrapper)
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/_collections_abc.py:756(update)
       70    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:547(unquote)
      161    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.5/json/__init__.py:271(loads)
       24    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:25(__setitem__)
       57    0.000    0.000    0.000    0.000 {method 'find' of 'str' objects}
       64    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:322(<genexpr>)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.5/json/decoder.py:334(decode)
       59    0.000    0.000    0.000    0.000 {built-in method __new__ of type object at 0xa3cde0}
       11    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/strategies.py:163(get_key)
      102    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
       14    0.000    0.000    0.000    0.000 <string>:12(__new__)
        4    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:313(_splitnetloc)
        2    0.000    0.000    0.000    0.000 /usr/lib/python3.5/urllib/parse.py:74(clear_cache)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:23(__init__)
        1    0.000    0.000    0.000    0.000 /usr/lib/python3.5/json/decoder.py:345(raw_decode)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/__init__.py:29(<listcomp>)
        4    0.000    0.000    0.000    0.000 {method 'clear' of 'dict' objects}
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/jsonvalue.py:39(append)
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/_collections_abc.py:611(items)
        3    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/_utils.py:30(__iter__)
       60    0.000    0.000    0.000    0.000 {built-in method builtins.ord}
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/functools.py:391(lru_cache)
        2    0.000    0.000    0.000    0.000 {method 'match' of '_sre.SRE_Pattern' objects}
       17    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/strategies.py:53(merge)
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonschema/validators.py:262(__init__)
        4    0.000    0.000    0.000    0.000 {built-in method builtins.hasattr}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        4    0.000    0.000    0.000    0.000 {built-in method builtins.min}
        3    0.000    0.000    0.000    0.000 /usr/lib/python3.5/_collections_abc.py:631(__init__)
        5    0.000    0.000    0.000    0.000 {method 'items' of 'dict' objects}
        1    0.000    0.000    0.000    0.000 /usr/local/lib/python3.5/dist-packages/jsonmerge/descenders.py:18(__init__)
        3    0.000    0.000    0.000    0.000 {built-in method builtins.iter}
        4    0.000    0.000    0.000    0.000 {method 'lower' of 'str' objects}
        1    0.000    0.000    0.000    0.000 {method 'startswith' of 'str' objects}
        6    0.000    0.000    0.000    0.000 {method 'items' of 'collections.OrderedDict' objects}
        2    0.000    0.000    0.000    0.000 {method 'end' of '_sre.SRE_Match' objects}
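Reading the profile above: of the 0.016 s total, about 0.010 s is spent under jsonmerge's `descenders.py:76(is_valid)`, which calls into jsonschema (`iter_errors`, `oneOf`, `additionalProperties`, `ref` resolution), and jsonschema constructs 360 error objects (`exceptions.py:22(__init__)`) along the way. This suggests, though a single 16 ms sample cannot confirm it, that checking values against the `oneOf` branches dominates. Because the object branch refers back to the whole schema via `"$ref": "#"`, nested values may also be validated again at each level of descent. To check this on larger inputs, a minimal sketch that sums self time per package from a cProfile run; `time_by_package` is a hypothetical helper, and it relies on the `stats` dictionary that `pstats.Stats` exposes.

```python
import cProfile
import pstats
from collections import defaultdict
from pathlib import PurePath


def time_by_package(profiler, packages=("jsonmerge", "jsonschema")):
    """Sum self time (tottime) per package from a finished cProfile run.

    Functions whose file path has none of the package names as a path
    component are grouped under "other". Summing tottime avoids double
    counting, because tottime excludes time spent in sub-calls.
    """
    stats = pstats.Stats(profiler)
    totals = defaultdict(float)
    # stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, tottime, cumtime, callers).
    for (filename, _lineno, _name), (_cc, _nc, tottime, _ct, _callers) in stats.stats.items():
        parts = PurePath(filename).parts
        label = next((pkg for pkg in packages if pkg in parts), "other")
        totals[label] += tottime
    return dict(totals)
```

For example: `profiler = cProfile.Profile()`, then `profiler.runcall(JsonMerge, base, new_obj)` and `print(time_by_package(profiler))`.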
dnj12345 commented 1 year ago

Hi @avian2, is there any global data that jsonmerge maintains? I have observed that jsonmerge keeps taking longer and longer to merge similarly sized JSON documents, which also contributes to the CPU spike. I hope that gives a clue.
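One way to turn "keeps taking longer" into numbers is to time many merges of similarly sized, freshly generated inputs and compare the early calls with the late ones. A minimal sketch; `merge_time_trend`, `merge_func` and `make_inputs` are hypothetical names, with `merge_func` standing in for the `JsonMerge` wrapper. If the early and late medians match here but the application still slows down, it may be worth checking whether the application's own `base` document grows between merges, since a larger base gives each merge more to process.

```python
import statistics
import time


def merge_time_trend(merge_func, make_inputs, iterations=200, window=20):
    """Time repeated calls of merge_func on freshly generated inputs.

    make_inputs(i) must return a (base, head) pair for iteration i;
    generating the inputs is not included in the timing.
    Returns (early, late): the median duration in seconds of the first
    and last `window` calls. A late median much larger than the early
    one suggests the per-call cost grows over time.
    """
    if iterations < 2 * window:
        raise ValueError("iterations must be at least 2 * window")
    durations = []
    for i in range(iterations):
        base, head = make_inputs(i)
        start = time.perf_counter()
        merge_func(base, head)
        durations.append(time.perf_counter() - start)
    early = statistics.median(durations[:window])
    late = statistics.median(durations[-window:])
    return early, late
```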

avian2 commented 1 year ago

There should be no global data kept by jsonmerge that grows with each processed JSON document. It's possible there is a reference leak somewhere, but that would be a bug. A small test case that reproduces this would be helpful.
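Along those lines, a minimal reproduction harness for a suspected reference leak, using the standard library's `tracemalloc`: it runs many merges, drops every result, and reports how much traced memory is still held afterwards. `retained_after_merges`, `merge_func` and `make_inputs` are hypothetical names; with jsonmerge, `merge_func` would be the `JsonMerge` wrapper from the first post.

```python
import gc
import tracemalloc


def retained_after_merges(merge_func, make_inputs, iterations=1000, top=5):
    """Run many merges, discarding each result, and report retained memory.

    Returns (growth_bytes, top_stats): the net change in traced memory
    between the start and end of the run, and the `top` source lines
    with the largest change. Growth that scales with `iterations`
    hints that something keeps references to past documents.
    """
    # Ignore allocations made by tracemalloc's own Python code.
    exclude = (tracemalloc.Filter(False, tracemalloc.__file__),)
    base = head = None
    tracemalloc.start()
    try:
        gc.collect()
        before = tracemalloc.take_snapshot().filter_traces(exclude)
        for i in range(iterations):
            base, head = make_inputs(i)
            merge_func(base, head)  # result intentionally dropped
        base = head = None
        gc.collect()
        after = tracemalloc.take_snapshot().filter_traces(exclude)
    finally:
        tracemalloc.stop()
    diffs = after.compare_to(before, "lineno")
    growth = sum(stat.size_diff for stat in diffs)
    return growth, diffs[:top]
```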