gchq / gafferpy

Python API for Gaffer
https://gchq.github.io/gafferpy/
Apache License 2.0

Add an optional data transformer to gafferpy results #4

Open t92549 opened 2 years ago

t92549 commented 2 years ago

Currently, gafferpy returns results either as the raw JSON from the Gaffer API or as the equivalent gafferpy objects. This is fine for some use cases, but a user who just wants to run a simple, fast query gets bogged down in a lot of Java-related type boilerplate. This is an example output from the road-traffic example:

{'class': 'uk.gov.gchq.gaffer.data.element.Edge',
  'destination': 'M32:M4 (19)',
  'directed': True,
  'group': 'RoadUse',
  'matchedVertex': 'SOURCE',
  'properties': {'count': {'java.lang.Long': 841303},
                 'countByVehicleType': {'uk.gov.gchq.gaffer.types.FreqMap': {'AMV': 407034,
                                                                             'BUS': 1375,
                                                                             'CAR': 320028,
                                                                             'HGV': 27234,
                                                                             'HGVA3': 1277,
                                                                             'HGVA5': 5964,
                                                                             'HGVA6': 4817,
                                                                             'HGVR2': 11369,
                                                                             'HGVR3': 2004,
                                                                             'HGVR4': 1803,
                                                                             'LGV': 55312,
                                                                             'PC': 1,
                                                                             'WMV2': 3085}},
                 'endDate': {'java.util.Date': 1431543599999},
                 'startDate': {'java.util.Date': 1034319600000}},
  'source': 'M32:1'}

It would be great if gafferpy could optionally return an object that lets you read values directly, without the nested type wrappers:

>>> print(result.source)
M32:1
>>> print(result.properties.count)
841303
>>> print(result.properties.countByVehicleType.CAR)
320028

This could be implemented as a generator that takes JSON input and creates these result objects lazily. Dictionaries can be mapped to objects easily in Python (see munch).
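A minimal sketch of that idea, using only the standard library (types.SimpleNamespace stands in for munch here). The names simplify, simplified_results and looks_like_java_type are hypothetical and not part of gafferpy, and detecting a type wrapper as "a single-key dict whose key is a dotted name" is a heuristic assumption, not something Gaffer guarantees:

```python
from types import SimpleNamespace


def looks_like_java_type(key):
    # Heuristic (assumption): wrapper keys are dotted Java class names,
    # e.g. 'java.lang.Long' or 'uk.gov.gchq.gaffer.types.FreqMap'.
    return isinstance(key, str) and "." in key and " " not in key


def simplify(value):
    """Recursively strip type wrappers and expose dict keys as attributes."""
    if isinstance(value, dict):
        if len(value) == 1:
            key, inner = next(iter(value.items()))
            if looks_like_java_type(key):
                return simplify(inner)
        return SimpleNamespace(**{k: simplify(v) for k, v in value.items()})
    if isinstance(value, list):
        return [simplify(v) for v in value]
    return value


def simplified_results(raw_results):
    """Lazily convert each raw JSON element as the caller iterates."""
    for element in raw_results:
        yield simplify(element)


# Trimmed-down version of the RoadUse edge shown above.
raw = [{
    "class": "uk.gov.gchq.gaffer.data.element.Edge",
    "source": "M32:1",
    "destination": "M32:M4 (19)",
    "properties": {
        "count": {"java.lang.Long": 841303},
        "countByVehicleType": {
            "uk.gov.gchq.gaffer.types.FreqMap": {"CAR": 320028, "BUS": 1375}
        },
    },
}]

result = next(simplified_results(raw))
print(result.source)
print(result.properties.count)
print(result.properties.countByVehicleType.CAR)
```

Keys that are not valid Python identifiers, such as 'class', are still stored on the object but have to be read with getattr(result, 'class').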

When creating this generator, users should be able to easily attach transform functions to the results, such as removing fields, renaming fields and applying functions to field values. Gaffer already provides much of this functionality (renaming, ignoring and transforming fields), so perhaps these steps could be added to the OperationChain and run server-side rather than executed in Python.
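If the transforms were run in Python, they could look roughly like the sketch below. All of the names here (drop_fields, rename_field, apply_to_field, transformed) are hypothetical and only illustrate the shape of such an API; nothing like this exists in gafferpy today:

```python
def drop_fields(*names):
    """Build a transform that removes the given top-level fields."""
    def transform(element):
        return {k: v for k, v in element.items() if k not in names}
    return transform


def rename_field(old, new):
    """Build a transform that renames one top-level field."""
    def transform(element):
        return {(new if k == old else k): v for k, v in element.items()}
    return transform


def apply_to_field(name, func):
    """Build a transform that replaces a field's value with func(value)."""
    def transform(element):
        if name not in element:
            return element
        return {**element, name: func(element[name])}
    return transform


def transformed(results, *transforms):
    """Lazily apply each transform, in order, to every result."""
    for element in results:
        for transform in transforms:
            element = transform(element)
        yield element


raw = [{
    "class": "uk.gov.gchq.gaffer.data.element.Edge",
    "source": "M32:1",
    "destination": "M32:M4 (19)",
    "directed": True,
}]

for edge in transformed(
    raw,
    drop_fields("class"),
    rename_field("source", "src"),
    apply_to_field("destination", str.lower),
):
    print(edge)
```

Because each transform is a plain function from dict to dict, users could pass their own functions alongside the built-in ones.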

t92549 commented 2 years ago

As well as better output handling, it would be great to have an option to stream the output using execute/chunked.
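A rough sketch of the consuming side, assuming the chunked response delivers one JSON document per line (this framing is an assumption and should be checked against the Gaffer REST API). The function name iter_chunked_results is hypothetical:

```python
import json


def iter_chunked_results(lines):
    """Yield one parsed result per non-empty line of a streamed response.

    Assumption: each line holds exactly one complete JSON document.
    """
    for line in lines:
        if not line or not line.strip():
            continue  # skip blank keep-alive or separator lines
        yield json.loads(line)


# Simulated stream; in practice this would be the HTTP response's line iterator.
stream = [b'{"source": "M32:1"}', b"", b'{"source": "M4:19"}']
for element in iter_chunked_results(stream):
    print(element["source"])
```

With the requests library, the lines argument could come from response.iter_lines() on a request made with stream=True, so results are processed as they arrive rather than after the whole body has been downloaded.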