gchq / gafferpy

Python API for Gaffer
https://gchq.github.io/gafferpy/
Apache License 2.0

A PySpark API for Gaffer #2

Open m316257 opened 5 years ago

m316257 commented 5 years ago

Gaffer has a Spark library with Scala and Java APIs for accessing data using Spark, generating RDDs and Spark DataFrames from Gaffer graphs.

Gaffer also has a Python shell with implementations of the standard Gaffer operations, which can be executed on the graph through Gaffer's REST service.

Extending the Python API to support Spark operations (producing RDDs and DataFrames) would open Gaffer up to a lot of useful Python and Spark data science and machine learning libraries.
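As a rough illustration of the idea, here is a minimal sketch of flattening JSON elements returned by the REST service into flat rows, which is the shape a DataFrame needs. Everything here is an assumption for illustration: `SAMPLE_RESPONSE`, `elements_to_rows` and the `prop_` column prefix are hypothetical names, the element fields are a simplified guess at Gaffer's JSON serialisation, and none of it is existing gafferpy code.

```python
import json

# Hypothetical sample of what a "get elements" call might return over REST.
# Field names are an assumed, simplified form of Gaffer's JSON elements.
SAMPLE_RESPONSE = json.dumps([
    {"class": "uk.gov.gchq.gaffer.data.element.Edge", "group": "RoadUse",
     "source": "10", "destination": "11", "directed": True,
     "properties": {"count": 7}},
    {"class": "uk.gov.gchq.gaffer.data.element.Entity", "group": "Junction",
     "vertex": "10", "properties": {"count": 3}},
])


def elements_to_rows(elements):
    """Flatten element dicts into one flat dict per element (hypothetical helper)."""
    rows = []
    for element in elements:
        # Fields that do not apply (e.g. "source" on an entity) become None,
        # so every row has the same core columns.
        row = {
            "group": element.get("group"),
            "vertex": element.get("vertex"),
            "source": element.get("source"),
            "destination": element.get("destination"),
            "directed": element.get("directed"),
        }
        # Prefix property names so they cannot clash with the core columns.
        for name, value in element.get("properties", {}).items():
            row["prop_" + name] = value
        rows.append(row)
    return rows


rows = elements_to_rows(json.loads(SAMPLE_RESPONSE))
for row in rows:
    print(row)
```

Rows of this shape could then be handed to something like PySpark's `SparkSession.createDataFrame`. That would still pull the data through the REST service rather than use Gaffer's own Spark library, so it is not the same as the native RDD and DataFrame support requested here.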

n3101 commented 3 years ago

@m316257 @GCHQ-83497 Hello, please could you tell me the status of this issue? FYI, we are considering the alternative "fishbowl" shell as our way forward, and would be interested in whether anything you have here is complete enough and compatible enough to lift and reuse.

GCHQ-83497 commented 3 years ago

@m316257 correct me if I am wrong, it has been a long time since I worked on this. @n3101 the idea was to be able to interact directly with Gaffer across a network, so currently you can run most if not all queries from Python and get the results back. There was also jaffer, which was a Java version of this. The same applied to adding PySpark, so I believe that runs in a sort of remote mode as well (sorry, it's been nearly 2 years!). Last time I worked on this I had added some features so that you could hook into authentication and policy type stuff; I cannot for the life of me remember whether that works or not. I also think there was a first draft attempt at containerising Gaffer in this as well.