gchq / Kai

Kai is an experimental Graph-as-a-Service framework built with the Amazon CDK
Apache License 2.0

Allow users to run bulk ingest to load large volumes of data into a graph #51

Open d47853 opened 4 years ago

d47853 commented 4 years ago

The ingest should be carried out by lambdas that run spark-submit jobs against the Kubernetes cluster. These lambdas should initially be developed outside of Kai and referenced via their ARN. The admins of Kai need some way of adding ingest lambdas to the deployment. The easiest way I can think of to do this is with configuration. It could be done via REST, but that would require a new user pool, etc.
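To make the configuration route concrete, here is a minimal sketch of how a deployment step might read and check the ingest lambdas before storing them. The `ingestLambdas` config key, the `load_ingest_lambdas` function and the ARN pattern are all assumptions for illustration; nothing in Kai defines them yet.

```python
import json
import re

# Assumption: a loose shape check for a Lambda function ARN,
# with an optional version or alias suffix.
LAMBDA_ARN = re.compile(
    r"^arn:aws[a-z-]*:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_-]+(:[A-Za-z0-9_$-]+)?$"
)
REQUIRED_KEYS = {"name", "arn", "arguments"}


def load_ingest_lambdas(config_text):
    """Parse the hypothetical "ingestLambdas" config list and validate each entry."""
    config = json.loads(config_text)
    jobs = []
    for entry in config.get("ingestLambdas", []):
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"ingest entry is missing keys: {sorted(missing)}")
        if not LAMBDA_ARN.match(entry["arn"]):
            raise ValueError(f"not a Lambda function ARN: {entry['arn']!r}")
        if not isinstance(entry["arguments"], dict):
            raise ValueError("arguments must map argument names to type names")
        # Keep only the fields that make up the ingest object.
        jobs.append({key: entry[key] for key in ("name", "arn", "arguments")})
    return jobs
```

Rejecting bad entries at deploy time means a mistyped ARN fails the deployment rather than a user's ingest request later on.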

The ingest objects should be stored in DynamoDB and should have the rough structure:

{
    "name": "My Ingest Job",
    "arn": "lambda arn",
    "arguments": {
        "inputFile": "text",
        "generatorJson": "json"
    }
} 

A Kai user should be able to retrieve these objects (minus the arn) and a UI should be able to use the arguments and their types to render a form that the user can fill in to trigger a bulk ingest.
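As a rough sketch of the retrieval and form side, assuming the object structure above: `public_view` drops the arn before the object is returned to a user, `form_fields` turns the arguments into field descriptions a UI could render, and `build_payload` checks what the user submitted. All three function names and the widget mapping are hypothetical, and the only argument types assumed are the two in the example ("text" and "json").

```python
import json

# Assumption: a mapping from the declared argument type to a form widget name.
WIDGETS = {"text": "text-input", "json": "textarea"}


def public_view(job):
    """Return a copy of the ingest object without the arn."""
    return {key: value for key, value in job.items() if key != "arn"}


def form_fields(job):
    """Describe one form field per declared argument, in declaration order."""
    return [
        {"name": name, "type": type_name, "widget": WIDGETS.get(type_name, "text-input")}
        for name, type_name in job["arguments"].items()
    ]


def build_payload(job, values):
    """Check submitted form values against the declared arguments."""
    expected = set(job["arguments"])
    if set(values) != expected:
        raise ValueError(f"expected exactly these arguments: {sorted(expected)}")
    payload = {}
    for name, type_name in job["arguments"].items():
        value = values[name]
        if type_name == "json":
            # json.JSONDecodeError is a ValueError, so malformed JSON text is rejected here.
            value = json.loads(value)
        payload[name] = value
    return payload
```

The resulting payload is what would be passed to the ingest lambda; the lookup of the arn stays on the server side, so the UI never needs to see it.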

d47853 commented 4 years ago

Happy for someone else to work on this. If no one wants it, I'll pick it up again later.