Implement a database service for DAGs

gouravshenoy commented 7 years ago

Find out which database would be ideal for storing/fetching DAGs.
This issue can be used for discussions.

Checklist for this task:

[ ] Finalize a database ideal for this purpose.
[ ] Implement a database service for storing/retrieving/modifying DAGs for tasks.
[ ] Expose APIs for accessing this service
[ ] Write unit-tests for this service

amrutakamat16 commented 7 years ago

I feel a graph database would be ideal for storing DAGs. One popular option is Neo4j, a NoSQL graph database. It is open source for all noncommercial uses.

Unlike other databases, relationships take first priority in graph databases. This means your application doesn’t have to infer data connections using things like foreign keys. Not only do graph databases effectively store data relationships; they’re also flexible when expanding a data model or conforming to changing business needs. Neo4j's query language is Cypher (CQL, Cypher Query Language).

We can create DAGs with a simple query, where <relationship-details> is a placeholder for the relationship type and any properties: CREATE ({task:'A'})-[:<relationship-details>]->({task:'D'})

We can write MATCH queries to retrieve the path between two tasks; each end of the pattern needs its own variable, and the * lets the path span any number of hops: MATCH p=(n1 {task:'A'})-[:<match-relationships>*]->(n4 {task:'D'}) RETURN p

We can also easily edit DAGs to add nodes between 2 nodes.
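
To make the three operations above concrete without assuming a running Neo4j instance, here is a minimal in-memory sketch in Python. It is not Neo4j's API: the `Dag` class and its `add_edge`, `find_path`, and `insert_between` methods are hypothetical names chosen for illustration, and the task names simply mirror the queries above.

```python
from collections import defaultdict, deque


class Dag:
    """Toy in-memory DAG keyed by task name (illustrative only, not Neo4j)."""

    def __init__(self):
        self.successors = defaultdict(set)

    def add_edge(self, src, dst):
        # Rough analogue of CREATE (src)-[...]->(dst).
        self.successors[src].add(dst)
        self.successors[dst]  # touch dst so it is registered as a node

    def find_path(self, start, goal):
        # Breadth-first search; rough analogue of MATCH p=... RETURN p.
        queue = deque([[start]])
        seen = {start}
        while queue:
            path = queue.popleft()
            if path[-1] == goal:
                return path
            for nxt in sorted(self.successors[path[-1]]):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
        return None

    def insert_between(self, src, new, dst):
        # Replace the edge src -> dst with src -> new -> dst.
        self.successors[src].discard(dst)
        self.add_edge(src, new)
        self.add_edge(new, dst)


dag = Dag()
dag.add_edge("A", "D")
path_before = dag.find_path("A", "D")
dag.insert_between("A", "B", "D")
path_after = dag.find_path("A", "D")
```

In a graph database the same edit would be a handful of Cypher statements; the sketch only shows the shape of the data being changed.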

marpierc commented 7 years ago

Sounds good in principle, Amruta, but please come up with some ways to evaluate and compare with other approaches.

amrutakamat16 commented 7 years ago

I opted for Neo4j because, for our theme, we basically need a database to store DAGs, and a graph database suits that need directly. If we go with a relational database, extracting relations between nodes would mean a lot of table joins. These operations are compute- and memory-intensive, and their cost grows quickly with the number of hops being traversed. Graph databases, on the other hand, provide first-class support for "relationships".

Each node in the graph database model directly and physically contains a list of relationship records that represent its relationships to other nodes. These relationship records are organized by type and direction and may hold additional attributes. Whenever you run the equivalent of a JOIN operation in a graph database, the database just uses this list and has direct access to the connected nodes, eliminating the need for an expensive search/match computation. We will have a lot of DAGs, so a relational design would involve many join operations, which could make our queries long. Cypher, on the other hand, typically needs much less code than SQL for this kind of traversal. Adding/editing DAGs would be easier with Cypher.
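
As a rough illustration of the difference described above, the Python sketch below contrasts a flat edge table that is searched on every hop with per-node adjacency lists. It is a simplification under stated assumptions: real relational databases use indexes rather than full scans, and none of these names (`edge_table`, `successors_by_scan`, `successors_by_adjacency`, `two_hops`) come from Neo4j or from this project.

```python
from collections import defaultdict

# Relational style: one flat edge table; each hop has to search it.
edge_table = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E")]


def successors_by_scan(task):
    return sorted(dst for src, dst in edge_table if src == task)


# Graph style: every node holds its own list of outgoing relationships.
adjacency = defaultdict(list)
for src, dst in edge_table:
    adjacency[src].append(dst)


def successors_by_adjacency(task):
    return sorted(adjacency.get(task, []))


def two_hops(successors, task):
    # Equivalent of a two-level self-join on the edge table.
    return sorted({far for near in successors(task) for far in successors(near)})


scan_result = two_hops(successors_by_scan, "A")
adjacency_result = two_hops(successors_by_adjacency, "A")
```

Both functions return the same tasks; what differs is how much work each hop does, which is the kind of thing a benchmark comparing the approaches could measure.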

We also have an option of deconstructing JSON and inserting it into Neo4j by just using plain Cypher.
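
One way this could look, sketched in Python: the JSON is parsed into a list of edges and passed as a parameter to a single Cypher statement that uses UNWIND and MERGE. The JSON layout, the `Task` label, the `NEXT` relationship type, and the helper `build_load_request` are all assumptions made for illustration, and the sketch only builds the query and parameters; it does not connect to a database.

```python
import json

# Hypothetical JSON layout for a DAG: a list of directed edges between tasks.
dag_json = '{"edges": [{"src": "A", "dst": "B"}, {"src": "B", "dst": "D"}]}'

# One parameterised statement; MERGE avoids duplicating tasks that already exist.
# The Task label and NEXT relationship type are assumptions for illustration.
LOAD_DAG_QUERY = (
    "UNWIND $edges AS e "
    "MERGE (a:Task {task: e.src}) "
    "MERGE (b:Task {task: e.dst}) "
    "MERGE (a)-[:NEXT]->(b)"
)


def build_load_request(raw_json):
    """Return the query text and parameter map to hand to a Neo4j driver."""
    edges = json.loads(raw_json)["edges"]
    return LOAD_DAG_QUERY, {"edges": edges}


query, params = build_load_request(dag_json)
```

Passing the edges as a parameter, rather than pasting values into the query string, keeps the statement fixed regardless of how large the DAG is.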

ajinkya-dhamnaskar commented 7 years ago

Amruta, it is good to have a working playground to explore the graph database and see how well we can exploit it for our use. As discussed, could you please add your implementation to the repo?

Gourav and I were discussing what information a node can accommodate. We need to come to a conclusion regarding the usability of the graph database.

amrutakamat16 commented 7 years ago

I have added a dummy application under the GraphDB folder which can create nodes and fetch paths from a graph DB. I am currently working on converting it to a Maven project.

gouravshenoy commented 7 years ago

@amrutakamat16 this is good progress. We need to add sufficient APIs to be able to save/retrieve/update a DAG in this database as efficiently as we visualize it.
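
As a starting point for discussion, the save/retrieve/update surface could be sketched like this in Python. Everything here is hypothetical: `DagStore`, `CycleError`, and `topological_order` are illustrative names, a plain dict stands in for the database, and the acyclicity check (Kahn's algorithm) is one possible validation step, not something the thread has agreed on.

```python
class CycleError(ValueError):
    """Raised when a set of edges does not form a DAG."""


def topological_order(edges):
    """Kahn's algorithm; raises CycleError if the edges contain a cycle."""
    nodes = {n for edge in edges for n in edge}
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    ready = sorted(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.pop(0)
        order.append(node)
        for src, dst in edges:
            if src == node:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    ready.append(dst)
    if len(order) != len(nodes):
        raise CycleError("edges contain a cycle")
    return order


class DagStore:
    """Hypothetical service facade; a dict stands in for the database."""

    def __init__(self):
        self._dags = {}

    def save(self, dag_id, edges):
        topological_order(edges)  # reject anything that is not acyclic
        self._dags[dag_id] = list(edges)

    def retrieve(self, dag_id):
        return list(self._dags[dag_id])

    def update(self, dag_id, edges):
        if dag_id not in self._dags:
            raise KeyError(dag_id)
        self.save(dag_id, edges)


store = DagStore()
store.save("wf1", [("A", "B"), ("B", "D")])
```

Validating on every write means a cyclic graph never reaches storage, at the cost of one traversal per save or update.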

airavata-courses / spring17-workload-management

Implement a database service for DAGs #3