billingross commented 1 month ago

I now have two types of triggers, object triggers and database triggers, and my question is which operations each should control. One thought is to reserve database triggers for launching jobs and let object triggers control everything else.

Object triggers activate queries used for:

Object node creation
Functional relationship creation
Provenance relationship creation (GENERATED)

Database triggers activate queries used for:

JobRequest node creation
Provenance relationship creation (WAS_USED_BY, REQUESTED)

Job launcher function requests queries used for:

Job node creation

I guess there don't need to be hard rules. Triggers activate database queries and those triggers can be in different places.

billingross commented 1 month ago

Another question is how much information I want to include in the match patterns for different triggers. The less information I match on, the more modular each trigger can be, but then those queries will require more indexes.

For instance, here are two match patterns for Fastq files:

relateFastqToReadGroup: (?P<sample>SAMN\d+)/HAS_READ_GROUP/(?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$
relateFastqToReadGroup: (?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$
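To make the tradeoff concrete, here is a minimal Python sketch that runs both patterns against one object name and shows what each trigger gets to work with. The object name and the `match_groups` helper are made up for illustration, and the dot before `gz` is escaped so it only matches a literal dot.

```python
import re

# Patterns from the comment above, with the dot before "gz" escaped.
FULL_PATH = re.compile(
    r"(?P<sample>SAMN\d+)/HAS_READ_GROUP/(?P<read_group>ERR\d+)"
    r"/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$"
)
SHORT_PATH = re.compile(
    r"(?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$"
)

# Hypothetical object name, invented for this example.
OBJECT_NAME = "SAMN00000001/HAS_READ_GROUP/ERR000001/HAS_FASTQ/ERR000001_1.fastq.gz"


def match_groups(pattern, object_name):
    """Return the named groups if the pattern matches, else None."""
    match = pattern.search(object_name)
    return match.groupdict() if match else None


full = match_groups(FULL_PATH, OBJECT_NAME)    # includes the sample
short = match_groups(SHORT_PATH, OBJECT_NAME)  # read group and fastq only
```

The shorter pattern never sees the sample, so any query it triggers has to find the ReadGroup by a property lookup rather than by traversing from a Sample.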

More information in the trigger means less information needed in the database indexes because I can use more graph traversal:

MATCH (s:Sample)-[:HAS_READ_GROUP]->(r:ReadGroup) MERGE (f:Fastq) WITH r, f MERGE (r)-[:HAS_FASTQ]->(f)

But for these patterns to work, the triggers have to run in order. The (:Sample)-[:HAS_READ_GROUP]->(:ReadGroup) relationship needs to already exist when that query runs.

So then it's probably best not to overcomplicate it and just use more indexes.
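A rough sketch of the index-based alternative, in Python: the query is built from the shorter pattern's groups and finds each node by property, with no traversal from Sample. The function name, the `name` property as lookup key, and the use of MERGE for the ReadGroup (so it is created if missing) are all assumptions on my part. Swap the first MERGE for MATCH if the ReadGroup must already exist.

```python
def relate_fastq_to_read_group(groups):
    """Build a parameterized Cypher query from regex match groups.

    Both nodes are found by their `name` property (assumed to be indexed),
    so the query does not depend on a Sample relationship existing first.
    """
    query = (
        "MERGE (r:ReadGroup {name: $read_group}) "
        "MERGE (f:Fastq {name: $fastq_name}) "
        "MERGE (r)-[:HAS_FASTQ]->(f)"
    )
    # Only pass the parameters this query uses; extra groups are ignored.
    params = {
        "read_group": groups["read_group"],
        "fastq_name": groups["fastq_name"],
    }
    return query, params
```

Because every clause is a MERGE keyed on `name`, running it twice for the same object should not create duplicates, which matters if a trigger fires more than once.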

billingross commented 1 month ago

If I can gather metadata on the Sample and ReadGroup from the object name, should I use the object to trigger creating those nodes and relationships or should I just create the Fastq node with sample and readGroup as properties and then use database triggers to generate the Sample and ReadGroup nodes?

billingross commented 1 month ago

I shouldn't just create the Fastq node, because then I have to backtrack to generate the other nodes. For instance, if I only create the Fastq node, I need a database trigger, fired by creation of the Fastq node, to relate ReadGroup to Fastq. OK, that's fine I guess. But then creation of the Sample node has to be triggered by creation of the (ReadGroup)-[:HAS_FASTQ]->(Fastq) relationship, which is less intuitive. Why would creation of a Sample node be triggered by a HAS_FASTQ relationship?
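One way the object-triggered version could look, as a Python sketch: a single query, fired by the object itself, creates all three nodes and both relationships from the metadata in the object name, so nothing has to backtrack. The function name and the `name` property are hypothetical, and the groups are assumed to come from a pattern that captures `sample`, `read_group`, and `fastq_name`.

```python
def create_fastq_lineage(groups):
    """Build one Cypher query that creates Sample, ReadGroup, and Fastq.

    Nodes are merged first and relationships second, so the query works
    whether or not any of the nodes already exist.
    """
    query = (
        "MERGE (s:Sample {name: $sample}) "
        "MERGE (r:ReadGroup {name: $read_group}) "
        "MERGE (f:Fastq {name: $fastq_name}) "
        "MERGE (s)-[:HAS_READ_GROUP]->(r) "
        "MERGE (r)-[:HAS_FASTQ]->(f)"
    )
    params = {key: groups[key] for key in ("sample", "read_group", "fastq_name")}
    return query, params
```

The cost is a less modular trigger: it only works for objects whose names carry all three identifiers.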

billingross commented 1 month ago

Design principle: The name property of each node should uniquely distinguish it from every other node with the same label.

Then every label can be indexed on its name property.
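A small sketch of how that principle could be enforced, assuming the database is Neo4j and using Neo4j 5 constraint syntax (older versions use a different form). The label list and constraint names are illustrative. In Neo4j, a uniqueness constraint also creates a backing index, so one statement per label covers both the uniqueness rule and the lookup.

```python
# Illustrative labels taken from this thread; extend as needed.
LABELS = ["Sample", "ReadGroup", "Fastq"]


def name_constraint(label):
    """Return a Cypher statement making `name` unique for one label."""
    return (
        f"CREATE CONSTRAINT {label.lower()}_name_unique IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.name IS UNIQUE"
    )


statements = [name_constraint(label) for label in LABELS]
```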

billingross / trellis-v2

Describe how to organize object vs database triggers #46
