billingross opened 1 month ago
Another question is how much information I want to include in the match patterns for different triggers. The less information I match on, the more modular each trigger can be, but then those queries will require more indexes.
For instance, here are two match patterns for Fastq files:
relateFastqToReadGroup: (?P<sample>SAMN\d+)/HAS_READ_GROUP/(?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$
relateFastqToReadGroup: (?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$
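Running both patterns against the same object name shows what each one captures. This is a minimal Python sketch using the standard `re` module, which uses the same `(?P<name>...)` group syntax as the patterns above; the object name and its `bucket/` prefix are made up for illustration.

```python
import re

# The two candidate trigger patterns, with the dot before "gz" escaped too.
WITH_SAMPLE = re.compile(
    r"(?P<sample>SAMN\d+)/HAS_READ_GROUP/(?P<read_group>ERR\d+)"
    r"/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$"
)
READ_GROUP_ONLY = re.compile(
    r"(?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$"
)

# Hypothetical object name; the "bucket/" prefix is made up.
object_name = "bucket/SAMN00000001/HAS_READ_GROUP/ERR000001/HAS_FASTQ/ERR000001_1.fastq.gz"

for label, pattern in [("with sample", WITH_SAMPLE), ("read group only", READ_GROUP_ONLY)]:
    match = pattern.search(object_name)  # search, not match: patterns are unanchored at the start
    print(label, match.groupdict() if match else None)
```

Only the first pattern yields a sample ID; the second leaves the sample to be found some other way, for example through the graph.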
More information in the trigger pattern means fewer database indexes are needed, because I can rely more on graph traversal:
MATCH (s:Sample {name: $sample})-[:HAS_READ_GROUP]->(r:ReadGroup {name: $read_group})
MERGE (f:Fastq {name: $fastq_name})
MERGE (r)-[:HAS_FASTQ]->(f)
But for this to work, the triggers need to be serialized: the (:Sample)-[:HAS_READ_GROUP]->(:ReadGroup) relationship must already exist before that query runs.
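The ordering problem can be illustrated with a toy in-memory graph. This is a hypothetical Python sketch, not Neo4j: the handler mirrors the traversal query, where a MATCH that returns no rows means the following MERGEs never run, so a Fastq event processed before its Sample event is silently dropped.

```python
# Toy in-memory graph of (source, relationship, target) edges. It only
# illustrates trigger ordering; it is not how Neo4j stores data.
edges = set()

def sample_trigger(sample, read_group):
    # Stand-in for the trigger that creates (Sample)-[:HAS_READ_GROUP]->(ReadGroup).
    edges.add((("Sample", sample), "HAS_READ_GROUP", ("ReadGroup", read_group)))

def fastq_trigger_by_traversal(sample, read_group, fastq_name):
    # Mirrors the traversal query: attach the Fastq only if the
    # Sample -> ReadGroup path already exists.
    path = (("Sample", sample), "HAS_READ_GROUP", ("ReadGroup", read_group))
    if path not in edges:
        return False  # MATCH returned no rows, so the MERGEs never ran
    edges.add((("ReadGroup", read_group), "HAS_FASTQ", ("Fastq", fastq_name)))
    return True

# Fastq event arrives first: it is silently dropped.
print(fastq_trigger_by_traversal("SAMN1", "ERR1", "ERR1_1"))  # False
# Once the Sample trigger has run, the same event succeeds.
sample_trigger("SAMN1", "ERR1")
print(fastq_trigger_by_traversal("SAMN1", "ERR1", "ERR1_1"))  # True
```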
So then it's probably best not to overcomplicate it and just use more indexes.
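With a name index on each label, the shorter pattern can drive a query that merges each node by name and never depends on a pre-existing path. A minimal sketch, assuming parameterized Cypher and that every node is keyed by a `name` property; `build_query` and `RELATE_FASTQ_TO_READ_GROUP` are hypothetical names.

```python
import re

READ_GROUP_ONLY = re.compile(
    r"(?P<read_group>ERR\d+)/HAS_FASTQ/(?P<fastq_name>ERR\d+_[1-2])\.fastq\.gz$"
)

# Each MERGE finds its node by `name`, so the query leans on a name index per
# label instead of on an existing (Sample)-[:HAS_READ_GROUP]->(ReadGroup) path.
RELATE_FASTQ_TO_READ_GROUP = (
    "MERGE (r:ReadGroup {name: $read_group}) "
    "MERGE (f:Fastq {name: $fastq_name}) "
    "MERGE (r)-[:HAS_FASTQ]->(f)"
)

def build_query(object_name):
    """Return (query, parameters) for a matching object name, else None."""
    match = READ_GROUP_ONLY.search(object_name)
    if match is None:
        return None
    # Named groups double as Cypher parameter names.
    return RELATE_FASTQ_TO_READ_GROUP, match.groupdict()

query, params = build_query("SAMN1/HAS_READ_GROUP/ERR1/HAS_FASTQ/ERR1_2.fastq.gz")
print(params)  # {'read_group': 'ERR1', 'fastq_name': 'ERR1_2'}
```

The returned pair could then be handed to a driver, for example `session.run(query, parameters)` in the official Neo4j Python driver. Because every MERGE is keyed by name, the order in which object events arrive no longer matters.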
If I can gather metadata on the Sample and ReadGroup from the object name, should I use the object to trigger creating those nodes and relationships, or should I just create the Fastq node with sample and readGroup as properties and then use database triggers to generate the Sample and ReadGroup nodes?
I shouldn't just create the Fastq node, because then I have to backtrack to generate the other nodes. For instance, if I only create the Fastq node, I would need a database trigger, fired by the creation of a Fastq node, to relate the ReadGroup to the Fastq. OK, that's fine I guess. But then creation of the Sample node would need to be triggered by creation of the (ReadGroup)-[:HAS_FASTQ]->(Fastq) relationship, which is less intuitive. Why would creation of a Sample node be triggered by a HAS_FASTQ relationship?
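The backtracking in the Fastq-only option can be sketched as an event cascade. This hypothetical Python sketch only records which event creates what; the handler names and the `sample` and `readGroup` properties on the Fastq node are assumptions based on the option described above.

```python
# Toy event cascade for the "only create the Fastq node" option. Each handler
# stands in for a database trigger; the handler names and the `sample` and
# `readGroup` properties on the Fastq node are hypothetical.
created = []  # (event that fired, what the handler creates), in firing order

def on_fastq_created(fastq):
    # Database trigger 1: a new Fastq node creates its ReadGroup relationship.
    created.append(("Fastq node", f"ReadGroup {fastq['readGroup']} + HAS_FASTQ"))
    # Creating HAS_FASTQ is itself a graph change, so it fires the next trigger.
    on_has_fastq_created(fastq)

def on_has_fastq_created(fastq):
    # Database trigger 2: the HAS_FASTQ relationship ends up creating the Sample,
    # reading the sample ID back off the Fastq node's properties.
    created.append(("HAS_FASTQ relationship", f"Sample {fastq['sample']} + HAS_READ_GROUP"))

# The object trigger itself only creates the Fastq node.
on_fastq_created({"name": "ERR1_1", "sample": "SAMN1", "readGroup": "ERR1"})
for event, result in created:
    print(f"{event} -> {result}")
```

The second line of output is the awkward part: a Sample node appears as a side effect of a HAS_FASTQ relationship.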
Design principle: the name property of each node should uniquely distinguish it from every other node with the same label. Then every label can be indexed on its name property.
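Assuming the graph lives in Neo4j (the Cypher above suggests it), a uniqueness constraint on `name` both enforces this principle and creates the backing index. A minimal sketch that generates one constraint per label, using the `CREATE CONSTRAINT ... IF NOT EXISTS FOR ... REQUIRE ... IS UNIQUE` syntax available since Neo4j 4.4; the constraint names follow a made-up scheme.

```python
# Node labels from this issue; extend the list as new labels appear.
LABELS = ["Sample", "ReadGroup", "Fastq"]

def name_constraint(label):
    """Cypher for a uniqueness constraint on `name` (Neo4j 4.4+ syntax).

    Neo4j backs a uniqueness constraint with an index, so one statement both
    enforces the design principle and indexes the label by `name`.
    """
    constraint_name = f"{label.lower()}_name_unique"  # made-up naming scheme
    return (
        f"CREATE CONSTRAINT {constraint_name} IF NOT EXISTS "
        f"FOR (n:{label}) REQUIRE n.name IS UNIQUE"
    )

for label in LABELS:
    print(name_constraint(label))
```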
Now I have two types of triggers, object triggers and database triggers, and my question is which operations should be controlled by object triggers and which by database triggers. One thought I had is that database triggers should be reserved for launching jobs, and everything else should be controlled by object triggers.
Object triggers activate queries used for:
Database triggers activate queries used for:
Job launcher function requests queries used for:
I guess there don't need to be hard rules. Triggers activate database queries, and those triggers can live in different places.
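One way to avoid hard rules is to treat the trigger-to-query mapping as data, so moving a query from an object trigger to a database trigger is a one-line change. A hypothetical Python sketch; apart from `relateFastqToReadGroup`, the trigger and query names (`fastqObjectCreated`, `fastqPairComplete`, `requestAlignmentJob`) are illustrative only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Trigger:
    name: str
    kind: str        # "object" (storage event) or "database" (graph change)
    query_name: str  # the query this trigger activates

# Hypothetical registry following the "database triggers launch jobs" idea;
# the trigger and query names are illustrative only.
TRIGGERS = [
    Trigger("fastqObjectCreated", "object", "relateFastqToReadGroup"),
    Trigger("fastqPairComplete", "database", "requestAlignmentJob"),
]

def queries_for(kind):
    """List the queries activated by triggers of the given kind."""
    return [t.query_name for t in TRIGGERS if t.kind == kind]

print(queries_for("object"))    # ['relateFastqToReadGroup']
print(queries_for("database"))  # ['requestAlignmentJob']
```

Reassigning a query to the other kind of trigger only changes its `kind` entry, so the split stays a convention rather than something hard-wired.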