My ideal situation is to use native database triggers so that I can avoid the additional lookup step after a node or relationship has been created or updated. All triggers would be based on nodes or relationships (as they currently are).
Questions
Do I want to use node labels? My thinking is that labels are imperfect because they depend on context. Labels come from ontologies, and ontologies keep changing as our understanding of the world changes. A label that indicates one thing now may indicate something else 10 years from now, at which point the label will be misleading. So either I would have to keep updating labels, or I could get rid of them entirely, judge nodes purely by their properties, and devise a consistent method for measuring those properties.
How are nodes connected to each other? Previously I was using a single set of relationships to trace provenance: [GENERATED, WAS_USED_BY]. By tracing these two relationships you can trace the entire lineage of a data object. I think that's good. There are also many other ways that nodes are related. For example, paired-end fastq objects are related by the HAS_MATE_PAIR relationship. How would I assess this relationship without labels? I could use other properties of the data object, such as its file extension.
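To make the provenance idea concrete, here is a minimal sketch of lineage tracing over the two relationships. It uses a plain in-memory edge list rather than a real graph database, and the node names (fastq_1, align_job, and so on) are made up for illustration.

```python
from collections import deque

# Hypothetical in-memory edge list: (source, relationship, target).
# Node names are illustrative only.
EDGES = [
    ("fastq_1", "WAS_USED_BY", "align_job"),
    ("align_job", "GENERATED", "bam_1"),
    ("bam_1", "WAS_USED_BY", "call_job"),
    ("call_job", "GENERATED", "vcf_1"),
    ("fastq_1", "HAS_MATE_PAIR", "fastq_2"),  # not a provenance edge
]

# Only these two relationship types count toward lineage.
PROVENANCE = {"GENERATED", "WAS_USED_BY"}


def lineage(target, edges=EDGES):
    """Return every upstream node reachable via provenance edges, nearest first."""
    # Index each node's direct upstream neighbours.
    parents = {}
    for src, rel, dst in edges:
        if rel in PROVENANCE:
            parents.setdefault(dst, []).append(src)

    # Breadth-first walk backwards from the target.
    seen, order, queue = {target}, [], deque([target])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                queue.append(parent)
    return order


print(lineage("vcf_1"))  # ['call_job', 'bam_1', 'align_job', 'fastq_1']
```

The HAS_MATE_PAIR edge is ignored by the walk, which matches the idea that provenance is traced through a small, fixed set of relationships while other relationships describe different kinds of connection.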
Example workflows
Example workflow 1
Node properties are assessed
Node is added to the database
Node is connected to the thing that generated it
Example workflow 2
Fastq is added to cloud storage
Fastq belongs to a read group, but the "read group" is an abstract idea
Create a read group node and relationship to fastq
When a read group node is created, it activates a trigger which relates it to a sample (create node, create relationships)
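Example workflow 2 can be sketched as a chain of node-creation triggers. This is a toy in-memory model, not a real database trigger API. The Graph class, the kind property, and the HAS_FASTQ and HAS_READ_GROUP relationship names are all assumptions introduced for illustration.

```python
class Graph:
    """Toy in-memory graph that fires triggers when a node is created."""

    def __init__(self):
        self.nodes = {}          # node id -> properties
        self.relationships = []  # (start id, type, end id)
        self._triggers = []      # (predicate, action) pairs

    def on_node_created(self, predicate, action):
        self._triggers.append((predicate, action))

    def create_node(self, node_id, **props):
        if node_id in self.nodes:
            return node_id  # already exists; do not fire triggers again
        self.nodes[node_id] = props
        for predicate, action in self._triggers:
            if predicate(props):
                action(self, node_id, props)
        return node_id

    def relate(self, start, rel_type, end):
        edge = (start, rel_type, end)
        if edge not in self.relationships:
            self.relationships.append(edge)


# Predicates look only at properties, not labels.
def is_fastq(props):
    return props.get("extension") == "fastq.gz"


def is_read_group(props):
    return props.get("kind") == "read_group"  # "kind" is a hypothetical property


def link_read_group(graph, node_id, props):
    # The read group is abstract, so it is created on demand.
    rg_id = "rg:" + props["read_group"]
    graph.create_node(rg_id, kind="read_group", sample=props["sample"])
    graph.relate(rg_id, "HAS_FASTQ", node_id)  # hypothetical relationship name


def link_sample(graph, node_id, props):
    sample_id = "sample:" + props["sample"]
    graph.create_node(sample_id, kind="sample")
    graph.relate(sample_id, "HAS_READ_GROUP", node_id)  # hypothetical name


graph = Graph()
graph.on_node_created(is_fastq, link_read_group)
graph.on_node_created(is_read_group, link_sample)
graph.create_node("s1_R1.fastq.gz", extension="fastq.gz",
                  read_group="rg1", sample="s1")
```

Adding the fastq node is the only explicit call. Creating the read group node fires the second trigger, which creates the sample node and its relationship. With native database triggers the same cascade would run inside the database instead of in application code.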
Right now I'm kind of hacking together steps because I don't have the native database triggers I would ideally want, but I think that's fine.
Proposals
What if node labels were assigned based on relationships? For instance, the PairedEndFastq label could be assigned to a node if I thought that node would match the criteria for the HAS_MATE_PAIR relationship. So, based on extension (fastq.gz). How to validate that relationship though? I think I would want functions to validate that relationship. Every relationship must be validated by a function?
And then all jobs are based on relationships and can only be launched by relationship triggers.
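One way to picture "every relationship must be validated by a function" is a registry that maps each relationship type to a validator and only launches jobs once validation passes. The sketch below is an assumption-heavy illustration. The registry, the create_relationship helper, and the align job string are hypothetical, and the _R1/_R2 file naming convention is assumed here, not taken from these notes.

```python
import re

VALIDATORS = {}  # relationship type -> validation function
JOBS = {}        # relationship type -> list of job launchers


def validates(rel_type):
    """Decorator that registers a validator for one relationship type."""
    def register(func):
        VALIDATORS[rel_type] = func
        return func
    return register


# Assumed naming convention: <stem>_R1.fastq.gz pairs with <stem>_R2.fastq.gz.
MATE_PATTERN = re.compile(r"^(?P<stem>.+)_R(?P<read>[12])\.fastq\.gz$")


@validates("HAS_MATE_PAIR")
def valid_mate_pair(start, end):
    a = MATE_PATTERN.match(start.get("name", ""))
    b = MATE_PATTERN.match(end.get("name", ""))
    # Both must be fastq.gz files with the same stem and opposite read numbers.
    return bool(a and b and a["stem"] == b["stem"] and a["read"] != b["read"])


def create_relationship(start, rel_type, end):
    """Validate the relationship, then run the jobs triggered by it."""
    validator = VALIDATORS.get(rel_type)
    if validator is None:
        raise ValueError(f"no validator registered for {rel_type}")
    if not validator(start, end):
        raise ValueError(f"{rel_type} failed validation")
    return [job(start, end) for job in JOBS.get(rel_type, [])]


# Hypothetical job: a real launcher would submit work somewhere.
JOBS["HAS_MATE_PAIR"] = [lambda s, e: f"align {s['name']} {e['name']}"]

r1 = {"name": "s1_R1.fastq.gz"}
r2 = {"name": "s1_R2.fastq.gz"}
print(create_relationship(r1, "HAS_MATE_PAIR", r2))
# ['align s1_R1.fastq.gz s1_R2.fastq.gz']
```

Because jobs hang off relationship types and not node labels, a job cannot start unless its relationship exists and has passed its validator, which is the constraint proposed above.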