Closed mathew-thomson closed 2 years ago
The approach is robust, but technical. Good documentation and examples will help.
I am wary of having an attribute (parent) mean two different things based on the value of another attribute (pooled).
To me, that jeopardizes one of the model's important features, which is the ability to be mapped onto an ontology: we would no longer only be mapping between terms that have precise definitions, but would also be trying to parse business logic. Instead of simple x=>y associations, we end up with branching paths such as if(x, y=>u, y=>v). That could quickly make the model difficult to parse.
Here is an example setup:
flowchart TD
A --> V;
B --> V;
V --> X;
V --> Y;
V --> Z;
E[Samples] --- F;
F[Pooled] --- G[Subsamples];
What you are suggesting, I think, is:

| sampleID | parentID | pooled |
|---|---|---|
| A | V | T |
| B | V | T |
| V | null | T |
| X | V | F |
| Y | V | F |
| Z | V | F |
This merges the sample -> pooled and the pooled -> subsample links. I would instead suggest splitting up concerns like this, using a new attribute ("pooledID"):

| sampleID | parentID | pooledID | pooled |
|---|---|---|---|
| A | null | V | false |
| B | null | V | false |
| V | null | null | true |
| X | V | null | false |
| Y | V | null | false |
| Z | V | null | false |
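As a rough illustration of why the split helps, here is a minimal Python sketch that reads the table above. The variable names are hypothetical and only the six example rows are used. Each ID column maps to exactly one kind of link, so no branching on `pooled` is needed to recover the diagram.

```python
# Rows copied from the pooledID table above; None stands for null.
ROWS = [
    {"sampleID": "A", "parentID": None, "pooledID": "V", "pooled": False},
    {"sampleID": "B", "parentID": None, "pooledID": "V", "pooled": False},
    {"sampleID": "V", "parentID": None, "pooledID": None, "pooled": True},
    {"sampleID": "X", "parentID": "V", "pooledID": None, "pooled": False},
    {"sampleID": "Y", "parentID": "V", "pooledID": None, "pooled": False},
    {"sampleID": "Z", "parentID": "V", "pooledID": None, "pooled": False},
]

# pooledID always means "this sample was pooled into that sample".
POOLING_EDGES = [
    (row["sampleID"], row["pooledID"])
    for row in ROWS
    if row["pooledID"] is not None
]

# parentID always means "this sample was taken from that sample".
SUBSAMPLE_EDGES = [
    (row["parentID"], row["sampleID"])
    for row in ROWS
    if row["parentID"] is not None
]

print(POOLING_EDGES)    # [('A', 'V'), ('B', 'V')]
print(SUBSAMPLE_EDGES)  # [('V', 'X'), ('V', 'Y'), ('V', 'Z')]
```

In this sketch the `pooled` flag is never consulted when building the links; it stays a plain descriptive attribute of V.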
I thought that the actual representation of your diagram would be:

| sampleID | parentID | pooled |
|---|---|---|
| A | | F |
| B | | F |
| V | A | T |
| V | B | T |
| X | V | F |
| Y | V | F |
| Z | V | F |
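One property of this layout that may be worth checking, assuming `sampleID` is meant to identify a single row, is that a pooled sample appears once per constituent. The short Python sketch below uses only the example rows from the table, with hypothetical variable names, and counts how often each `sampleID` occurs.

```python
from collections import Counter

# Rows copied from the table above: (sampleID, parentID, pooled).
# None stands for an empty parentID cell.
ROWS = [
    ("A", None, False),
    ("B", None, False),
    ("V", "A", True),
    ("V", "B", True),
    ("X", "V", False),
    ("Y", "V", False),
    ("Z", "V", False),
]

# Count how many rows each sampleID occupies.
counts = Counter(sample_id for sample_id, _parent_id, _pooled in ROWS)

# sampleIDs that occupy more than one row.
DUPLICATED = sorted(s for s, n in counts.items() if n > 1)
print(DUPLICATED)  # ['V']
```

If `sampleID` has to stay unique, this representation would need either a composite key or a separate link table; that is an inference from the example, not something stated in the thread.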
Upon reviewing our current structure, it seems that we lost sight of how we were going to account for pooled samples in the ODM.
Rather than trying to blow up the `samples` table to figure out a way of capturing this information, it was proposed that we could use the boolean `pooled` variable as a switch for the parent-child relationships.

So for a parent sample, `pooled` may or may not = 1 (TRUE), but the child samples will have `pooled` = 0 (FALSE). This shows the direction of the parent-child relationship.

For a pooled sample, the multiple "parents" being pooled together will have `pooled` = 1 (TRUE), and the single sample created from pooling them will also have `pooled` = 1 (TRUE). However, the single "child" here will still be recorded as the `parSampleID` to the "parents" (recorded in the ERD as child samples).

This, while somewhat confusing, has the `pooled` field acting as a sort of "switch" on the directionality of parent-child relationships in the `samples` table.

A pooled sample can still have actual child samples as well, but these child samples would have `pooled` = 0 (FALSE).
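To make the switch concrete, here is a minimal Python sketch of how someone reading the `samples` table might interpret `parSampleID` under this proposal. The sample IDs (A and B pooled into V, then V split into X, Y and Z) and the helper name `decode_edges` are illustrative assumptions, not part of the ODM.

```python
# Illustrative rows: (sampleID, parSampleID, pooled).
ROWS = [
    ("A", "V", True),    # constituent of pooled sample V
    ("B", "V", True),    # constituent of pooled sample V
    ("V", None, True),   # the pooled sample itself
    ("X", "V", False),   # child (subsample) of V
    ("Y", "V", False),
    ("Z", "V", False),
]


def decode_edges(rows):
    """Return (source, target) pairs in the direction material flows.

    Hypothetical helper: the meaning of parSampleID branches on pooled.
    """
    edges = []
    for sample_id, par_sample_id, pooled in rows:
        if par_sample_id is None:
            continue  # no link recorded on this row
        if pooled:
            # pooled = 1: parSampleID is the pool this sample went into
            edges.append((sample_id, par_sample_id))
        else:
            # pooled = 0: parSampleID is the sample this one came from
            edges.append((par_sample_id, sample_id))
    return edges


EDGES = decode_edges(ROWS)
print(EDGES)
# [('A', 'V'), ('B', 'V'), ('V', 'X'), ('V', 'Y'), ('V', 'Z')]
```

The `if pooled` branch is the "switch": the same column yields an edge in one direction or the other depending on a second column.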
This structure was discussed in a meeting with @jeandavidt @sorinsion @il43 and @DougManuel, but we're happy to hear any feedback or questions on it as well.