Open marco-brandizi opened 5 months ago
Hi Marco, thanks for your input! Here are a few comments on what you wrote. Looking forward to discussing this in our meeting.
In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.
As far as I know, this can be a problem when multiple protocol applications (ie, LabProcess instances) are variants of the same protocol/plan (ie, a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, and one group uses a reagent, while the other group uses another (similar examples could be made with two versions of a software product or lab machinery).
Shouldn't these properties be allowed in LabProcess too?
The current LabProcess type was inspired by the ISA model. There, a protocol has constant components and variable parameters. We intended to use reagent, labEquipment and computationalTool as components (they are part of the LabProtocol type), while the process defines the variable parameters. Multiple processes can implement the same protocol and differ only in their parameters and inputs/outputs. Therefore, if components change, it is a different protocol (e.g. in your example, either you're describing two distinct protocols, or the reagents and the software should be described as parameters). Of course, such a restriction is up for discussion; we're only describing how the LabProcess type was designed.
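To make the intended split concrete, here is a minimal Python sketch written as JSON-LD-style dictionaries. Only the property names (reagent, labEquipment, computationalTool, executesLabProtocol, parameterValue, object, result) come from this discussion; the identifiers, the temperature parameter, its values, and the helper make_process are illustrative assumptions, not part of the draft.

```python
# Illustrative only: all IDs and values are made up for this sketch.
protocol = {
    "@type": "LabProtocol",
    "@id": "#protocol-1",
    # Constant components: changing any of these means a different protocol.
    "reagent": ["#buffer-A"],
    "labEquipment": ["#centrifuge-1"],
    "computationalTool": ["#tool-v1"],
}


def make_process(process_id, sample_in, sample_out, temperature_c):
    """Build one LabProcess that executes the shared protocol (hypothetical helper)."""
    return {
        "@type": "LabProcess",
        "@id": process_id,
        "executesLabProtocol": {"@id": protocol["@id"]},
        "object": [sample_in],    # what goes in
        "result": [sample_out],   # what comes out
        # Variable parameters live on the process, not on the protocol.
        "parameterValue": [
            {"@type": "PropertyValue", "name": "temperature",
             "value": temperature_c, "unitText": "degC"},
        ],
    }


process_a = make_process("#process-a", "#sample-1", "#extract-1", 37)
process_b = make_process("#process-b", "#sample-2", "#extract-2", 42)

# Both processes point at the same protocol and differ only in
# parameters and inputs/outputs.
same_protocol = process_a["executesLabProtocol"] == process_b["executesLabProtocol"]
print(same_protocol)
```

Under this reading, two treatment groups that use different reagents would either need two LabProtocol records or would have to express the reagent as a parameterValue entry.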
Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), having them as sub-properties of parameterValue (in the case of LabProcess) and as sub-properties of a new property named like parameter with FormalParameter in the range (in the case of LabProtocol) would simplify searches and use cases, since often things like reagents or software names are searched in the lab parameters.
This indeed fits with what we explained above: annotating reagents and software as parameters should be perfectly fine. We believe the important part is to make a clear distinction between constant and variable properties in the vocabulary, but of course reagents and software can be both. As a short remark, we don't think that FormalParameter fits into the range of components and parameters, as it has a completely different semantic interpretation that is strictly tied to computational workflows.
In the latter case, schema:Action defines input/output with these terms, but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result and I'd prefer these property names in the context of life science. Or, we could introduce more specific properties, such as labProtocolInput, labProtocolOutput (since schema:input/schema:output are so generic names, and I'm pretty sure sooner or later they will clash with some other desired meaning in the same or another application domain, if it hasn't already happened).
As you stated, LabProcess is meant as a type describing the mapping from input to output. For those two terms, we have a perfect semantic mapping to existing ones, namely object and result. Of course, the terms input and output are more commonplace in the life-science community, and we use them too. However, we prioritize using those from the existing vocabulary, as they can also be understood just from generic knowledge of schema.org.
Hi @floWetzels, thank you for your comments and sorry for my late reply.
This indeed fits with what we explained above: annotating reagents and software as parameters should be perfectly fine. We believe the important part is to make a clear distinction between constant and variable properties in the vocabulary, but of course reagents and software can be both
I understand this as follows: I could define an RDF resource (URI) that is an instance of BioChemEntity and is pointed to both by bioschema:reagent and by bioschema:parameterValue. That would be good and would ensure the distinction between constant and variable components that you mentioned. However, the problem is that right now, parameterValue has only PropertyValue in its range.
A reagent could be made an instance of PropertyValue (PV) too, but that would be ugly. An alternative is to model a reagent as a reagent when it's a constant and as a PV (with name and value) when it varies. This is problematic because it introduces two different ways to model the same thing.
So, another option is what I was initially saying: properties like reagent could be used for LabProcess too, and they could even be sub-properties of parameter/parameterValue. When they don't vary across the applications of a LabProtocol, the corresponding LabProcess instances wouldn't have further values attached (i.e., a LabProtocol may have defaults/constant parameters).
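The default-plus-override reading could be sketched roughly as follows in Python. This is a hypothetical illustration: allowing reagent on LabProcess is the proposal under discussion rather than the current draft, and the helper effective_values, the record shapes, and all identifiers are assumptions made for the example.

```python
# Properties that the protocol may define as defaults (names from this thread).
COMPONENT_PROPERTIES = ("reagent", "labEquipment", "computationalTool")


def effective_values(process, protocol, prop):
    """Return the process's own values for prop, else the protocol's defaults."""
    if prop in process:
        return process[prop]
    return protocol.get(prop, [])


# One shared protocol with default components (made-up IDs).
protocol = {
    "@id": "#treatment",
    "reagent": ["#reagent-X"],
    "labEquipment": ["#shaker-1"],
}

# Group 1 uses the protocol as is; group 2 swaps the reagent.
group_1 = {"@id": "#process-group-1", "executesLabProtocol": "#treatment"}
group_2 = {
    "@id": "#process-group-2",
    "executesLabProtocol": "#treatment",
    "reagent": ["#reagent-Y"],
}

for process in (group_1, group_2):
    resolved = {p: effective_values(process, protocol, p) for p in COMPONENT_PROPERTIES}
    print(process["@id"], resolved)
```

With this approach a reagent is always modelled as a reagent, whether it is constant or varies, which avoids the two-ways-to-model-one-thing problem mentioned above.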
As a short remark, we don't think that FormalParameter fits into the range of components and parameters, as it has a completely different semantic interpretation strictly tied to computational workflows.
I see a couple of issues with this (similar ones occur here and there in both schema.org and Bioschemas):
- FormalParameter restrict it to computational workflows, rather both are more generic than that (A FormalParameter is an identified variable used to stand for the actual value(s) that are consumed/produced by a set of steps).
- LabProtocol can have the input and output properties and both of them have FormalParameter as range. So, all of input, output and FormalParameter seem to be used for more than computational workflows (or inconsistently?).
For those two terms, we have a perfect semantic mapping to existing ones, namely object and result. Of course, the terms input and output are more commonplace in the life-science community, we use them too. However, we prioritize using those from the existing vocabulary as they can also be understood just from generic knowledge in schema.org.
Inheriting from the top is the sensible thing to do. In fact, I don't propose alternative properties to object/result, but synonyms or sub-properties of them. Moreover, as said above, input/output are properties being proposed for LabProtocol (I didn't notice it the first time), but with yet another meaning. As a data engineer, I find it confusing; an average biologist would find it very confusing :-)
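To illustrate why sub-properties would not break generic schema.org consumers, here is a small Python sketch of sub-property inference over plain triples. The hierarchy below (input under object, output under result) is the proposal being discussed, not something either vocabulary declares today, and the functions and sample identifiers are hypothetical.

```python
# Proposed (not existing) hierarchy: child property -> parent property.
SUB_PROPERTY_OF = {"input": "object", "output": "result"}


def ancestors(prop):
    """Yield prop followed by all of its super-properties."""
    while prop is not None:
        yield prop
        prop = SUB_PROPERTY_OF.get(prop)


def values_of(triples, subject, prop):
    """Return values of prop for subject, including values stated via sub-properties."""
    return [o for s, p, o in triples if s == subject and prop in ancestors(p)]


# Made-up data: process-1 uses the biologist-friendly names,
# process-2 uses the generic schema.org name.
triples = [
    ("#process-1", "input", "#sample-1"),
    ("#process-1", "output", "#extract-1"),
    ("#process-2", "object", "#sample-2"),
]

print(values_of(triples, "#process-1", "object"))
print(values_of(triples, "#process-1", "input"))
print(values_of(triples, "#process-2", "input"))
```

A query for object also finds what was stated as input, so generic tools keep working, while the reverse does not hold: a value stated only as object is not automatically an input.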
In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.
As far as I know, this can be a problem when multiple protocol applications (ie, LabProcess instances) are variants of the same protocol/plan (ie, a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, and one group uses a reagent, while the other group uses another (similar examples could be made with two versions of a software product or lab machinery).
Shouldn't these properties be allowed in LabProcess too?
Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), having them as sub-properties of parameterValue (in the case of LabProcess) and as sub-properties of a new property named like parameter with FormalParameter in the range (in the case of LabProtocol) would simplify searches and use cases, since often things like reagents or software names are searched in the lab parameters.
bioSample would make an exception to this, since to me, it's either an input/output (in the case of LabProtocol) or an object/result (in the case of LabProcess).
In the latter case, schema:Action defines input/output with these terms, but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result and I'd prefer these property names in the context of life science. Or, we could introduce more specific properties, such as labProtocolInput, labProtocolOutput (since schema:input/schema:output are so generic names, and I'm pretty sure sooner or later they will clash with some other desired meaning in the same or another application domain, if it hasn't already happened).
Finally, why does LabProtocol have both the properties bioSample and sample, with the same description and different ranges?