BioSchemas / specifications

Issue tracker, technical wiki, and example markup
https://bioschemas.org

properties for LabProtocol and LabProcess #675

Open marco-brandizi opened 3 weeks ago

marco-brandizi commented 3 weeks ago

In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.

As far as I know, this can be a problem when multiple protocol applications (i.e., LabProcess instances) are variants of the same protocol/plan (i.e., a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, but one group uses one reagent while the other group uses another (similar examples could be made with two versions of a software product or lab machinery).

Shouldn't these properties be allowed in LabProcess too?

Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), having them as sub-properties of parameterValue (in the case of LabProcess), and as sub-properties of a new property named something like parameter, with FormalParameter in its range (in the case of LabProtocol), would simplify searches and use cases, since things like reagents or software names are often searched for among the lab parameters.

bioSample would be an exception to this, since to me it's either an input/output (in the case of LabProtocol) or an object/result (in the case of LabProcess).

In the latter case, schema:Action defines input/output with these terms (object/result), but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result, and I'd prefer these property names in the context of life science. Alternatively, we could introduce more specific properties, such as labProtocolInput and labProtocolOutput (schema:input/schema:output are such generic names that I'm pretty sure sooner or later they will clash with some other desired meaning in the same or another application domain, if it hasn't already happened).

Finally, why does LabProtocol have both the properties bioSample and sample, with the same description and different ranges?

floWetzels commented 3 days ago

Hi Marco, thanks for your input! Here are a few comments on what you wrote. Looking forward to discussing this in our meeting.

In the current draft description, LabProtocol has the properties bioSample, sample, computationalTool, labEquipment, reagent. LabProcess doesn't have any of these properties.

As far as I know, this can be a problem when multiple protocol applications (i.e., LabProcess instances) are variants of the same protocol/plan (i.e., a LabProtocol instance, linked via executesLabProtocol with n-1 cardinality), especially in the case of bioSample. For instance, suppose that essentially the same treatment (protocol) is administered to all the biosamples in two treatment groups, but one group uses one reagent while the other group uses another (similar examples could be made with two versions of a software product or lab machinery).

Shouldn't these properties be allowed in LabProcess too?

The current LabProcess type was inspired by the ISA model. There, a protocol has constant components and variable parameters. We intended reagent, labEquipment and computationalTool to be components (they are part of the LabProtocol type), while the process defines the variable parameters. Multiple processes can implement the same protocol and differ only in their parameters and inputs/outputs. Therefore, if components change, it is a different protocol (e.g. in your example, either you're describing two distinct protocols, or the reagents and the software should be described as parameters). Of course, such a restriction is up for discussion; we're only describing how the LabProcess type was designed.
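To make the component/parameter split concrete, here is a minimal sketch in Python that builds JSON-LD-style dictionaries. The property names (labEquipment, executesLabProtocol, parameterValue, object) and the PropertyValue type are taken from this thread; the identifiers, the reagent and equipment names, and the helper `make_process` are hypothetical, the `@context` is left out on purpose, and the draft profiles may well differ.

```python
import json

# One protocol with a constant component (hypothetical values).
# "@context" is deliberately omitted in this sketch.
protocol = {
    "@type": "LabProtocol",
    "@id": "#treatment-protocol",
    "name": "Treatment protocol",
    "labEquipment": "Incubator",
}


def make_process(process_id, sample_id, reagent_name):
    """Build one LabProcess that executes the shared protocol.

    The reagent varies between groups, so it is recorded as a parameter
    (a PropertyValue under parameterValue), not as a protocol component.
    """
    return {
        "@type": "LabProcess",
        "@id": process_id,
        "executesLabProtocol": {"@id": protocol["@id"]},
        "object": {"@id": sample_id},
        "parameterValue": [
            {"@type": "PropertyValue", "name": "reagent", "value": reagent_name}
        ],
    }


# Two treatment groups: same protocol, different reagent parameter.
processes = [
    make_process("#process-group-a", "#sample-a", "Reagent A"),
    make_process("#process-group-b", "#sample-b", "Reagent B"),
]

print(json.dumps([protocol, *processes], indent=2))
```

Both processes point at the same LabProtocol and differ only in the reagent recorded under parameterValue, which is the "describe it as a parameter" option mentioned above.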

Moreover, in this issue it was proposed to add PropertyValue to the range of such properties. In my view (and from experience with the ISA model), having them as sub-properties of parameterValue (in the case of LabProcess), and as sub-properties of a new property named something like parameter, with FormalParameter in its range (in the case of LabProtocol), would simplify searches and use cases, since things like reagents or software names are often searched for among the lab parameters.

This indeed fits with what we explained above: annotating reagents and software as parameters should be perfectly fine. We believe the important part is to make a clear distinction between constant and variable properties in the vocabulary, but of course reagents and software can be both. As a short remark, we don't think that FormalParameter fits into the range of components and parameters, as it has a completely different semantic interpretation that is strictly tied to computational workflows.

In the latter case, schema:Action defines input/output with these terms (object/result), but not calling them input/output is very weird to a biologist, so I'd also make input/output sub-properties of object/result, and I'd prefer these property names in the context of life science. Alternatively, we could introduce more specific properties, such as labProtocolInput and labProtocolOutput (schema:input/schema:output are such generic names that I'm pretty sure sooner or later they will clash with some other desired meaning in the same or another application domain, if it hasn't already happened).

As you stated, LabProcess is meant as a type describing the mapping from input to output. For those two terms, we have a perfect semantic mapping to existing ones, namely object and result. Of course, the terms input and output are more commonplace in the life-science community, and we use them too. However, we prioritize the terms from the existing vocabulary, as they can also be understood from generic knowledge of schema.org alone.
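As a rough illustration of that mapping, a tool could let users write input/output and emit object/result in the markup. The sketch below is in Python; the alias table, the helper `to_schema_terms` and the sample identifiers are hypothetical and not part of any Bioschemas tooling.

```python
# Hypothetical alias table: community-facing names -> schema.org Action terms.
ALIASES = {"input": "object", "output": "result"}


def to_schema_terms(record):
    """Return a copy of record with input/output renamed to object/result."""
    return {ALIASES.get(key, key): value for key, value in record.items()}


lab_process = to_schema_terms(
    {"@type": "LabProcess", "input": "#sample-a", "output": "#extract-a"}
)
print(lab_process)
```

Keys that have no alias, such as `@type`, pass through unchanged, so the published markup stays within the existing vocabulary while the authoring side keeps the familiar names.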