LabProcess Type Draft

floWetzels commented 10 months ago

Description

This PR adds the new type draft LabProcess. A LabProcess represents the specific application of a LabProtocol to some input (biological material or data) to produce some output (biological material or data). The new type inherits these inputs and outputs from the Schema.org type Action (i.e. the properties object and result). Additionally, it adds properties for referencing the executed protocol (executesProtocol) and parameter values of the process as key-value pairs (parameterValues). This design is heavily inspired by the ISA process model.
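As a rough illustration of the shape described above, the following Python sketch assembles a LabProcess record as a JSON-LD-style dictionary. The property names (object, result, executesProtocol, parameterValues) come from this description; the helper name make_lab_process, the "@context" value, the "#..." identifiers, and the choice to hold inputs and outputs as lists of "@id" references are assumptions made for the example, not part of the draft.

```python
import json


def make_lab_process(name, protocol_id, inputs, outputs, parameter_values=()):
    """Build a JSON-LD-style dict for one LabProcess (hypothetical helper)."""
    return {
        "@context": "https://schema.org/",  # assumed context, not taken from the draft
        "@type": "LabProcess",
        "name": name,
        # The LabProtocol that this process applies
        "executesProtocol": {"@id": protocol_id},
        # Inherited from Action: inputs go in `object`, outputs in `result`
        "object": [{"@id": item} for item in inputs],
        "result": [{"@id": item} for item in outputs],
        # Parameter values of the process as PropertyValue entries
        "parameterValues": list(parameter_values),
    }


process = make_lab_process(
    name="Example process",
    protocol_id="#protocol/example",      # hypothetical identifier
    inputs=["#sample/input-1"],           # hypothetical identifier
    outputs=["#sample/output-1"],         # hypothetical identifier
    parameter_values=[
        {"@type": "PropertyValue", "name": "temperature",
         "value": 4, "unitText": "degree Celsius"},
    ],
)
print(json.dumps(process, indent=2))
```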

Motivation and context

Our overarching goal is to establish a clear separation between LabProtocol (akin to a Recipe / SOP) and LabProcess (akin to the Action described by such a LabProtocol, analogous to a lab notebook entry in a real-world scenario). The following details explain why this distinction is needed and which use cases we aim to address.

In our view, and in line with the ISA data model, a LabProtocol aligns more closely with a HowTo than with a CreativeWork. This better reflects the instructional nature of a LabProtocol in guiding experimental procedures. A LabProcess, in contrast, aligns with an Action. Thus, this PR goes hand-in-hand with the recent changes on LabProtocol (https://github.com/BioSchemas/specifications/pull/661).

We use the very generic Schema.org type PropertyValue to describe parameter values of processes in a structured way. This allows users to better annotate a wide range of laboratory processes, as the PropertyValue type covers any structured key-value pair or key-value-unit triplet (basically allowing any parameter that can be formalized). We hope that this improves findability of research data objects in the following use cases:
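To make the key-value pair versus key-value-unit triplet distinction concrete, here is a small Python sketch. It uses the Schema.org PropertyValue properties name, value, and unitText; the helper name property_value and the parameter names and values are invented for illustration.

```python
def property_value(name, value, unit=None):
    """Return a PropertyValue dict; the unit is optional (hypothetical helper)."""
    pv = {"@type": "PropertyValue", "name": name, "value": value}
    if unit is not None:
        # Only triplets carry a unit; plain key-value pairs omit it
        pv["unitText"] = unit
    return pv


# Key-value pair: a parameter without a unit (illustrative values)
pair = property_value("lysis buffer", "buffer A")

# Key-value-unit triplet: a parameter with a unit (illustrative values)
triplet = property_value("centrifugation time", 10, "min")

print(pair)
print(triplet)
```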

Use Case 1 (Findability for comparative analysis)

A process graph encodes structured information for complex experimental setups consisting of multiple experimental steps. It therefore enables search for formal parameters (fixed parameters as well as factors) of specific processes.
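A minimal sketch of such a parameter search, assuming the process graph is available as a list of LabProcess dictionaries whose parameterValues hold PropertyValue entries. The function name find_processes, the identifiers, and the sample values are hypothetical.

```python
def find_processes(graph, name, predicate=lambda value: True):
    """Return the @id of each process with a parameter `name` whose value passes `predicate`."""
    matches = []
    for process in graph:
        for pv in process.get("parameterValues", []):
            if pv.get("name") == name and predicate(pv.get("value")):
                matches.append(process["@id"])
                break  # one matching parameter is enough for this process
    return matches


# Hypothetical graph: two growth processes differing in one factor, plus one extraction
graph = [
    {"@id": "#process/growth-1",
     "parameterValues": [{"name": "temperature", "value": 20, "unitText": "degree Celsius"}]},
    {"@id": "#process/growth-2",
     "parameterValues": [{"name": "temperature", "value": 30, "unitText": "degree Celsius"}]},
    {"@id": "#process/extraction-1", "parameterValues": []},
]

with_temperature = find_processes(graph, "temperature")
warm = find_processes(graph, "temperature", lambda value: value >= 25)
print(with_temperature)
print(warm)
```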

Use Case 2 (Findability for fine-grained data acquisition)

A process graph enables semantic web markup, and therefore findability, of subsets of the data files, because relevant metadata is attached to the individual processes and files rather than only to the overall dataset.

Use Case 3 (Findability for Input-based dataset search)

A process graph enables search for samples or datafiles that were input of a specific experimental process, in addition to classic output-driven search.
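Input-based search could look roughly like the following Python sketch. It assumes that each LabProcess in the graph lists its inputs under object and its outputs under result as "@id" references; the function name results_for_input and all identifiers are made up for the example.

```python
def results_for_input(graph, input_id):
    """Return the @id of every result of a process that took `input_id` as input."""
    found = []
    for process in graph:
        input_ids = {item["@id"] for item in process.get("object", [])}
        if input_id in input_ids:
            found.extend(item["@id"] for item in process.get("result", []))
    return found


# Hypothetical graph: two sequencing runs on different RNA samples
graph = [
    {"@id": "#process/sequencing-1",
     "object": [{"@id": "#sample/rna-1"}],
     "result": [{"@id": "#data/reads-1.fastq"}]},
    {"@id": "#process/sequencing-2",
     "object": [{"@id": "#sample/rna-2"}],
     "result": [{"@id": "#data/reads-2.fastq"}]},
]

files_from_rna_1 = results_for_input(graph, "#sample/rna-1")
print(files_from_rna_1)
```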

Have these been tested?

We don't have any experience with the test setup for this repository, so please let us know what needs to be done!

Types of changes

[ ] Bug fix (non-breaking change which fixes an issue)
[x] New content (non-breaking change which adds new content)
[ ] Modified content (non-breaking change which modifies existing content)
[ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)

Future TO-DOs

The provided example can be improved by small adaptations if the range of labEquipment of the type LabProtocol is extended to include PropertyValue.
In the future, it would be desirable to specifically link to LabProcess or LabProtocol objects from a Dataset. Currently, there is no semantically sound property that describes this relation (about is only a rough match).
Is the specification repository self-contained or do we need an additional PR for the website?

sgenehr commented 10 months ago

I really agree with the distinction between LabProtocol and LabProcess and would like to contribute to this draft. This distinction is important for describing the research process as a whole from both a prospective and a retrospective point of view. For that matter, a LabProcessDocumentation type could be useful for describing the content of research notebooks, whereas the LabProcess type could cover the granular description of singular activities performed in the lab.

HLWeil commented 10 months ago

Hey @sgenehr, thank you for your input on this topic. I'm not quite sure I understood your proposal correctly. Would the LabProcessDocumentation be a collection of LabProcess objects, enriched by some properties to add additional descriptions?

sgenehr commented 10 months ago

Yes, from the current proposal, I would understand a LabProcess to be a singular event or activity performed during an experiment. A lab experiment consists of multiple activities performed in a sequence. The sequence should be provided by a HowTo, such as the LabProtocol. This way, each LabProcess can be executed according to a HowToStep mentioned in the protocol. A LabProcessDocumentation or maybe LabExperimentDocumentation would represent a report on the sequence of activities that were actually performed, so it should also allow for activities that were not foreseen by the LabProtocol, i.e. processes that were not executed according to a HowToStep.

A Documentation could also specify which items were partOf the experiment, while each LabProcess specifies the input and output relations for that particular activity.

HLWeil commented 10 months ago

Ah okay, thanks for the clarification!

We did not intend the LabProcess to necessarily be a singular event. It can be a string of events, which is also reflected in its reference to the LabProtocol, which is a HowTo and therefore (as you stated) consists of possibly multiple HowToSteps.

In the annotation of research experiments we have in mind (based on ISA), the most important thing is to string inputs and outputs together. This way, the experimental flow can be traced back from the final data to the source material it all started from, with the necessary annotations, such as biological species and experimental factors, placed where they are actually applied. The LabProcess is the glue connecting input and output. So we propose that it be atomic not with regard to the steps performed, but with regard to the point at which you can actually name an input and an output for a series of steps.
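The tracing idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the graph is a list of LabProcess dictionaries with object and result given as "@id" references, and the function name trace_to_sources and all identifiers are hypothetical.

```python
def trace_to_sources(graph, item_id):
    """Walk backwards from `item_id` via result -> object links to the source material."""
    # Index: which process produced which item
    producers = {}
    for process in graph:
        for item in process.get("result", []):
            producers[item["@id"]] = process

    sources, stack, seen = [], [item_id], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        process = producers.get(current)
        if process is None:
            # No process produced this item, so it is a starting material
            sources.append(current)
        else:
            stack.extend(item["@id"] for item in process.get("object", []))
    return sorted(sources)


# Hypothetical chain: plant -> leaf sample -> RNA extract -> sequencing reads
graph = [
    {"@id": "#process/sampling",
     "object": [{"@id": "#source/plant-1"}], "result": [{"@id": "#sample/leaf-1"}]},
    {"@id": "#process/rna-extraction",
     "object": [{"@id": "#sample/leaf-1"}], "result": [{"@id": "#sample/rna-1"}]},
    {"@id": "#process/sequencing",
     "object": [{"@id": "#sample/rna-1"}], "result": [{"@id": "#data/reads.fastq"}]},
]

origin = trace_to_sources(graph, "#data/reads.fastq")
print(origin)
```

Each process here stays a black box with named inputs and outputs, which is all the traversal needs.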

As a short example, consider applying an RNA extraction protocol. This of course consists of many steps, such as switching between different buffers, applying reagents, centrifuging, etc. You start with your input sample of cells and end up with your output RNA extract. This could be annotated using a single LabProcess.
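In Python terms, the RNA extraction case might be written down as follows. The sketch assumes that the LabProtocol, being a HowTo, lists its steps under the Schema.org step property; the identifiers and step names are invented, and the dictionaries are simplified rather than complete markup.

```python
# The protocol (a HowTo) has many steps ...
protocol = {
    "@id": "#protocol/rna-extraction",   # hypothetical identifier
    "@type": "LabProtocol",
    "step": [                            # assumed: `step` as on Schema.org HowTo
        {"@type": "HowToStep", "name": "Lyse cells in buffer"},
        {"@type": "HowToStep", "name": "Apply reagents"},
        {"@type": "HowToStep", "name": "Centrifuge"},
    ],
}

# ... but one LabProcess covers the whole series, because there is
# exactly one nameable input (cells) and one nameable output (RNA extract).
process = {
    "@id": "#process/rna-extraction-1",  # hypothetical identifier
    "@type": "LabProcess",
    "executesProtocol": {"@id": protocol["@id"]},
    "object": [{"@id": "#sample/cells-1"}],
    "result": [{"@id": "#sample/rna-extract-1"}],
}

print(len(protocol["step"]), "steps,", "1 process")
```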

Of course, this still does not represent a full lab experiment, so there might be some overarching type that functions as a contextualizing collection for LabProcess objects. It's also a good point that the execution of a LabProtocol does not necessarily go as planned. But then you could maybe consider the LabProtocol you actually executed to be a new version of the original one, or a new LabProtocol that references it.

ljgarcia commented 10 months ago

Hi @gtsueng, I think this PR is ready to go. Could you please check it from the DDE point of view? Thanks

BioSchemas / specifications

LabProcess Type Draft #669