heather-i closed this issue 1 year ago.
There was a conversation on this topic in today's ODM Implementation meeting. Multiple people present agreed that multiple quality flags could be relevant for a single measure (e.g., control curve quality concerns + inhibition quality concerns). Although qPCR was at the forefront of this discussion, it is easy to imagine that this could be the case for other types of measures as well (sequencing, sampling, etc.).
There are multiple ways to deal with a measure having many quality concerns. Some of them would require modifying the structure of the ODM slightly.
Add multiple `qualityFlag` fields (`qualityFlag1`, `qualityFlag2`, ...) in the `measures` and `samples` tables.
Advantages: This would be very straightforward to implement, and it wouldn't require the creation of new tables.
Drawbacks: 1) It widens the report tables by several fields. 2) It might seem like the fields should be interpreted as having a hierarchy (is `qualityFlag1` more important than `qualityFlag3` because it is ranked first?). 3) It opens the door to adding lists in many other places in the ODM, potentially mucking up the overall structure.

The linkages between quality flags and measures / samples could be done in (at least) 2 ways:
This option:
- Removes the `qualityFlag` field from the `measures` and `samples` tables
- Creates a table with the following fields:
This option takes advantage of the fact that quality flags are pre-determined by the dictionary. Therefore, all the possible combinations of quality flags that could be reported can be inferred from the contents of the dictionary itself. Say that, in measure `x`'s `qualitySet`, there are three possible flags: `A`, `B`, and `C`. We thus immediately know that the quality concerns for a measurement of `x` can only be one of `{[], [A], [B], [C], [A, B], [A, C], [B, C], [A, B, C]}`. A new table (say, `qFCombinations`) could automatically be generated from the contents of each `qualitySet`, with each combination having its own unique ID.
Then, the `measure` and `sample` tables only need to replace their `qualityFlag` field with a `qfCombinationID` field to link the measure / sample to the correct combination of flags.
Advantages: It maintains an explicit link between the measures and samples tables and the quality flags, and it allows users to keep filling in all their values in the samples and measures tables only.
Disadvantages: The number of combinations doubles with each new flag in a set (2^n combinations for n flags), which could become unwieldy over time, and it adds another step to the dictionary generation (i.e., every time a quality flag is added to a `qualityflagSet`, the new combinations must also be added to the `qfCombination` table).
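To make the `qFCombinations` idea concrete, here is a minimal sketch of how such a table could be generated from one `qualitySet`. It assumes a `qualitySet` is just a collection of flag codes and that IDs are assigned sequentially; the function names and the ID scheme are hypothetical, not part of the ODM.

```python
from itertools import combinations


def quality_flag_combinations(quality_set):
    """Enumerate every subset of the flags in one qualitySet (empty set included)
    and give each subset a sequential, hypothetical qfCombinationID."""
    flags = sorted(quality_set)
    table = {}
    next_id = 1
    for size in range(len(flags) + 1):
        for combo in combinations(flags, size):
            table[next_id] = combo
            next_id += 1
    return table


def combination_id(table, reported_flags):
    """Look up the qfCombinationID for the set of flags reported on one measure."""
    wanted = tuple(sorted(reported_flags))
    for combo_id, combo in table.items():
        if combo == wanted:
            return combo_id
    raise KeyError(f"{sorted(reported_flags)} is not a combination of this qualitySet")


qf_combinations = quality_flag_combinations({"A", "B", "C"})
for combo_id, combo in qf_combinations.items():
    print(combo_id, list(combo))

# A measure flagged with A and C would store this ID in its qfCombinationID field.
print(combination_id(qf_combinations, {"C", "A"}))  # prints 6
```

The three-flag set produces the eight combinations listed earlier. The growth is the drawback in practice: a set of n flags yields 2^n rows, so a 10-flag set already needs 1,024 entries.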
These options aren't exhaustive, but hopefully they get the conversation rolling on the best way forward :)
Another aspect of quality that was mentioned in the meeting was how to report LOD / LOQ for measures.
The dictionary is flexible in this regard, so it would probably be good to agree on a common way of doing things.
Here are all the options I can think of:
1. `loq` and `lod` values as rows in the `measures` table.
2. `loq` and `lod` as methodSteps in the `MethodSet` table. Then, link the measures that use that assay with the correct `methodSetID`.
3. `loq` and `lod` values as rows in the `measures` table AND link them to the relevant lab measurement with a `measureSetID`. Thus, by looking through all the measures of a given `measureSetID`, we would find, for each qPCR measurement, one row in the `measures` table for `lod` and another for `loq`.
4. `lod` and `loq` could be added as fields in the quality table.

The issue I see with `lod` and `loq` being in the quality table is that it then becomes hard to link the value to the right unit. If `lod` and `loq` are properties of the measure, then the unit can be assumed to be the same as for the reported value. If `lod` and `loq` are their own rows, either in `MethodSteps` or `measures`, they can have their own unit without having to worry about the unit used by specific measurements.
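The option of storing `lod` and `loq` as their own rows in the `measures` table, linked to the lab measurement by `measureSetID`, can be sketched as below. Each row carries its own unit, which is the property argued for above. The measure codes, units and values are made up for illustration, and the lookup helper is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MeasureRow:
    measure_id: str
    measure: str          # hypothetical measure codes, e.g. "covN1", "lod", "loq"
    value: float
    unit: str             # each row keeps its own unit
    measure_set_id: str   # links the measurement to its lod/loq rows


# Illustrative rows only; not real data.
measures = [
    MeasureRow("m1", "covN1", 12500.0, "gc/L", "set1"),
    MeasureRow("m2", "lod", 3.2, "gc/rxn", "set1"),
    MeasureRow("m3", "loq", 10.0, "gc/rxn", "set1"),
    MeasureRow("m4", "covN1", 8300.0, "gc/L", "set2"),
]


def detection_limits(rows, measure_set_id):
    """Return the lod/loq rows that share a measureSetID, each with its own unit."""
    limits = {}
    for row in rows:
        if row.measure_set_id == measure_set_id and row.measure in ("lod", "loq"):
            limits[row.measure] = (row.value, row.unit)
    return limits


print(detection_limits(measures, "set1"))
print(detection_limits(measures, "set2"))
```

Note that the limits come back in `gc/rxn` while the measurement is in `gc/L`: nothing forces the two units to match, which is exactly the flexibility that fields on the measure row would lose.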
Thank you for summarizing the discussion from the ODM implementation meeting and clearly describing the advantages and drawbacks of each option!
I will break up my thoughts into Ontario Data Template/MECP-specific notes and general ODM notes:
Option B would be my preference.
Pros:
Cons:
If I understand correctly, the ODM is set up to be formatted for a number of different users, so the WWMeasure table can be used for data produced by labs measuring any biologic, toxin, or other health risk, using any number of techniques or assays. Therefore, it needs to be both very flexible, to accommodate all possible uses/users, and customizable, so it is able to capture highly precise data for each use/user. This may be a very naïve understanding of the ODM, so please take the following comments lightly.
I propose to give users (MECP in my case) the ability to select a customized WWMeasure table based on the assayMethod. For example, if the assay method is RT-qPCR, the WWMeasure table will have quality flag columns for this technique but if the assay method is sequencing, the WWMeasure table would be altered to accommodate that data type.
Advantages: ensures data from any assay is being recorded with all caveats/flags so that only the highest quality data is being used for interpretation; increases user friendliness when reporting as the columns are understood by the labs/persons producing the data from each type of assay.
Drawbacks: I anticipate that this would be a lot of work to coordinate, and I do not want to take that lightly.
I believe LOD/LOQ should be part of the Quality Flag column (as it currently is in the ODT). These values can change over time as assays are improved or changed, so it is easier to note whether each value reported in the WWMeasure table is below the LOD or LOQ at the time of reporting.
Note: I am also in the process of communicating this to Vince Pileggi and Sherif Hegazy (MECP; points of contact for the Ontario Data Template) so again please ignore if these changes/thoughts are not relevant to the ODM itself.
Just a clarification that this issue discussion references versions 1.1 and 2.0: @heather-i's references are mostly about v1.1, and @jeandavidt's references are mostly about version 2.0.
Version 2.0 expands the dictionary and the model quite a bit. The name change from `WWMeasure` (wastewater measure) to `measures` reflects that measures can be for water, air or surfaces, and can more robustly include population measures (testing, hospitalization, etc.).
In version 2.0:
1) LOD and LOQ change from headers within the `measures` table to what is described by @jeandavidt in this issue (a row in the measures table linked to measures using `measureSetID`, or within the `methods` table). There are a few reasons for this change. Most notably, `LOD` and `LOQ` are relevant only to specific measures, such as PCR measures, and there is a considerable increase in measures, such as chemical and physical properties, where LOD and LOQ don't apply.
2) Quality sets (`qualitySet`) are introduced. Currently, four quality sets are described, but more can be added at any time: Generic Quality Flag Set; PCR Quality Set; Sample Quality Set; Sequencing Quality Set. Each measure can have a quality set.
As an aside, in Version 2.0, there are also aggregation sets (`aggSet`) and unit sets (`unitSet`). So, each measure has an aggregation set, unit set and quality set. A unit set for temperature (degrees Celsius) is different from a unit set for SARS-CoV-2 N1 gene region detection by PCR (gene copies per L, gene copies per copies of PMMoV, etc.).
3) Measure sets (`measureSetID`) are introduced. Measure sets allow groups of measures to be associated with each other. There are several use cases for measure sets, but they are generic and flexible. Associating LOQ and LOD with a group of measures was one identified use case. Other use cases include:
Measures and samples can also be grouped, but there are slightly different considerations. Samples have the provision for having parent, child, combined samples, etc. Methods have methodSteps that can be grouped, and then groups can be combined. For example, there could be several RNA extraction steps that can be grouped together and then added with other groups of steps for, say, concentration, PCR, etc. to form an overall assay method.
Remember that we’ll want our quality measures to work in both ‘long’ and ‘wide’ data formats. I don’t foresee major issues with any proposed solutions, but there are a few considerations and implementation issues. We’ll likely want the core ODM development team to review how to generate long names before we sign off on a reporting approach.
Long data is the main ODM data format, but version 2 provides better support for wide tables with an explicit formula for generating wide names. Variable names for wide tables can get very long because the names are a concatenation of attributes. See below. This means that we’ll want short part names for quality measures.
The figure below is preliminary and not quite up-to-date. Regardless, the figure informs the general approach.
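Why short part names matter can be shown with a small sketch. It assumes, hypothetically, that a wide name is the attribute part names joined with an underscore; the actual v2 formula is not reproduced here, and all part names below are invented for illustration.

```python
def wide_name(*parts, sep="_"):
    """Hypothetical wide-column name: attribute part names joined by a separator."""
    return sep.join(parts)


# Invented part names: the same attributes spelled out vs. abbreviated.
long_parts = ("wastewater", "SARSCoV2N1", "geneCopiesPerLitre",
              "arithmeticMean", "qualityFlagConcentrationEstimate")
short_parts = ("ww", "covN1", "gcL", "mean", "qfJ")

for parts in (long_parts, short_parts):
    name = wide_name(*parts)
    print(len(name), name)
```

Because every attribute contributes a part, adding a quality measure to the name lengthens every affected column, so a short quality part (like `qfJ` here) keeps the total manageable.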
For Option B, what is the implementation? Do we need key:value pairs? Maybe even key:value:unit (for numerical quality measures)? @mathew-thomson @heather-i
1) Key:value pairs: `qf1_partID`, `qf1_value`, `qf2_partID`, `qf2_value`.
Example: `qf1_partID` = J, `qf1_value` = TRUE.
2) Use the partID as the name, and then the entry is the value: `qf1_J`.
Example: `qf1_J` = TRUE.
3) Have the quality measure as the value and assume TRUE: `qf1`.
Example: `qf1` = J.
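The three implementations can be compared side by side with a short sketch that encodes the same two flags (J and FI, both TRUE) each way. The helper names are hypothetical; only the column naming patterns come from the list above.

```python
def as_key_value(flags):
    """Implementation 1: qfN_partID / qfN_value column pairs."""
    row = {}
    for i, (part_id, value) in enumerate(flags.items(), start=1):
        row[f"qf{i}_partID"] = part_id
        row[f"qf{i}_value"] = value
    return row


def as_part_columns(flags):
    """Implementation 2: the partID is part of the column name; the entry is the value."""
    return {f"qf1_{part_id}": value for part_id, value in flags.items()}


def as_flag_values(flags):
    """Implementation 3: the flag code is the value and TRUE is assumed,
    so only flags that are True can be represented."""
    present = [part_id for part_id, value in flags.items() if value is True]
    return {f"qf{i}": part_id for i, part_id in enumerate(present, start=1)}


flags = {"J": True, "FI": True}
print(as_key_value(flags))
print(as_part_columns(flags))
print(as_flag_values(flags))
```

Implementations 1 and 2 store an arbitrary value in the cell, so they could also hold a number; implementation 3 has nowhere to put one, since the flag code itself occupies the value.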
The above approaches also need to work for quality measures that are not Boolean but real numbers, such as LOQ, concentration estimate, etc. Key:value pairs can accommodate these measures, as can implementation 2. These numeric measures need an accompanying unit and maybe also an aggregation.
A challenge for implementation 2 is a proliferation of `qf1_` variables as headers. Remember that there are measures other than PCR. Currently, in version 2, there are 22 quality measures, which would mean adding 22 variable headers to the measures table, most of which are not relevant for any one measure. Now you've got a wide-table format instead of a long-table format.
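The header proliferation can be seen by pivoting long quality rows into implementation-2 columns. In this sketch (the row layout and helper are hypothetical), every distinct partID becomes a header on every row, so most cells stay empty; with all 22 version 2 quality measures in play, there would be 22 such headers.

```python
def to_wide(quality_rows):
    """Pivot long (measureID, partID, value) rows into one wide row per measure,
    using implementation-2 headers (qf1_<partID>). Missing flags stay None."""
    headers = sorted({f"qf1_{part_id}" for _, part_id, _ in quality_rows})
    table = {}
    for measure_id, part_id, value in quality_rows:
        row = table.setdefault(measure_id, dict.fromkeys(headers))
        row[f"qf1_{part_id}"] = value
    return headers, table


long_rows = [("m1", "J", True), ("m2", "FI", True), ("m3", "B", True), ("m3", "ND", True)]
headers, wide = to_wide(long_rows)
empty_cells = sum(value is None for row in wide.values() for value in row.values())
print(headers)
print(f"{empty_cells} of {len(headers) * len(wide)} cells are empty")
```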
So for version 2.0, we propose going for option C:
Addressing this problem raised the issue of measure sets vs sets of quality flags. The question was: since we are now linking several quality flags to a measure, isn't this the same as creating a measure set, but for quality? I looked into this, and they turn out to be different. The difference is in the number of links the different entities can have together:
So:
But a thing we might want to do is to allow one measure to belong to many sets (say, a set of replicates and a set of all the measures that were done with the same calibration curve). For that, we need to turn the relationship between measures and measureSets from n:1 to n:n.
n:n linkages require a lookup table. Thus, the setup would be the following:
I put these changes into the ERD for review and discussion.
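A generic n:n lookup table, using the `MeasureSetHasMeasures` name mentioned later in this thread, could look like the sqlite3 sketch below. The columns, the `setType` descriptor and the sample rows are assumptions for illustration, not the proposed ERD; the point is that measure `m1` belongs to both a replicate set and a calibration-curve set.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE measures (measureID TEXT PRIMARY KEY, value REAL);
CREATE TABLE measureSets (measureSetID TEXT PRIMARY KEY, setType TEXT);
-- Lookup (junction) table: one row per (set, measure) link,
-- so a single measure can belong to many sets.
CREATE TABLE measureSetHasMeasures (
    measureSetID TEXT REFERENCES measureSets(measureSetID),
    measureID TEXT REFERENCES measures(measureID),
    PRIMARY KEY (measureSetID, measureID)
);
""")
conn.executemany("INSERT INTO measures VALUES (?, ?)",
                 [("m1", 1.2), ("m2", 1.4), ("m3", 0.9)])
conn.executemany("INSERT INTO measureSets VALUES (?, ?)",
                 [("rep1", "replicates"), ("cal1", "calibrationCurve")])
conn.executemany("INSERT INTO measureSetHasMeasures VALUES (?, ?)",
                 [("rep1", "m1"), ("rep1", "m2"), ("cal1", "m1"), ("cal1", "m3")])


def sets_for_measure(measure_id):
    """All sets a measure belongs to, found through the lookup table."""
    rows = conn.execute(
        "SELECT s.measureSetID, s.setType FROM measureSetHasMeasures AS l "
        "JOIN measureSets AS s ON s.measureSetID = l.measureSetID "
        "WHERE l.measureID = ? ORDER BY s.measureSetID",
        (measure_id,),
    )
    return rows.fetchall()


print(sets_for_measure("m1"))
```

With an n:1 design, `m1` could only carry one `measureSetID`; the lookup table removes that limit at the cost of one extra join.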
All points by @jeandavidt look good.
The first task we need to complete is deciding how to store data that address the required use cases for quality measures. What @jeandavidt suggested does address the use cases that have been discussed, in particular the main issue in this thread and the ability to store multiple quality measures.
The second task is to recommend easy data collection for common uses. @heather-i has a good point that Option B is easy to implement and understand for many users and use cases.
We will need to decide whether we want the 'reportable' attribute and how that would be used. From the discussions:
`reportable` is an important attribute that we want to keep. This attribute is widely requested, and it is also a helpful flag that there is or should be a corresponding entry in the `qualitySet` table.

I tend to support updating the `measureSetReport` table to allow n:n, but we haven't received many requests that require this more robust structure. However, the more robust structure:
Regardless, the idea to add additional descriptors makes sense (e.g., qualitycontrolSet or ReplicateSet or whatever) and those descriptors could be added to the existing measureSet table.
I am not sold on `MeasureSetHasMeasures`. I find the 'has' tables conceptually great, but many people are not familiar with them or their application.
Thank you @jeandavidt for laying things out so eloquently. To build on @DougManuel's point re: `reportable`, one thing that was brought up in our discussion was to continue to use `reportable` in the `measures` table and add it to the `samples` table. To add potential levels of nuance while still maintaining a final ease of interpretation, we could add a `severity` column to the proposed `qualityReports` table with a traffic-light-style tier system. This wouldn't replace `reportable`, and would be optional, but would provide some additional detail on how important a given flag is beyond the final yes/no `reportable` decision.
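A possible shape for the optional `severity` column is sketched below. The three tier names, the flag-to-severity assignments and the summary helper are all hypothetical and only illustrate the idea; `reportable` remains a separate yes/no decision that the severity tiers do not set.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical traffic-light tiers for an optional severity column."""
    GREEN = 1
    YELLOW = 2
    RED = 3


# Illustrative qualityReports rows; the severity assignments are not a recommendation.
quality_reports = [
    {"measureID": "m1", "partID": "B", "severity": Severity.GREEN},
    {"measureID": "m1", "partID": "J", "severity": Severity.YELLOW},
    {"measureID": "m2", "partID": "FI", "severity": Severity.RED},
]


def worst_severity(reports, measure_id):
    """Highest severity among the flags on one measure, or None if it has no flags."""
    levels = [r["severity"] for r in reports if r["measureID"] == measure_id]
    return max(levels, default=None)


for measure_id in ("m1", "m2", "m3"):
    print(measure_id, worst_severity(quality_reports, measure_id))
```

Using an ordered enum keeps the tiers comparable, so a per-measure summary is a simple maximum, while the individual flags stay available for anyone who needs the detail.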
I am also somewhat conservative about blowing up the `measureSets` structure to allow for n:n relationships with measures, but if the labs are supportive of this kind of infrastructure, then I think it would be great to build it in before we launch v2.0.
Version 2 will have a specific quality table that can record multiple quality measures for any sample or measure.
In order to further automate our data management and reporting systems to be compatible with the Ontario Data Template based on the Open Data Model, it would be helpful to split out the Quality Flag column in the WWMeasure tab to better support data points with more than one quality flag. According to the Protocol Evaluations for qPCR Performance used by the MECP and OCWA (attached), there are the following qualifiers:
- B: background contamination observed (greater than 5 Cq away from samples, indicating that it would not affect quantification but it is present)
- FI: failed inhibition
- AI: addressed inhibition
- ND: non-detect
- J: concentration estimate extrapolated based on extending the experiment-specific standard curve to the y-intercept
- UJ: "Trace" amplification of target; concentration estimate extrapolated based on extending the experiment-specific standard curve beyond the y-intercept
It would be easiest if each of these had its own column that could be true/false, or if they were at least grouped by type of quality flag. For example, one column for flags that correspond to concentration (ND, J, and UJ), since each sample can only have one of these, and one for flags that correspond to inhibition (FI and AI), since each sample can only have one of these.
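The grouped-column idea can be sketched as a small converter from a combined Quality Flag cell into separate columns. It assumes, hypothetically, that multiple qualifiers are currently entered in one cell separated by `;`, and the output column names are invented. The groups and the one-per-group rule come from the description above.

```python
CONCENTRATION_FLAGS = {"ND", "J", "UJ"}   # at most one per sample
INHIBITION_FLAGS = {"FI", "AI"}           # at most one per sample
KNOWN_FLAGS = CONCENTRATION_FLAGS | INHIBITION_FLAGS | {"B"}


def split_quality_flags(raw, sep=";"):
    """Split a combined Quality Flag cell (hypothetical ';' delimiter) into grouped columns."""
    flags = [f.strip() for f in raw.split(sep) if f.strip()]
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown qualifier(s): {sorted(unknown)}")
    concentration = [f for f in flags if f in CONCENTRATION_FLAGS]
    inhibition = [f for f in flags if f in INHIBITION_FLAGS]
    if len(concentration) > 1 or len(inhibition) > 1:
        raise ValueError(f"at most one concentration and one inhibition flag allowed: {raw!r}")
    return {
        "qualityFlagConcentration": concentration[0] if concentration else None,
        "qualityFlagInhibition": inhibition[0] if inhibition else None,
        "qualityFlagBackground": "B" in flags,   # B gets its own true/false column
    }


print(split_quality_flags("J; AI"))
print(split_quality_flags("B;ND"))
```

Grouping lets the converter reject impossible combinations such as `ND;J`, which a single free-text column cannot catch.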
I know this is very Ontario-specific, so I understand if it is not possible to make these changes, but if it is possible to include them and have them be optional for users of the Open Data Model who do not need these separated, then that would be wonderful.
Thanks!
20220128_ProtocolEvaluationsqPCRPerformance_January2022.pdf