Closed siuc-nate closed 1 year ago
We have viable use cases that require making the information explicit when, due to suppression policy, there is 0 as a quantitative value. We need to be able to indicate explicitly that the reason for the 0 is not that there wasn't any data. Rather, the 0 is defined via dataWithholding:DataSuppressed with qdata:DataProfile.
We need this change made promptly as we have multiple states publishing QData now.
Come to think of it, that might be why the domain was initially set to QuantitativeValue et al, so that there would be no need for a fake/placeholder value like 0. That would allow for an array of QuantitativeValues where some are suppressed and others aren't.
If that's the case, then the domain does still need to be fixed - the namespaces are currently wrong.
We should probably take a step back and consider the problem/solution more holistically.
How about:
This allows us to say (in DataProfile) that any QuantitativeValues hanging off of that DataProfile with hasNoValue: true were suppressed, without polluting QuantitativeValue itself with a property that only makes sense in a QData context. We would instead have a far more generic property in the QuantitativeValue that could be used for other purposes outside of this one where there is a need to assert a lack of value.
Example:
{
  "@type": "qdata:DataSetProfile",
  "qdata:dataSetTimePeriod": [
    {
      "@type": "qdata:DataSetTimeFrame",
      "qdata:dataAttributes": [
        {
          "@type": "qdata:DataProfile",
          "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ],
          "qdata:holdersInSet": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 100
            }
          ],
          "qdata:relatedEmployment": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 20
            }
          ],
          "qdata:insufficientEmploymentCriteria": [
            {
              "@type": "schema:QuantitativeValue",
              "qdata:hasNoValue": true
            }
          ]
        }
      ]
    }
  ]
}
Even though there is a difference between:
I think it is clear enough from the context (in a DataProfile with the presence of qdata:DataSuppressed) that the 1-n QuantitativeValues with affirmative assertions of no value were the ones suppressed, whereas the properties that are not used simply have no data.
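As a rough illustration of that reading, here is a minimal Python sketch of how a consumer might interpret a DataProfile under this approach. The function name `classify` and the status strings are hypothetical, and `qdata:hasNoValue` is only the property proposed above, not an existing term.

```python
SUPPRESSED = "qdata:DataSuppressed"

def classify(profile: dict, prop: str) -> list:
    """Return one status per QuantitativeValue found under `prop`.

    Hypothetical helper: a property that is absent means "no data";
    a value flagged with the proposed qdata:hasNoValue inside a
    DataProfile marked DataSuppressed is read as "suppressed".
    """
    nodes = profile.get(prop, [])
    if not nodes:
        return ["no data"]  # the property was simply not used
    profile_suppressed = SUPPRESSED in profile.get("qdata:dataWithholdingType", [])
    statuses = []
    for node in nodes:
        if node.get("qdata:hasNoValue") is True:
            # Affirmative assertion of no value; the reason comes from the profile.
            statuses.append("suppressed" if profile_suppressed else "no value")
        else:
            statuses.append(f"value={node.get('schema:value')}")
    return statuses

profile = {
    "@type": "qdata:DataProfile",
    "qdata:dataWithholdingType": ["qdata:DataSuppressed"],
    "qdata:holdersInSet": [
        {"@type": "schema:QuantitativeValue", "schema:value": 100}
    ],
    "qdata:insufficientEmploymentCriteria": [
        {"@type": "schema:QuantitativeValue", "qdata:hasNoValue": True}
    ],
}

holders = classify(profile, "qdata:holdersInSet")
insufficient = classify(profile, "qdata:insufficientEmploymentCriteria")
unused = classify(profile, "qdata:relatedEmployment")
```

Note that the "suppressed" reading depends on two properties being interpreted together: the flag on the value and the withholding type on the profile.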
But I wanted to raise the above items in order to avoid creating this property now, only to later determine (for semantic reasons) that we shouldn't use it and instead should create some other property that means "there is a value but it is not available", which puts us close to the original structure of QData that started this issue where that meaning is conveyed by putting the qdata:dataWithholdingType directly in the QuantitativeValue and similar classes.
Short of actually putting qdata:dataWithholdingType directly in QuantitativeValue et al, we could potentially create such a property that might still be generic enough to be usable in QuantitativeValue outside of QData, e.g.:
{
  "@type": "qdata:DataSetProfile",
  "qdata:dataSetTimePeriod": [
    {
      "@type": "qdata:DataSetTimeFrame",
      "qdata:dataAttributes": [
        {
          "@type": "qdata:DataProfile",
          "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ],
          "qdata:holdersInSet": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 100
            }
          ],
          "qdata:relatedEmployment": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 20
            }
          ],
          "qdata:insufficientEmploymentCriteria": [
            {
              "@type": "schema:QuantitativeValue",
              "qdata:valueUnavailable": true
            }
          ]
        }
      ]
    }
  ]
}
In which case we would not have a need for the proposed property, qdata:hasNoValue, at least not at this time.
@siuc-nate it was determined that in a case where the data is suppressed, the data for that data profile needs to explicitly identify that there is a 0 as a direct result of the suppression policy. Are you working that out separately? Otherwise, an example is needed indicating dataWithholding:DataSuppressed.
The proposal for a new property is to enable explicitly indicating when there is no value/the value is unavailable without using 0, because 0 can also be a legitimate value in some cases.
Yes. Please don't use 0 for "value not available" as it really means value is 0.
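A small Python sketch of why the placeholder is harmful for downstream consumers. The rates below are invented purely for illustration (they are not from any published QData), and `mean_of_known` is a hypothetical helper.

```python
from statistics import mean

# Invented example rates; the third entry is a suppressed value.
rates_with_placeholder = [62.0, 58.0, 0]       # 0 stands in for "not available"
rates_with_explicit_gap = [62.0, 58.0, None]   # None marks "not available"

def mean_of_known(values):
    """Average only the values that are actually present."""
    known = [v for v in values if v is not None]
    return mean(known) if known else None

# The placeholder is indistinguishable from a real 0 and drags the average down.
placeholder_mean = mean(rates_with_placeholder)

# With an explicit gap, the consumer can skip the unavailable value.
explicit_mean = mean_of_known(rates_with_explicit_gap)
```

Here `placeholder_mean` is 40.0 while `explicit_mean` is 60.0, so the same underlying data yields different results depending on how the gap is encoded.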
It is a common and well-known problem with open-world models to know whether the lack of a statement means there is no data anywhere or that it just isn't available in that particular dataset. If the range of value is confined to numbers then it is especially difficult (i.e. you cannot put in text values like "suppressed" or "unknown" or concepts like ex:suppressed).
I don't see anything wrong with not stopping short of "actually putting qdata:dataWithholdingType directly in QuantitativeValue et al". I think this makes sense beyond the QData context, and it seems no worse than qdata:valueUnavailable being directly in QuantitativeValue.
The above examples (which are maybe not fleshed out enough) would then look like this:
{
  "@type": "qdata:DataSetProfile",
  "qdata:dataSetTimePeriod": [
    {
      "@type": "qdata:DataSetTimeFrame",
      "qdata:dataAttributes": [
        {
          "@type": "qdata:DataProfile",
          "qdata:holdersInSet": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 100
            }
          ],
          "qdata:relatedEmployment": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:value": 20
            }
          ],
          "qdata:insufficientEmploymentCriteria": [
            {
              "@type": "schema:QuantitativeValue",
              "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ]
            }
          ]
        }
      ]
    }
  ]
}
It still "feels" like something that is too QData-specific to belong in a generic class like that, but maybe I'm overthinking it.
We need this change made promptly as we have multiple states publishing QData now.
@jeannekitchens We should probably take a few good examples of this data and try each of these approaches with it, to see which one works the best. It should only take a few really rich examples.
Looks good to me.
I think qdata:dataWithholdingType may be useful in contexts other than qdata so I'm not worried about seeing it in a non-qdata class.
Slightly more fleshed out examples, showing multiple suppressed items within a single DataProfile:
Via the first alternative:
{
  "@type": "qdata:DataSetProfile",
  "qdata:dataSetTimePeriod": [
    {
      "@type": "qdata:DataSetTimeFrame",
      "qdata:dataAttributes": [
        {
          "@type": "qdata:DataProfile",
          "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ],
          "qdata:employmentRate": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:description": { "en": "Percent of graduates who were employed in the state at 10 years after graduation." },
              "qdata:dataUnavailable": true
            }
          ],
          "qdata:earningsAmount": [
            {
              "@type": "schema:MonetaryAmount",
              "schema:description": { "en": "Median earnings of graduates who were employed in the state at 10 years after graduation in inflation-adjusted 2021 dollars." },
              "schema:currency": "USD",
              "qdata:dataUnavailable": true
            }
          ]
        }
      ]
    }
  ]
}
Via the second alternative:
{
  "@type": "qdata:DataSetProfile",
  "qdata:dataSetTimePeriod": [
    {
      "@type": "qdata:DataSetTimeFrame",
      "qdata:dataAttributes": [
        {
          "@type": "qdata:DataProfile",
          "qdata:employmentRate": [
            {
              "@type": "schema:QuantitativeValue",
              "schema:description": { "en": "Percent of graduates who were employed in the state at 10 years after graduation." },
              "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ]
            }
          ],
          "qdata:earningsAmount": [
            {
              "@type": "schema:MonetaryAmount",
              "schema:description": { "en": "Median earnings of graduates who were employed in the state at 10 years after graduation in inflation-adjusted 2021 dollars." },
              "schema:currency": "USD",
              "qdata:dataWithholdingType": [ "qdata:DataSuppressed" ]
            }
          ]
        }
      ]
    }
  ]
}
The first one would make sense in some cases, but would require a second DataProfile if there were two different kinds of withholding. It also requires the use of both qdata:dataWithholdingType and qdata:dataUnavailable, with both being interpreted together (either property on its own wouldn't make as much sense).
The second one localizes the withholding type to the relevant QuantitativeValue and has fewer moving parts overall, so while I still think it's kind of weirdly QData-specific, it may still win out on other practical advantages.
Okay, I think I'm convinced, it may ultimately make more sense to keep it in the QuantitativeValue. @jeannekitchens @mparsons-ce Thoughts?
Suppression is a very specific reason for not providing data. The data does exist. It could not be made public because the quantity of people is too low. I think both examples are applicable. 1. There can be cases where there is no data. 2. It is a common method to suppress data per defined policy even though there is data. For the PASSHE scenario, every 0 is due to data being suppressed. Therefore, that data needs to indicate it is suppressed.
The first one would make sense in some cases, but would require a second DataProfile if there were two different kinds of withholding. It also requires the use of both qdata:dataWithholdingType and qdata:dataUnavailable, with both being interpreted together (either property on its own wouldn't make as much sense).
One more nudge in the direction of keeping it in the QuantitativeValue: what happens in option 1 if qdata:employmentRate and qdata:earningsAmount are withheld / unavailable for different reasons?
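To make that concrete, here is a minimal Python sketch of a consumer reading the second alternative, where each value carries its own withholding type. The function name `withholding_report` is hypothetical, and `ex:OtherWithholdingReason` is a made-up placeholder concept used only to show two different reasons in one DataProfile; it is not a term from the QData vocabulary.

```python
def withholding_report(profile: dict) -> dict:
    """Map each property to the withholding types found on its values."""
    report = {}
    for prop, nodes in profile.items():
        if prop.startswith("@") or not isinstance(nodes, list):
            continue  # skip JSON-LD keywords and non-list values
        for node in nodes:
            if not isinstance(node, dict):
                continue  # skip plain strings such as profile-level concepts
            reasons = node.get("qdata:dataWithholdingType", [])
            if reasons:
                report.setdefault(prop, []).extend(reasons)
    return report

profile = {
    "@type": "qdata:DataProfile",
    "qdata:holdersInSet": [
        {"@type": "schema:QuantitativeValue", "schema:value": 100}
    ],
    "qdata:employmentRate": [
        {
            "@type": "schema:QuantitativeValue",
            "qdata:dataWithholdingType": ["qdata:DataSuppressed"],
        }
    ],
    "qdata:earningsAmount": [
        {
            "@type": "schema:MonetaryAmount",
            "schema:currency": "USD",
            # Placeholder concept, assumed for illustration only.
            "qdata:dataWithholdingType": ["ex:OtherWithholdingReason"],
        }
    ],
}

report = withholding_report(profile)
```

Because the reason travels with each value, a single DataProfile can describe both cases without being split in two, which is what the first alternative would require.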
Per our 2023-9-15 meeting: We will go with the last example, keeping the property in the intended domains as shown in the post at the top of this thread.
I have made the necessary corrections in the schema and history tracking.
Per internal discussions:
I believe we should change the domain of qdata:dataWithholdingType from the (erroneous/non-existent) classes it currently references, to qdata:DataProfile, as this seems like a better fit for it given its purpose and given the nature of Data Profile.
Remove:
Add: