frictionlessdata / datapackage

Data Package is a standard consisting of a set of simple yet extensible specifications to describe datasets, data files and tabular data. It is a data definition language (DDL) and data API that facilitates findability, accessibility, interoperability, and reusability (FAIR) of data.
https://datapackage.org
The Unlicense
481 stars 107 forks

Support multiselect survey item types via `categorical` properties on `list` field types #940

Open khusmann opened 2 weeks ago


"Multiple select" items are an extremely common survey question type in the social, medical, bio-behavioral, and related sciences. For example:

Which fruits do you like? (Select all that apply)

a. Apple b. Orange c. Banana d. Kiwi

Data from such items are often exported from survey software as a delimited list within a single field. Qualtrics exports data this way (in fact, the delimited-list form is its default), and I believe REDCap has an option for it as well (@pschumm, please correct me if I'm wrong!). For example, a CSV export of the item above might look something like this:

id,multiselectField
0,"Apple"
1,"Apple,Orange"
2,"Apple,Banana,Kiwi"
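To make the shape of this data concrete, here is a minimal Python sketch (standard library only) that reads the export above and splits each cell into the list of selected options. It assumes the comma is the list delimiter, which as far as I know matches the default `delimiter` of the `list` field type, and that an empty cell means nothing was selected; `parse_multiselect` is a hypothetical helper, not part of any frictionless library.

```python
import csv
import io

# The CSV export from above, embedded as a string so the example is self-contained.
RAW = '''id,multiselectField
0,"Apple"
1,"Apple,Orange"
2,"Apple,Banana,Kiwi"
'''


def parse_multiselect(cell: str, delimiter: str = ",") -> list[str]:
    """Split one delimited cell into its selected options (hypothetical helper).

    Assumes no option label contains the delimiter and that an empty
    cell means "nothing selected".
    """
    if cell == "":
        return []
    return [item.strip() for item in cell.split(delimiter)]


# csv.DictReader handles the quoting, so each cell arrives as one string.
rows = list(csv.DictReader(io.StringIO(RAW)))
parsed = {row["id"]: parse_multiselect(row["multiselectField"]) for row in rows}
print(parsed)
# {'0': ['Apple'], '1': ['Apple', 'Orange'], '2': ['Apple', 'Banana', 'Kiwi']}
```

Note that the quoting is what keeps each multi-option answer in a single CSV column; the list structure only appears after the second, field-level split.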

To represent these item types in frictionless, I'd like to propose that we allow categorical properties (such as `categories`) to be defined on `list` fields whose `itemType` is either `integer` or `string`. That way, the multiple-select field above could be represented as follows:

{
  "name": "multiselectField",
  "type": "list",
  "itemType": "string",
  "categories": ["Apple", "Orange", "Banana", "Kiwi"]
}
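One way to read the proposed semantics is that each *item* in the list must be one of the declared categories, rather than the whole cell value. The sketch below checks that rule for the descriptor above, written as a Python dict. `allowed_values` and `invalid_items` are hypothetical helpers for illustration only; this is not how any existing frictionless validator is implemented.

```python
# Field descriptor from the proposal, written as a Python dict.
FIELD = {
    "name": "multiselectField",
    "type": "list",
    "itemType": "string",
    "categories": ["Apple", "Orange", "Banana", "Kiwi"],
}


def allowed_values(field: dict) -> set:
    """Collect the permitted item values from a `categories` property.

    Handles both the plain-list form and the value/label object form.
    """
    values = set()
    for category in field["categories"]:
        if isinstance(category, dict):
            values.add(category["value"])
        else:
            values.add(category)
    return values


def invalid_items(field: dict, items: list) -> list:
    """Return the items of one parsed cell that are not declared categories."""
    allowed = allowed_values(field)
    return [item for item in items if item not in allowed]


print(invalid_items(FIELD, ["Apple", "Kiwi"]))   # []
print(invalid_items(FIELD, ["Apple", "Mango"]))  # ['Mango']
```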

Or in a coded representation:

{
  "name": "multiselectField",
  "type": "list",
  "itemType": "integer",
  "categories": [
    { "value": 0, "label": "Apple" },
    { "value": 1, "label": "Orange" },
    { "value": 2, "label": "Banana" },
    { "value": 3, "label": "Kiwi" }
  ]
}
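Under the coded representation, the exported cells would hold integer codes instead of labels (for example, `"0,2,3"` for someone who selected Apple, Banana, and Kiwi). A minimal sketch of mapping such a cell back to labels through the `value`/`label` pairs, assuming a comma delimiter; `decode_cell` is a hypothetical helper, not an existing library function.

```python
# Coded field descriptor from the proposal, written as a Python dict.
FIELD = {
    "name": "multiselectField",
    "type": "list",
    "itemType": "integer",
    "categories": [
        {"value": 0, "label": "Apple"},
        {"value": 1, "label": "Orange"},
        {"value": 2, "label": "Banana"},
        {"value": 3, "label": "Kiwi"},
    ],
}


def decode_cell(field: dict, cell: str, delimiter: str = ",") -> list[str]:
    """Turn a delimited cell of integer codes into category labels (hypothetical).

    Raises ValueError for an item that is not an integer and KeyError for
    an integer code that is not a declared category.
    """
    labels = {c["value"]: c["label"] for c in field["categories"]}
    if cell == "":
        return []
    return [labels[int(code)] for code in cell.split(delimiter)]


print(decode_cell(FIELD, "0,2,3"))  # ['Apple', 'Banana', 'Kiwi']
```

Keeping the codes in the data and the labels in the schema means the same descriptor can both validate the integers and recover the human-readable answers.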

Thoughts from other folks who frequently use categorical items? @pschumm @fomcl @djvanderlaan