Our CKAN implementation is used to index datasets across a variety of domains. Our schema defines a core set of fields that every dataset must specify. To let dataset authors add key-value metadata beyond that core set, the schema also has a field called custom_fields that uses repeating_subfields, so users can attach additional key-value pairs to their dataset. The key subfield has a validator that checks that the key's name meets certain requirements:
- field_name: custom_fields
  label: Custom Fields
  repeating_label: Custom Field
  repeating_subfields:
    - field_name: key
      label: Key
      validators: key_validator
      required: true
    - field_name: value
      label: Value
      required: true
We want to block users from specifying the same key in more than one custom_fields entry. The obvious way to do this is to define a validator function, keys_are_unique(), and apply it to custom_fields to verify that the keys are unique:
- field_name: custom_fields
  label: Custom Fields
  validators: keys_are_unique
  repeating_label: Custom Field
  repeating_subfields:
    - field_name: key
      label: Key
      validators: key_validator
      required: true
    - field_name: value
      label: Value
      required: true
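For reference, here is a minimal sketch of the kind of validator we have in mind. It assumes the validator would receive the whole custom_fields value as a list of dicts, each with a key entry; that input shape is an assumption, not something we have confirmed. find_duplicate_keys() is a hypothetical helper, and the Invalid stand-in exists only so the snippet runs outside a CKAN environment.

```python
from collections import Counter

try:
    # CKAN's exception type for reporting validation failures.
    from ckan.plugins.toolkit import Invalid
except ImportError:
    # Stand-in so the sketch can run without CKAN installed.
    class Invalid(Exception):
        pass


def find_duplicate_keys(entries):
    """Return the sorted key names that appear in more than one entry."""
    counts = Counter(entry.get("key") for entry in entries)
    return sorted(k for k, n in counts.items() if k is not None and n > 1)


def keys_are_unique(value):
    """Validator sketch: reject custom_fields entries with repeated keys.

    Assumption: `value` is a list of {"key": ..., "value": ...} dicts.
    """
    duplicates = find_duplicate_keys(value or [])
    if duplicates:
        raise Invalid("Duplicate custom field keys: " + ", ".join(duplicates))
    return value
```

In a real extension this function would be exposed through the IValidators plugin interface so that the schema can refer to it by name.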
After trying this & spending some time in my debugger, I discovered that the extension initially saves the top-level field's validator, but then replaces it with the subfields' validator(s). When this happens, the data type of the validators list also changes from list[str] to dict. I haven't had time yet to dig into why/how this works in the grand scheme of CKAN dataset creation, but that's probably my next step.
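To make the observation concrete, here is a toy model of the behavior as I understand it from the debugger. It is not the ckanext-scheming source: resolve_validators() is a hypothetical name, and validators are kept as plain strings for simplicity.

```python
def resolve_validators(field):
    """Toy model of the observed behavior (not the real implementation)."""
    # Step 1: the field's own validators are parsed into a list.
    validators = field.get("validators", "").split()
    # Step 2: if the field has repeating_subfields, that list is replaced
    # by a dict mapping each subfield name to the subfield's validators.
    if "repeating_subfields" in field:
        validators = {
            sub["field_name"]: resolve_validators(sub)
            for sub in field["repeating_subfields"]
        }
    return validators


custom_fields = {
    "field_name": "custom_fields",
    "validators": "keys_are_unique",
    "repeating_subfields": [
        {"field_name": "key", "validators": "key_validator"},
        {"field_name": "value"},
    ],
}

resolved = resolve_validators(custom_fields)
print(resolved)  # {'key': ['key_validator'], 'value': []}
```

In this model keys_are_unique never appears in the result, which matches what I see: the top-level validator is dropped once the subfield validators are collected.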
One example of this behavior is in the _field_validators() function in ckanext/scheming/plugins.py (lines 582-608 at commit 8646a9dce79aa0b5a46274271ca6330dc0870b92).
As a user, this behavior is unexpected -- I would expect that validators can be applied to both the top-level & sub-level fields. I think that it's worth updating the documentation for repeating_subfields and/or validators to explain that if a field has repeating_subfields, then validators for that field are ignored in favor of validators defined for the subfields.
I also have two specific questions; any insight would be appreciated:
1) We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a ckanext-scheming schema. Is our current approach (custom_fields field with repeating_subfields) the "correct" way to do what we are trying to do, or is there another method that is better supported, either by ckanext-scheming or by CKAN itself?
2) Assuming that what we are doing is correct, roughly how difficult would it be to modify ckanext-scheming to support validation of both the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.
Environment
CKAN version: 2.10.1
ckanext-scheming version: release-3.0.0
Permalink to the _field_validators() code mentioned above: https://github.com/ckan/ckanext-scheming/blob/8646a9dce79aa0b5a46274271ca6330dc0870b92/ckanext/scheming/plugins.py#L582-L608