Validators ignored for fields with repeating subfields

Environment

CKAN version: 2.10.1
ckanext-scheming version: release-3.0.0

Description

Our CKAN instance indexes datasets across a variety of domains. We have a schema that defines a core set of fields that every dataset must specify. So that dataset authors can add key-value metadata beyond the required core fields, our schema also has a field called custom_fields that uses repeating_subfields to let users attach additional key-value pairs to their dataset. The key subfield has a validator, key_validator, that checks that the key's name meets certain requirements:

- field_name: custom_fields
  label: Custom Fields
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true
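
The issue does not say what key_validator checks, so the following is only a minimal sketch under an assumed rule (a lowercase letter followed by up to 99 lowercase letters, digits or underscores). KEY_PATTERN is hypothetical, and Invalid is a local stand-in for ckan.plugins.toolkit.Invalid so the snippet runs without CKAN installed.

```python
import re


class Invalid(Exception):
    """Local stand-in for ckan.plugins.toolkit.Invalid so this runs without CKAN."""


# Hypothetical naming rule: lowercase letter first, then up to 99 lowercase
# letters, digits or underscores. The issue does not say what the real rule is.
KEY_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,99}")


def key_validator(value):
    """Return the key unchanged if it matches KEY_PATTERN, otherwise raise Invalid."""
    if not isinstance(value, str) or KEY_PATTERN.fullmatch(value) is None:
        raise Invalid(f"Invalid key name: {value!r}")
    return value
```

In a real extension, a function like this would be exposed through CKAN's IValidators plugin interface, whose get_validators() returns a dict mapping the name used in the schema (here key_validator) to the callable, and it would raise toolkit.Invalid instead of the stand-in.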

We want to prevent users from using the same key in more than one entry of custom_fields. The obvious way to do this is to define a validator function, keys_are_unique(), and apply it to custom_fields to verify that the keys are unique:

- field_name: custom_fields
  label: Custom Fields
  validators: keys_are_unique
  repeating_label: Custom Field
  repeating_subfields:
  - field_name: key
    label: Key
    validators: key_validator
    required: true
  - field_name: value
    label: Value
    required: true
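
A minimal sketch of what keys_are_unique() might look like, assuming the validator receives custom_fields as a list of dicts with key and value entries (the exact shape of the data at validation time is an assumption). Invalid is again a local stand-in for ckan.plugins.toolkit.Invalid.

```python
from collections import Counter


class Invalid(Exception):
    """Local stand-in for ckan.plugins.toolkit.Invalid."""


def keys_are_unique(value):
    """Raise Invalid if two custom_fields entries use the same key.

    Assumes value is a list of dicts such as [{"key": ..., "value": ...}, ...].
    Empty or missing values are passed through unchanged.
    """
    if not value:
        return value
    # Count how often each key appears; entries without a key are ignored.
    counts = Counter(entry.get("key") for entry in value)
    duplicates = sorted(str(k) for k, n in counts.items() if k is not None and n > 1)
    if duplicates:
        raise Invalid("Duplicate keys: " + ", ".join(duplicates))
    return value
```

Reporting every duplicated key at once, rather than stopping at the first, gives the dataset author a single error message to act on.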

After trying this and spending some time in my debugger, I found that the extension initially stores the top-level field's validator but then replaces it with the subfields' validator(s). When this happens, the type of the validators value also changes from list[str] to dict, so keys_are_unique is never applied. I haven't had time yet to dig into why and how this fits into CKAN's overall dataset creation flow, but that's probably my next step.

One place where this happens is the _field_validators() function in ckanext/scheming/plugins.py:

https://github.com/ckan/ckanext-scheming/blob/8646a9dce79aa0b5a46274271ca6330dc0870b92/ckanext/scheming/plugins.py#L582-L608
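
To make the reported behavior concrete, here is a simplified, hypothetical model of the resolution step described above. It is not the actual _field_validators() code: the name field_validators_model is made up, validator names are kept as plain strings, and the not_empty/ignore_missing default for fields without explicit validators is an assumption.

```python
def field_validators_model(field):
    """Simplified, hypothetical model of resolving a scheming field's validators.

    Not the real ckanext-scheming code: validator names stay plain strings.
    """
    # Start from the field's own validators (a space-separated string in YAML),
    # falling back to not_empty / ignore_missing depending on `required`.
    default = "not_empty" if field.get("required") else "ignore_missing"
    validators = field.get("validators", default).split()

    # A field with repeating_subfields gets a dict of per-subfield validators
    # instead, so the top-level list built above is discarded.
    if "repeating_subfields" in field:
        validators = {
            sub["field_name"]: field_validators_model(sub)
            for sub in field["repeating_subfields"]
        }
    return validators
```

Applied to the custom_fields definition from the second schema, this model returns {"key": ["key_validator"], "value": ["not_empty"]}: keys_are_unique does not appear anywhere in the result, and the return type has switched from a list to a dict.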

As a user, I find this behavior unexpected: I would expect validators to be applicable to both the top-level field and its subfields. I think it's worth updating the documentation for repeating_subfields and/or validators to explain that if a field has repeating_subfields, the validators for that field are ignored in favor of the validators defined on its subfields.

I also have two specific questions; any insight would be appreciated:

1) We are trying to allow users to add an arbitrary number of "extra" key-value pairs to a dataset that is defined by a ckanext-scheming schema. Is our current approach (custom_fields field with repeating_subfields) the "correct" way to do what we are trying to do, or is there another method that is better supported, either by ckanext-scheming or by CKAN itself?

2) Assuming that what we are doing is correct, roughly how difficult would it be to modify ckanext-scheming to support validation of both the subfields and the top-level field? Any tips on what we would need to do to support that? It almost seems like this would require changes to the way the data is modelled, which I assume would be nontrivial.
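
For question 2, the behavior being asked for amounts to two-pass validation: run each subfield's validators on every entry, then run the field-level validators on the whole list. The following is a standalone sketch of that idea only. It does not use CKAN's navl schema machinery, and validate_repeating_field, the "index.name" error keys and the "__field__" error key are all hypothetical.

```python
class Invalid(Exception):
    """Local stand-in for ckan.plugins.toolkit.Invalid."""


def validate_repeating_field(entries, subfield_validators, field_validators):
    """Validate each entry's subfields, then the list as a whole.

    entries: list of dicts, one per repeated entry.
    subfield_validators: dict mapping subfield name -> list of callables.
    field_validators: list of callables applied to the whole cleaned list.
    Returns (cleaned_entries, errors).
    """
    errors = {}
    cleaned = []
    # Pass 1: per-entry, per-subfield validation.
    for index, entry in enumerate(entries):
        item = {}
        for name, validators in subfield_validators.items():
            value = entry.get(name)
            try:
                for validator in validators:
                    value = validator(value)
            except Invalid as exc:
                errors[f"{index}.{name}"] = str(exc)
            item[name] = value
        cleaned.append(item)
    # Pass 2: field-level validation, only if every entry passed.
    if not errors:
        try:
            for validator in field_validators:
                cleaned = validator(cleaned)
        except Invalid as exc:
            errors["__field__"] = str(exc)
    return cleaned, errors
```

Running the list-level validators only after every entry passes means a check like keys_are_unique never has to cope with malformed entries; whether scheming could express the same ordering inside CKAN's nested schema dictionaries is the open part of the question.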

ckan/ckanext-scheming issue #394