Sage-Bionetworks / schematic

Package for biomedical data model and metadata ingress management
https://schematicpy.readthedocs.io/en/stable/cli_reference.html
MIT License
22 stars 25 forks source link

allow a default value to be specified for a metadata attribute #349

Open BrunoGrandePhD opened 3 years ago

BrunoGrandePhD commented 3 years ago

@smrgit commented on Wed Apr 29 2020

for some of the imaging metadata attributes, although they need to be required, in many instances the appropriate value might be easily guessed at (eg FALSE, or 1), so it might be useful to allow the metadata spec (and the ingress code) to be able to handle a default setting


@BrunoGrandePhD commented on Tue Nov 10 2020

@smrgit: Just to be sure, you're suggesting that some columns in the template manifest be pre-populated with a default value? I just want to be sure this is distinct from what is described in Sage-Bionetworks/schematic#311. Here, the data model developer (is there a better term?) would identify attributes with a value that will be most common (e.g. in over 70-80% of entries). Did I get that right?


@smrgit commented on Tue Nov 10 2020

what I was initially thinking (but a lot of time has passed, so perhaps things have changed) was that if a field was required, but had been specified to have a default value, that if that field was left empty/null it would default to the previously specified value -- but it could also be pre-populated in the template (and maybe also added to the tool-tip?)


@milen-sage commented on Tue Nov 10 2020

Yes, agreed with Sheila that this is a bit easier feature to implement than https://github.com/Sage-Bionetworks/schematic/issues/311 or https://github.com/sage-bionetworks/schematic/issues/179). We could handle it by introducing new column - 'Default Value' - in the schema sheet definition (and a corresponding new schema object property something along the lines of 'sms:DefaultValue')?


@BrunoGrandePhD commented on Tue Nov 10 2020

I would rather pre-populate the manifest with default values rather than defaulting to them during validation because the former is explicit and more transparent. A warning could be displayed during validation to indicate that default values were used for certain attributes, but I still prefer the more upfront approach.

Regarding the implementation, for any value that has a list of Valid Values, we could pre-populate the field with the first option. I don't know if we should be worried with users being lazy and not updating default values if they're incorrect. If this is a real concern, then a separate column would be better because it would flag which attribute should have a pre-populated default value and it would also allow for default values to be used for attributes that don't use Valid Values.


@milen-sage commented on Tue Nov 10 2020

I would rather pre-populate the manifest with default values rather than defaulting to them during validation because the former is explicit and more transparent. A warning could be displayed during validation to indicate that default values were used for certain attributes, but I still prefer the more upfront approach.

I agree with this^ approach.

Regarding the implementation, for any value that has a list of Valid Values, we could pre-populate the field with the first option. I don't know if we should be worried with users being lazy and not updating default values if they're incorrect. If this is a real concern, then a separate column would be better because it would flag which attribute should have a pre-populated default value and it would also allow for default values to be used for attributes that don't use Valid Values.

In this implementation^, how do we account for default values for attributes that don't have specified list of valid values? E.g. an attribute that can take floating point numbers, but the default value is say "0.0"? Otherwise I like this approach since it is pretty lightweight.


@BrunoGrandePhD commented on Wed Nov 11 2020

How often would a default value be used for non-categorical attributes? I suspect it's a small enough use case (if at all existing) that we can probably go ahead with the lightweight solution (no new column), and if the need arises later, we can discuss adding a new Default column.

@milen-sage: Are you okay with moving this issue to the schematic repo?


@milen-sage commented on Wed Nov 11 2020

Yes - it definitely belongs in the schematic repo. (Just beware that the issue labels and estimates won't be transferred, so you might need to re-add them).

BrunoGrandePhD commented 3 years ago

The more I think about it, the less I like my lightweight solution. I'm just worried that those pre-populated values will be missed if we automatically do it for all attributes that have valid values. Maybe an optional Default column isn't so bad after all. The column can be omitted from the templates people start off with to avoid overwhelming new users.

BrunoGrandePhD commented 3 years ago

We have identified a use case for this in the imCORE project. Because most studies generate certain data types through core labs, it might make sense to expose the values that are consistent across datasets as defaults.