dcmi / dctap

DC Tabular Application Profile
https://dcmi.github.io/dctap/

Picklists and other multiple entries #63

Closed kcoyle closed 2 years ago

kcoyle commented 2 years ago

At this time we have a valueConstraintType of picklist. This has some redundancy with the idea that a manifest or configuration file would indicate what character is being used to separate multiple values in a cell. We toyed with the idea of setting a default value for the separator (either space or comma) but did not include that in the primer. The section from a primer draft on multiple values has been copied to the cookbook.

At the January 6, 2022 meeting it was suggested that picklist is not needed because it will be defined as a list based on its separator character. It was also said that the semantics of picklist are the same as the semantics of any other multiple value cell: the meaning of the separator is "OR" and the general meaning of a list is "one of the following."

Please add comments to argue the pros and cons of using picklist. I expect this discussion to be complex because it involves not only the valueConstraint of picklist but also the question of multiple values in other columns.

kcoyle commented 2 years ago

Summary of discussion here:

PROS

(get rid of picklist)

CONS

(keep picklist)

Other Considerations and Complications

kcoyle commented 2 years ago

As I recall, one of the reasons for using "picklist" came out of the complexity of having comma-separated choices in a CSV file. We have avoided designating a more easily distinguishable separator, leaving it to the TAP developers and their applications, but the comma really does create problems. We need a way to tell the difference between:

We didn't want to tell people to put single strings with commas in quotes. (Note that some of the spreadsheet programs do that for you.) How is an application to know whether to treat the first one as two items?

This doesn't of course solve this situation:

We cannot guarantee that every application reading in a TAP will have a config file that indicates the separator. Perhaps we need to give actual guidance on this even though it appears to go beyond our simple set of rules for TAP.
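The comma ambiguity described above can be sketched in a few lines of Python. This is only an illustration: the property names (`ex:color`, `ex:place`), the cell values, and the helper `split_cell` are all hypothetical, not taken from any actual TAP. The cells are quoted the way some spreadsheet programs quote them on export, so the CSV parser returns each as a single cell. The ambiguity appears only when the application then splits on the comma.

```python
import csv
import io

# Hypothetical TAP fragment. Both valueConstraint cells contain a comma
# and are quoted, as a spreadsheet export would typically write them.
raw = (
    "propertyID,valueConstraint\n"
    'ex:color,"red, blue"\n'
    'ex:place,"Paris, France"\n'
)

rows = list(csv.DictReader(io.StringIO(raw)))


def split_cell(cell, separator=","):
    """Naively split a cell on the separator and trim whitespace."""
    return [part.strip() for part in cell.split(separator)]


for row in rows:
    # The parser cannot know that the first cell is meant as two choices
    # while the second is meant as one string; both are split alike.
    print(row["propertyID"], split_cell(row["valueConstraint"]))
```

Without a config file or a declared valueConstraintType, nothing in the cell itself tells the application which reading was intended.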

kcoyle commented 2 years ago

The discussion at the January 6, 2022 meeting pointed out that there can be multiple values in any column. Also, the valueConstraintTypes of languageTag and IRIstem are often themselves "picklists" of values.

kcoyle commented 2 years ago

Also in that discussion, Nishad argued for the use of the pipe as the default separator, in part because it is compatible with regex. I note that it also solves the "comma problem": it allows the use of normal punctuation within the items in the list (since normal language doesn't use the pipe as punctuation).
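A minimal sketch of both points about the pipe, using made-up cell values (the names and the sample cells are assumptions for illustration only). Splitting on the pipe leaves commas inside items untouched, and a pipe-separated cell of simple keywords can be used directly as a regex alternation.

```python
import re

# Hypothetical cell whose items contain commas; the pipe keeps them intact.
cell = "Smith, John|Doe, Jane|Lee, Ann"
choices = cell.split("|")
print(choices)

# A pipe-separated cell of plain keywords doubles as a regex alternation.
keywords = "IRI|bnode|literal"
pattern = re.compile(keywords)
print(pattern.fullmatch("bnode") is not None)  # one of the listed values
print(pattern.fullmatch("date") is not None)   # not in the list
```

The regex reading only works as-is when the items contain no regex metacharacters; items such as IRIs with dots would need `re.escape` applied to each item before joining.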

Phil used carriage returns as a separator in a TAP. This also solves the "comma problem". Phil stated that different ways to separate multiple values seemed natural in different columns, so it may not be reasonable to have a single separator for the entire table.
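One way to picture per-column separators is a small mapping from column name to separator. This is a sketch under assumptions: the mapping `COLUMN_SEPARATORS`, the helper `split_row`, and the sample row are hypothetical, and the line break inside the cell is assumed to arrive as `\n`.

```python
import csv
import io

# Hypothetical per-column separators: space for node types,
# line break for natural-language constraint values.
COLUMN_SEPARATORS = {"valueNodeType": " ", "valueConstraint": "\n"}

# The valueConstraint cell holds two lines; CSV requires it to be quoted.
raw = (
    "propertyID,valueNodeType,valueConstraint\n"
    'dct:subject,IRI bnode,"History, Modern\nArt, Medieval"\n'
)


def split_row(row, separators=COLUMN_SEPARATORS):
    """Split each cell using its column's separator, if one is defined."""
    result = {}
    for column, cell in row.items():
        sep = separators.get(column)
        result[column] = cell.split(sep) if sep else [cell]
    return result


row = next(csv.DictReader(io.StringIO(raw)))
parsed = split_row(row)
print(parsed)
```

Columns without an entry in the mapping are treated as single-valued, so commas inside the line-separated items survive.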

We also talked in an early meeting about the difference between keywords or identifiers and natural language strings. Keywords are self-contained, like IRI or bnode. They have neither spaces nor punctuation. Identifiers could be properties with prefixes ("dct:title") or IRI stems ("http://id.loc.gov"). They have punctuation, but rarely commas, and they do not have spaces. These could be written with various separators without ambiguity: (I'm using square brackets to mean the cell in the table.)

kcoyle commented 2 years ago

keeping "picklist" as a good human interface, while acknowledging that lists of various types can be created as patterns if that is what modelers prefer.