IQSS / dataverse

Open source research data repository software
http://dataverse.org

Using displayFormat for displaying, indexing and searching compound field values #7856

Open: poikilotherm opened this issue 3 years ago

poikilotherm commented 3 years ago

Context

Problem

  1. Currently, there is no way to define a composed view of a compound metadata field that controls the order and/or combination of its subfields or adds HTML tags.
  2. There is no way to search, via the name of the compound field, for a composed value built from its subfields (for example, you cannot search for author, only for authorName, etc.).

We want users to enter metadata with as much detail and structure as possible (controlled vocabularies, ...). Yet recipients think in terms of the bigger picture: they expect composed views (either text or images, as in #6289) and search tags that are easier to memorize (author, not authorName).

Examples

Proposal

  1. Reuse the metadata block definition field displayFormat for compound fields, too.
  2. The display format may reference subfields by name, following the existing #subFieldName convention.
  3. The display format may use HTML tags to format the output.
  4. Omitting a display format makes the UI fall back to the current behaviour.
  5. The display format of the compound field is also used to index the value as searchable text within Solr.
  6. Indexing removes any HTML tags.
  7. This does not affect API ingest or metadata exports: to retain backward compatibility, the composed views are neither exported nor usable via the API.
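Points 2 to 6 can be illustrated with a minimal Python sketch. It is not Dataverse code: the function names render_compound and index_text, the dict-of-subfields input, and the "; "-joined fallback are hypothetical stand-ins. It assumes subfield values are plain text, so they are HTML-escaped before substitution and only the template itself can contribute markup; the indexed text is the rendered value with all tags removed.

```python
import html
import re
from html.parser import HTMLParser

# A subfield reference is '#' followed by the subfield name, e.g. #authorName.
SUBFIELD_REF = re.compile(r"#(\w+)")


def render_compound(display_format, subfields):
    """Compose the UI value of a compound field from its displayFormat.

    display_format: template referencing subfields as #subFieldName; may contain
                    HTML tags. None or "" means no display format is defined.
    subfields:      dict mapping subfield names to their plain-text values.
    """
    if not display_format:
        # Hypothetical stand-in for the current behaviour: join non-empty values.
        return "; ".join(html.escape(v) for v in subfields.values() if v)
    # Substitute each #name with its escaped value; unknown names render as "".
    return SUBFIELD_REF.sub(
        lambda m: html.escape(subfields.get(m.group(1), "")), display_format
    )


class _TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; back into &
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def index_text(display_format, subfields):
    """Plain-text version of the composed value, e.g. for a Solr text field."""
    extractor = _TextExtractor()
    extractor.feed(render_compound(display_format, subfields))
    extractor.close()  # flush any buffered trailing text
    # Collapse whitespace left behind where tags were removed.
    return " ".join("".join(extractor.parts).split())
```

For example, with the format '<a href="#authorIdentifier">#authorName</a> (#authorAffiliation)' and the values authorName "Doe, Jane", authorAffiliation "Example University", authorIdentifier "https://example.org/jane", the UI would show a link labelled "Doe, Jane", while the index would receive "Doe, Jane (Example University)".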

Extensions

pdurbin commented 3 years ago

@poikilotherm it sounds like we'd be able to clean up how Topic Classification looks. Right now there are long URLs at the end like this...

[Screenshot, 2021-05-07 11:07:08 AM: Topic Classification values followed by long URLs]

... but with your proposal, we could make the terms into hyperlinks, like this:

[Screenshot, 2021-05-07 11:06:37 AM: Topic Classification terms rendered as hyperlinks]

(This example is from https://doi.org/10.7910/DVN/TQBAEE.)

qqmyers commented 3 years ago

FWIW: As discussed at DCM2021, the new external vocab support as currently defined (#7946) would not use this feature, because it lets the service-specific JavaScript handle the display. That nominally allows the JavaScript to reorganize compound fields into nicer HTML as discussed here (or to do #6289), and to do so when there is only a single term URI field as well. It also allows the JavaScript to fetch i18n versions of the term/vocabulary names involved. The internal CVV field definitions can have translations, and #7923 will get them displayed, but external vocabularies don't store i18n values the same way. With the option to use a single term URI field instead of a compound field, having the JavaScript manage i18n as well helps keep things simple.

poikilotherm commented 2 years ago

@qqmyers now that external vocab support is in place, I have a question for you.

Does the JavaScript also store a searchable composed string in Solr, or is it related only to the HTML display? This would also matter for metadata retrieval via the API, where no JavaScript can be involved.

I'm asking because I know you wanted to cache fetched data, so maybe the composed UI string is stored for a compound field, too.

If not, do you think implementing this would be a feasible extension?

poikilotherm commented 2 years ago

@doigl @pdurbin Rethinking the search part, it might be a good idea to add a new column, exportFormat. Some metadata fields profit from allowing HTML tags for UI candy, but those tags are bad for searches.

A plain-text format would benefit not only compound fields, but also other fields that need both an enhanced UI view and a clean, possibly machine-interoperable export value.

Once we have an exportFormat, it could be used for metadata export formats, fetching via the JSON API, and searches.
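A minimal Python sketch of how the two columns could be split between consumers, assuming a hypothetical in-memory field definition (CompoundFieldDefinition, value_for, and the target names are illustrative, not Dataverse APIs). The UI uses displayFormat; search, export and API use exportFormat, falling back to displayFormat with its tags stripped when no exportFormat is set.

```python
import re
from dataclasses import dataclass
from typing import Optional

SUBFIELD_REF = re.compile(r"#(\w+)")  # e.g. #authorName
TAG = re.compile(r"<[^>]*>")          # crude tag matcher, adequate for templates

TARGETS = {"ui", "search", "export", "api"}


@dataclass
class CompoundFieldDefinition:
    """Hypothetical in-memory form of one metadata block field definition."""
    name: str
    display_format: Optional[str] = None  # may contain HTML; UI only
    export_format: Optional[str] = None   # the proposed plain-text column


def fill(template, subfields):
    """Replace every #subFieldName with its value (unknown names become "")."""
    return SUBFIELD_REF.sub(lambda m: subfields.get(m.group(1), ""), template)


def value_for(field, subfields, target):
    """Pick the template for the given consumer and compose the value."""
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    if target == "ui":
        template = field.display_format
        # (HTML-escaping of subfield values omitted here for brevity.)
    else:
        template = field.export_format
        if template is None and field.display_format is not None:
            # No exportFormat given: reuse displayFormat with its tags removed.
            template = TAG.sub("", field.display_format)
    if template is None:
        # Neither column set: join the non-empty subfield values.
        return "; ".join(v for v in subfields.values() if v)
    return " ".join(fill(template, subfields).split())
```

Stripping tags from the template rather than from the filled-in value means a literal "<" inside a subfield value can never be mistaken for markup.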

(@pdurbin when tackling this, would it be a good time to convert the TSV to JSON/YAML to make it more readable and extensible? I would prefer YAML because it supports comments.)

qqmyers commented 2 years ago

The current implementation indexes the URI and uses the JavaScript to let you type the term name, or use the autocomplete functionality, to select a search term. So the user types in their own language, using the term name, but the search itself is for the URI, which matches the URI in the index.
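The label-to-URI flow described above can be sketched in Python. Everything here is hypothetical: the in-memory vocabulary, the example.org URIs, the dataset IDs, and the functions suggest and search_by_uri stand in for the external vocabulary service, the JavaScript autocomplete, and the Solr index. The user only ever sees labels in their language, while the query compares URIs.

```python
# Hypothetical controlled vocabulary: term URI -> labels per language.
VOCABULARY = {
    "https://example.org/vocab/astronomy": {"en": "Astronomy", "de": "Astronomie"},
    "https://example.org/vocab/biology": {"en": "Biology", "de": "Biologie"},
}

# Hypothetical search index: the field stores term URIs, never labels.
INDEX = {
    "dataset-1": ["https://example.org/vocab/astronomy"],
    "dataset-2": ["https://example.org/vocab/biology",
                  "https://example.org/vocab/astronomy"],
}


def suggest(prefix, lang):
    """Autocomplete step: (label, uri) pairs whose label in lang starts with prefix."""
    p = prefix.casefold()
    return sorted(
        (labels[lang], uri)
        for uri, labels in VOCABULARY.items()
        if lang in labels and labels[lang].casefold().startswith(p)
    )


def search_by_uri(uri):
    """Search step: match the selected URI against the URIs in the index."""
    return sorted(doc for doc, uris in INDEX.items() if uri in uris)
```

A German-speaking user typing "astro" would be offered "Astronomie"; selecting it sends https://example.org/vocab/astronomy to the search, which finds both datasets regardless of the language used to pick the term.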