While investigating CERES issue #233, @patrickmj noticed that the example in the issue had a name containing an a with an acute accent (á). I attempted to rule out the á as the cause, but it looks like the character might actually be preventing records that contain it from being returned in search and facet results in the DRS and in CERES.
Steps to replicate:
1. Search ETDs for "complex systems" (this will reduce the results so that the name with an á floats into the top 10 of the creator facet list).
2. Click the "Limit your search" option, then select "more Creators" from the facets list.
3. Click "Barabási, Albert-László" in the modal.
According to the facet list, 9 records should be returned when selecting "Barabási, Albert-László", but none are.
á is probably not the only culprit, as í seems to cause the same issue for the name "Barreto, Amílcar Antonio" (Search for "Barreto" and try to limit to their name variation with accents. 11 results should be returned, but none are).
Replacing the literal á in the XML with its encoded form (a character reference for á) does nothing: the character does not display in the preview and is completely removed from the record when saving. I remember a conversation ages ago about avoiding certain markup to avoid security issues, but to my knowledge character encoding is still allowed.
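In case it helps with triage, here is a minimal sketch of how a standard XML parser treats the different ways of writing these characters. It uses Python's built-in `xml.etree.ElementTree`, which is not necessarily the parser the DRS uses, and the `namePart` element is only a hypothetical stand-in for the real record markup. The point is that a numeric character reference and the literal character should parse to the same text, while a named reference such as `&aacute;` is not one of the five entities XML predefines and is rejected unless a DTD declares it.

```python
import xml.etree.ElementTree as ET

# Hypothetical element name; the real record structure may differ.
literal_xml = "<namePart>Barabási, Albert-László</namePart>"
numeric_xml = "<namePart>Barab&#225;si, Albert-L&#225;szl&#243;</namePart>"
named_xml = "<namePart>Barab&aacute;si</namePart>"

# The literal character and the numeric reference parse to identical text.
literal = ET.fromstring(literal_xml).text
numeric = ET.fromstring(numeric_xml).text
same_text = literal == numeric

# HTML-style named entities are undefined in plain XML (no DTD here),
# so the parser raises ParseError instead of producing a character.
try:
    ET.fromstring(named_xml)
    named_ok = True
except ET.ParseError:
    named_ok = False

print("literal == numeric:", same_text)
print("named entity accepted:", named_ok)
```

If the editor strips the character even when a numeric reference is used, that would point to the save step rather than to the XML itself.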
So, there might be two issues here:
1. Facet results can't be retrieved when the facet value contains an encoded character (or a character that should be encoded).
2. Valid character encodings are not being saved when entered in the XML record.
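One thing that might be worth ruling out for the first issue is a mismatch between the facet value sent in the request and the value stored in the index. The cause has not been confirmed, so this is only a diagnostic sketch: it shows how the same visible name can exist in two Unicode normalization forms (precomposed NFC and decomposed NFD) that compare as unequal and percent-encode differently in a URL. The helper name `describe` is made up for illustration.

```python
import unicodedata
from urllib.parse import quote

# Escapes are used so the starting string is unambiguous (precomposed letters).
name = "Barab\u00e1si, Albert-L\u00e1szl\u00f3"

nfc = unicodedata.normalize("NFC", name)  # á stored as one code point
nfd = unicodedata.normalize("NFD", name)  # á stored as a + combining acute


def describe(value):
    """Summarize a string so two look-alike values can be compared."""
    return {
        "code_points": len(value),
        "percent_encoded": quote(value, safe=""),
    }


print(describe(nfc))
print(describe(nfd))
print("equal as strings:", nfc == nfd)
print("equal after normalizing:", unicodedata.normalize("NFC", nfd) == nfc)
```

Running the same comparison on the facet value from the URL and on the value stored in the record would show whether the two actually match byte for byte.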
Here are the core file records for each of the examples:
- Barreto, Amílcar Antonio: http://hdl.handle.net/2047/d20004884
- Barabási, Albert-László: http://hdl.handle.net/2047/d20002667
For both of these records, the accented letters are not encoded: they are entered in the XML as the literal display characters, accent included.