AtlasOfLivingAustralia / la-pipelines

Living Atlas Pipelines extensions

`recordedBy` values contain HTML content #522

Open nickdos opened 2 years ago

nickdos commented 2 years ago

Related to #521

Viewing facet results for the recordedBy field for Questagame records shows that we are indexing HTML content in this field, which results in unexpected output when sorting by value (vs count). The UI strips out the HTML, so the user only sees the text portion, but the ordering of results is still based on the raw HTML. As a result, all Questagame collector names appear jumbled together because they all start with <a href.

{
  "i18nCode": "collector.<a href='https://bee.questagame.com/#/profile/12478?questagame_user_id=12478'>frond | questagame.com</a>",
  "count": 41,
  "label": "<a href='https://bee.questagame.com/#/profile/12478?questagame_user_id=12478'>frond | questagame.com</a>",
  "fq": "collector:\"<a href='https://bee.questagame.com/#/profile/12478?questagame_user_id=12478'>frond | questagame.com</a>\""
}
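To illustrate the ordering problem, here is a minimal Python sketch that sorts a handful of facet labels by value. Only the "frond" label is taken from the facet entry above; the other three labels are hypothetical and exist purely for illustration. A plain lexicographic sort stands in for the value sort, which is an assumption about how the facet is ordered.

```python
# Hypothetical facet labels. Only the "frond" entry comes from the issue;
# the remaining labels are invented for illustration.
labels = [
    "Smith, J.",
    "<a href='https://bee.questagame.com/#/profile/12478?questagame_user_id=12478'>frond | questagame.com</a>",
    "Adams, K.",
    "<a href='https://bee.questagame.com/#/profile/1?questagame_user_id=1'>zebra | questagame.com</a>",
]

# Sorting by raw value compares the markup, not the visible name, so every
# label starting with "<a href" is grouped together, ahead of labels that
# start with a letter ("<" sorts before "A" in ASCII).
sorted_labels = sorted(labels)

for label in sorted_labels:
    print(label)
```

Within the HTML group the order is decided by the URL rather than by the collector name, which is why the names look jumbled once the UI hides the markup.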

The fix is to strip out the HTML, either with a SOLR filter or in code, for the recordedBy field only, and to rely on raw_recordedBy to display the HTML version if required. SOLR probably has this functionality built in, so I suggest not reinventing the wheel here.
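For reference, the "via code" option could look roughly like the sketch below. It is written in Python with the standard library only, to show the transformation; it is not the pipeline's code, and the names `strip_html` and `_TextExtractor` are hypothetical. On the SOLR side, `HTMLStripCharFilterFactory` does exist as a built-in char filter, but whether it suits this facet field depends on the schema, which is not shown here.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []

    def handle_data(self, data):
        # Called for text between tags; tags and attributes are ignored.
        self._chunks.append(data)

    def text(self):
        return "".join(self._chunks)


def strip_html(value: str) -> str:
    """Return the visible text of `value` with markup removed and whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(value)
    parser.close()  # flush any buffered trailing text
    return " ".join(parser.text().split())


raw_recorded_by = (
    "<a href='https://bee.questagame.com/#/profile/12478"
    "?questagame_user_id=12478'>frond | questagame.com</a>"
)
recorded_by = strip_html(raw_recorded_by)
print(recorded_by)
```

With this approach the original markup stays available in raw_recordedBy, while recordedBy holds only the text used for faceting and sorting.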