geonetwork / core-geonetwork

GeoNetwork is a catalog application to manage spatially referenced resources. It provides powerful metadata editing and search functions as well as an interactive web map viewer. It is currently used in numerous Spatial Data Infrastructure initiatives across the world.
http://geonetwork-opensource.org/
GNU General Public License v2.0

URLs ending in 1,2,3 or 5 within text blocks are truncated, making the link invalid #8226

Open duncanw opened 2 months ago

duncanw commented 2 months ago

Describe the bug If a multi-line text field (e.g. the abstract) in a metadata record contains a URL that ends in 1, 2, 3 or 5 (e.g. https://doi.org/10.21420/TTQ0-SR11), the trailing digit(s) of the URL are omitted from the link when the record is viewed.

To Reproduce Steps to reproduce the behavior:

  1. Go to Contribute > Add new record
  2. Select the template and group then click Create
  3. In a multi-line text field, add some text containing such a URL, e.g. "This is a link: https://doi.org/10.21420/TTQ0-SR11 [new line]...and it is borked"
  4. Fill in all the mandatory fields
  5. Save and view the new record
  6. The link in the multi-line text field is broken; in this example the href value will be https://doi.org/10.21420/TTQ0-SR

Expected behavior The full, correct link should be rendered and clickable in the record view.

Screenshots In the edit page: image

In the view page: image

Inspecting the view page link: image

Log file N/A

Additional context N/A

duncanw commented 2 months ago

My guess is that this is caused by incorrect XML character encoding in the following regex in core-geonetwork > web/src/main/webapp/xslt/common/utility-tpl.xsl:

<xsl:analyze-string select="$string" regex="(http|https|ftp)://[^\s()&gt;&lt;]+[^\s``!()\[\]&amp;#123;&amp;#125;;:'&apos;&quot;.,&gt;&lt;?«»“”‘’]">
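
The guess can be checked outside GeoNetwork. An XML parser turns &amp;#123;&amp;#125; into the literal text &#123;&#125; rather than into curly braces, so the trailing character class ends up excluding &, #, 1, 2, 3, 5 and ; as individual characters. The following is a minimal Python sketch of that effect, not the project's code: it uses Python's re module in place of the XSLT regex engine, the wrapper element name r is made up, and the "assumed fix" (excluding literal curly braces) is only a guess at the original intent, not a tested patch for utility-tpl.xsl.

```python
import re
import xml.etree.ElementTree as ET

# The regex attribute exactly as quoted in the issue, wrapped in a throwaway
# element (the element name "r" is hypothetical, used only for this sketch).
XML_SNIPPET = r"""<r regex="(http|https|ftp)://[^\s()&gt;&lt;]+[^\s``!()\[\]&amp;#123;&amp;#125;;:'&apos;&quot;.,&gt;&lt;?«»“”‘’]"/>"""

# What a regex engine receives once the XML parser has resolved the entities.
decoded = ET.fromstring(XML_SNIPPET).get("regex")

# Assumption: the author meant to exclude literal curly braces, not the
# six characters of each "&#123;" / "&#125;" sequence.
assumed_fix = decoded.replace("&#123;&#125;", "{}")

# Sample text from the steps to reproduce.
text = "This is a link: https://doi.org/10.21420/TTQ0-SR11\n...and it is borked"

broken_match = re.search(decoded, text).group(0)
fixed_match = re.search(assumed_fix, text).group(0)

print(decoded)       # the decoded pattern still contains "&#123;&#125;"
print(broken_match)  # URL as matched by the pattern from the issue
print(fixed_match)   # URL as matched with the assumed intent
```

With the pattern from the issue, the match stops at https://doi.org/10.21420/TTQ0-SR, which is the href reported above; with the assumed intent the full URL is matched. The pattern's last character class must match a character that is not in its exclusion list, so the engine backtracks past every trailing 1, 2, 3 or 5.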