GSA / data.gov

Main repository for the data.gov service
https://data.gov
Other
547 stars, 87 forks

HTML meta tag content should be escaped #4793

Closed btylerburton closed 1 week ago

btylerburton commented 2 weeks ago

User Story

To ensure a dataset page's markup is valid, datagovteam wants to escape the content placed in HTML meta tag attributes.

The HTML validator scan should report no errors: https://validator.w3.org/nu/?doc=https%3A%2F%2Fcatalog.data.gov%2Fdataset%2Fdob-now-build-elevator-permit-applications

Screenshot 2024-06-12 at 11 45 17 AM

Acceptance Criteria

[ACs should be clearly demoable/verifiable whenever possible. Try specifying them using BDD.]

Background

Invalid HTML markup is causing issues in Google Search Console.

Example: https://catalog.data.gov/dataset/dob-now-build-elevator-permit-applications

Screenshot 2024-06-12 at 11 41 06 AM

Security Considerations (required)

[Any security concerns that might be implicated in the change. "None" is OK, just be explicit here!]

Sketch

btylerburton commented 1 week ago

URLs have been sanitized, but string literals still need to be escaped...

<meta property="og:description" content="{{ h.literal.escape(notes) }}">

produces:

<meta property="og:description" content="The "Watershed Boundary Dataset (WBD)" from The National Map (TNM) defines the perimeter of drainage areas formed by the terrain and other landscape characteristics. The...">

CKAN defines a string-escape helper (linked below), but it seems to have no effect here: the double quotes in the description are still emitted unescaped.

https://docs.ckan.org/en/2.10/theming/template-helper-functions.html?highlight=helper#ckan.lib.helpers.literal
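
For reference, here is a minimal sketch of the output we are aiming for. It runs outside CKAN and uses only Python's standard-library `html.escape`, not the CKAN helper. `meta_tag` is a hypothetical function name used for illustration, and `notes` is a shortened copy of the description above.

```python
from html import escape


def meta_tag(prop: str, content: str) -> str:
    """Build a <meta> tag whose attribute values are HTML-escaped.

    Hypothetical helper for illustration; not part of CKAN.
    """
    # quote=True also escapes " and ', which is what keeps the
    # content="..." attribute from being closed early.
    return (
        f'<meta property="{escape(prop, quote=True)}" '
        f'content="{escape(content, quote=True)}">'
    )


# Shortened version of the dataset description from this issue.
notes = 'The "Watershed Boundary Dataset (WBD)" from The National Map (TNM)'

tag = meta_tag("og:description", notes)
print(tag)
# <meta property="og:description" content="The &quot;Watershed Boundary Dataset (WBD)&quot; from The National Map (TNM)">
```

With the inner quotes emitted as `&quot;`, the only literal double quotes left in the tag are the four that delimit the two attribute values, which is what the validator expects.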