gohugoio / hugo

The world’s fastest framework for building websites.
https://gohugo.io
Apache License 2.0

Create template function to remove non-printable characters #11255

Open Winterhuman opened 1 year ago

Winterhuman commented 1 year ago

What version of Hugo are you using (hugo version)?

$ hugo version
hugo v0.112.3+extended linux/amd64 BuildDate=unknown

Does this issue reproduce with the latest release?

Yes.

Issue description

As far as I can tell, no combination of plainify, safeHTML, markdownify, or html{Une,E}scape fully removes soft hyphens; at best they are converted to \uad. The only exception I found is anchorize, which does remove them.

I'm not sure which functions out of the above should be able to remove soft hyphens or not (my guess would be plainify, since it's meant to "strip HTML tags"), but for now, using replace .Var "­" "" is the only way other than anchorize that I've found.

jmooring commented 1 year ago

{{ "_&shy;_" | warnf "%[1]v (%[1]T)" }} --> _&shy;_ (string)
{{ "_&shy;_" | htmlUnescape | warnf "%[1]v (%[1]T)" }} --> _­_ (string)

If you need additional assistance, please use the forum (https://discourse.gohugo.io/) for questions and troubleshooting. We prefer to use GitHub for verified bugs and vetted enhancements. Thanks.

jmooring commented 1 year ago

OK, I see it now. Reopening.

jmooring commented 1 year ago

First, the plainify function removes HTML tags (e.g., <strong>), not HTML entities (e.g., &amp;). It is doing the right thing.

Second, the htmlUnescape function "returns the given string with HTML escape codes un-escaped." This function is also doing the right thing.

So, there's no bug here, but possibly an enhancement: create a template function that removes non-printable characters. This gets a bit tricky because sometimes you may want to remove them, and at other times replace them with a space (e.g., tab should be replaced by space, but zero-width space should not). I think these would have to be special-cased.

For now I think you have three options:

  1. Use the replace function
  2. Use the replaceRE function
  3. Create a partial function that contains a slice of HTML entities to remove or replace.
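
As a rough illustration of the enhancement discussed above, the following Go sketch shows one way such a function could special-case characters. It is not an existing Hugo function: the name stripNonPrintable and the exact rules are assumptions. Here, whitespace that is not a plain space (tab, newline, non-breaking space) is replaced by a space, while other non-printable runes such as the soft hyphen (U+00AD) and zero-width space (U+200B) are dropped.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stripNonPrintable is a hypothetical helper, not part of Hugo.
// Printable runes are kept, other whitespace becomes a single space,
// and everything else (soft hyphen, zero-width space, control
// characters) is removed.
func stripNonPrintable(s string) string {
	return strings.Map(func(r rune) rune {
		switch {
		case unicode.IsPrint(r):
			return r // letters, marks, numbers, punctuation, symbols, ASCII space
		case unicode.IsSpace(r):
			return ' ' // special case: keep word boundaries (tab, newline, NBSP)
		default:
			return -1 // a negative value tells strings.Map to drop the rune
		}
	}, s)
}

func main() {
	in := "soft\u00adhyphen\tzero\u200bwidth"
	fmt.Printf("%q\n", stripNonPrintable(in)) // "softhyphen zerowidth"
}
```

Note that this sketch also turns a non-breaking space into a regular space, which may or may not be desirable; a real implementation would need to decide each special case deliberately.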

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the Hugo team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the master branch, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

Winterhuman commented 4 months ago

Still relevant