JupiterBroadcasting / jupiterbroadcasting.com

JupiterBroadcasting.com, hugo-based and community-driven!
https://jupiterbroadcasting.com

episode link markdown not rendered correctly #221

Open gerbrent opened 2 years ago

gerbrent commented 2 years ago

See https://jupiterbroadcasting.net/show/linux-action-news/250/

Behaviour

One of the links on this particular episode renders as: [Bug #1974196 “Installing libudev1 on a new Jammy installation"](https://bugs.launchpad.net/ubuntu/+source/apt/+bug/1974196 “Bug #1974196 “Installing libudev1 on a new Jammy installation”")

Expected

The link should be formatted like the others, namely: Bug #1974196 "Installing libudev1 on a new Jammy installation"

I blame all the quotes.

gerbrent commented 2 years ago

image

elreydetoda commented 2 years ago

That's an interesting one...🤔

xPMo commented 2 years ago

Dug into it: it's an html2text bug triggered by a " (double quote) in the link title. Minimal reproduction:

$ printf '<a title="&quot;" href="/">foo</a>' | html2text
[foo](/ """)
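
For comparison, here is a minimal sketch of what a valid rendering of that title could look like. It assumes CommonMark-style titles, where a double quote inside a double-quoted title must be backslash-escaped. The helper name markdown_link is hypothetical and is not part of html2text; it only handles the title, not special characters in the link text or URL.

```python
def markdown_link(text: str, href: str, title: str = "") -> str:
    """Build an inline Markdown link (hypothetical helper, not html2text's API)."""
    if not title:
        return f"[{text}]({href})"
    # Escape backslashes first, then double quotes, so the title
    # remains a single double-quoted string.
    escaped = title.replace("\\", "\\\\").replace('"', '\\"')
    return f'[{text}]({href} "{escaped}")'


# Same input as the reproduction: a title consisting of one double quote.
print(markdown_link("foo", "/", '"'))  # prints [foo](/ "\"")
```

The unescaped output above, [foo](/ """), closes the title at the second quote, which is why the renderer gives up on the link.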

elreydetoda commented 2 years ago

So @xPMo, can you think of any way to fix this besides just stripping out the HTML-encoded version of the quote?

I guess we could convert it to the HTML-encoded version of the single quote, &#39;, instead?
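
A rough sketch of that idea, assuming the swap happens on the raw HTML before it reaches html2text. The function name soften_title_quotes is hypothetical, and the regex only covers double-quoted title attributes; it is not a full HTML parser.

```python
import re

# Match title="..." but not attributes such as data-title="...".
# Inside a double-quoted attribute a literal " cannot appear,
# so [^"]* captures the whole value.
TITLE_ATTR = re.compile(r'(?<![\w-])title="([^"]*)"')


def soften_title_quotes(html: str) -> str:
    """Replace encoded double quotes in title attributes with &#39;."""
    def _swap(match: re.Match) -> str:
        value = match.group(1)
        value = value.replace("&quot;", "&#39;").replace("&#34;", "&#39;")
        return f'title="{value}"'

    return TITLE_ATTR.sub(_swap, html)


print(soften_title_quotes('<a title="&quot;" href="/">foo</a>'))
# prints <a title="&#39;" href="/">foo</a>
```

Quotes in the link text and elsewhere in the page are left alone; only the title attribute, which is where the bug shows up, is touched.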

gerbrent commented 2 years ago

We could also simply.... not use quotes if that's easier.

Could scrape the scrapped data for quoted links and scrape them out of there. Unless that idea is scrap.

xPMo commented 2 years ago

I made an issue upstream, and might make a PR too.

> Could scrape the scrapped data for quoted links and scrape them out of there. Unless that idea is scrap.

I'd be reluctant to hack a post-conversion fix into the scraper. I have an idea that might work, but it might cause breakage elsewhere. I don't know how the scraper interacts with git; if we just fix the file manually, will the scraper overwrite the fix?

If not, an exceptions list for the scraper might make it easier to address future one-offs.
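
One possible shape for such an exceptions list, as a sketch only. How the scraper is actually structured is not known here, so the names EXCEPTIONS and apply_exceptions, the episode key format, and the replacement pairs are all hypothetical. The broken string is the one from the minimal reproduction above.

```python
# Hypothetical exceptions list: episode path -> list of (broken, fixed)
# text pairs to apply after the HTML-to-Markdown conversion.
EXCEPTIONS = {
    "show/linux-action-news/250": [
        ('[foo](/ """)', "[foo](/)"),
    ],
}


def apply_exceptions(episode: str, markdown: str, exceptions=EXCEPTIONS) -> str:
    """Apply the per-episode literal replacements, if any are listed."""
    for broken, fixed in exceptions.get(episode, []):
        markdown = markdown.replace(broken, fixed)
    return markdown


print(apply_exceptions("show/linux-action-news/250", 'See [foo](/ """) here'))
# prints See [foo](/) here
```

Keeping the fixes as data next to the scraper would let a re-scrape reapply them instead of overwriting a manual edit, at the cost of maintaining the list by hand.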