downforacross / downforacross.com

Web frontend for downforacross.com -- continuation of stevenhao/crosswordsio
https://downforacrosscom.downforacross1.now.sh
MIT License

Markdown/html parsing issue in clues #220

Open frozenpandaman opened 2 years ago

frozenpandaman commented 2 years ago

Not sure if the NYT has changed their format so that the underlying representation of puzzles now uses Markdown styling rather than HTML, but here is a clue on the NYT site:

Screen Shot 2021-12-08 at 22 44 58

And how it shows up on downforacross:

Screen Shot 2021-12-08 at 22 45 29

bdenney commented 2 years ago

@frozenpandaman which date was this clue from?

frozenpandaman commented 2 years ago

@bdenney The date I posted it, December 9, 2021 (I just googled it to double-check 😉)

bdenney commented 2 years ago

This is odd, but it appears to be fixed on latest main?

Screen Shot 2022-02-01 at 2 02 02 PM

I do see a few HTML encoding issues on other clues though, so perhaps I'll try and address those 😄

stevenhao commented 2 years ago

thanks @bdenney !

sometimes there are different versions of the same puzzle uploaded to the repo -- usually because the puz files were prepared by different software. i think it's pretty safe to detect html escape sequences like &amp; and &quot; and such.
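A minimal sketch of what detecting and decoding HTML escape sequences in a clue could look like, written in Python for brevity. It is not the project's actual code: the function name `decode_clue_entities` and the sample clues are hypothetical. It decodes only well-formed character references (named, decimal, or hex, ending in a semicolon) and leaves everything else alone.

```python
import html
import re

# Matches named (&amp;), decimal (&#8220;) and hex (&#x201C;) character
# references. The trailing semicolon is required, so a bare "&" is ignored.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def decode_clue_entities(clue: str) -> str:
    """Decode HTML character references in a clue (hypothetical helper).

    Text that does not look like a character reference is returned unchanged.
    """
    # Hand each matched reference to the standard library for decoding.
    return ENTITY_RE.sub(lambda m: html.unescape(m.group(0)), clue)


if __name__ == "__main__":
    # Hypothetical clue text, for illustration only.
    print(decode_clue_entities("Salt &amp; pepper"))
    print(decode_clue_entities("AT&T rival"))
```

Requiring the semicolon is a deliberately conservative choice: a clue like "AT&T rival" contains an ampersand but no escape sequence, so it passes through untouched.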

autodetecting and parsing markdown might be a bit more tricky; i could see it backfiring in other cases (i.e. where markdown syntax was not intended but a markdown parser picks it up inadvertently). worth a shot though
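To illustrate the backfiring risk, here is a rough sketch (again in Python, and again not the project's code) of a deliberately conservative emphasis detector. The name `render_emphasis`, the choice to emit `<i>` tags, and the sample clues are all assumptions. It handles only single `*...*` or `_..._` pairs that hug their content, so fill-in-the-blank underscores and arithmetic asterisks are left alone.

```python
import re

# A single * or _ opens emphasis only if it is not preceded by a word
# character or another delimiter and is followed by real content. The same
# delimiter closes it only if preceded by real content and not followed by
# a word character or another delimiter.
EMPHASIS_RE = re.compile(
    r"(?<![\w*])([*_])(?=[^\s*_])(.+?)(?<=[^\s*_])\1(?![\w*])"
)


def render_emphasis(clue: str) -> str:
    """Wrap conservatively detected Markdown emphasis in <i> tags.

    Hypothetical helper; bold, links and other Markdown are not handled.
    """
    return EMPHASIS_RE.sub(r"<i>\2</i>", clue)


if __name__ == "__main__":
    # Hypothetical clue text, for illustration only.
    print(render_emphasis("*Hamlet* setting"))
    print(render_emphasis("___ Lanka"))
```

A general-purpose Markdown parser may be less forgiving here, which is why a narrow pattern like this might be a safer first step than full autodetection.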

frozenpandaman commented 2 years ago

Cool, thanks for the info! And glad it's fixed (huh! sorry, don't know the exact puzzle/game URL)

I'll close this for now then but @bdenney, of course do feel free to open a new issue for other parsing stuff. :D

stevenhao commented 2 years ago

i think bdenney's proposal to fix the html parsing is good; i updated this issue to include html in scope too

frozenpandaman commented 1 year ago

also probably related: #243

cc @bdenney