Support for <pre><code> basic pastebins

jberger commented 3 years ago

A user told me that they have a home-rolled pastebin that they use that doesn't get embedded. While we can't expect to support every little bin, I think the situation as it exists could be somewhat improved.

The fallback in basic for generic pastetbins uses oddly-specific selectors, which I'm assuming conform to some specific paste sites. Notably the fact that it uses direct descendent selectors body > pre and body > div > pre is somewhat limiting. However in the specific case I'm looking at it wouldn't be insurmountable.

The site would work with a selector body > div > pre > code. That selector is again rather specific, it would be odd to require a div tag like that, it could be body pre > code to be a bit more accepting, though it would have to be attempted before body > div > pre or it would never get there. Alternatively it could optionally look for a code tag inside the resulting selector and walk into it if present. Or it could use all_text rather than text to get the tag contents. Are any of these suggestions acceptable?

jberger commented 3 years ago

Without changing the selectors, LE could support <pre><code> by changing

https://github.com/convos-chat/linkembedder/blob/ab51106c4f5d5dd1b54b59851a59cbfc4af4e124/lib/LinkEmbedder/Link/Basic.pm#L38

to

$self->{paste} = $tmp->all_text;

or by adding something like

if (my $code = $tmp->at('code')) { $tmp = $code }

above it. I'd be happy to put together a PR for either solution if you'd accept them @jhthorsen . Just let me know.

(adding a selector for body pre > code would be fine by me too remember).

jhthorsen commented 3 years ago

I've been giving this some more thought, and I think we should "break" LinkEmbedder by adding the fix you suggest. By "breaking" I mean that it will probably find some non-pastebin sites and render them as a pastebin. I'm thinking about blogs with code examples and such. I do think breaking it, and then later on adding better tests would be better than holding this change back.

Feel free to open a PR under these conditions (if possible)

Add a test for a working site
Document in the code what the selector is intended to match, so we don't accidentally break the change you want.

About the "document in the code" point: I should've done that myself, but I haven't so now we're in this situation 😞

jhthorsen commented 3 years ago

Seems like all_text has already been implemented, so I'm going to close this issue.

Let me know if I forgot something @jberger.

jhthorsen commented 3 years ago

...or just open a PR with a sloppy "pre" lookup, and I'll merge it.

jhthorsen commented 3 years ago

I'm reopening this, since https://svelte.dev/docs is a case where it gets expanded to a \

tag when it shouldn't imo. Might also discuss if https://svelte.dev/docs#4_Prefix_stores_with_$_to_access_their_values should get rendered as a code example.

Could this be fixed simply by checking if there's a "bunch" of \

tags? Where bunch is more than 1 or 2..?

And even this issue: Looks like LinkEmbedder finds the code link in https://github.com/convos-chat/linkembedder/issues/24#issuecomment-812595658 - Thought that's probably Github.pm specific.

convos-chat / linkembedder

Support for <pre><code> basic pastebins #24