Insprill / intellectual

Alternate frontend for Genius focused on privacy and simplicity
https://intellectual.insprill.net/
GNU Affero General Public License v3.0
35 stars 4 forks source link

Images on artist's page aren't proxied through server #21

Closed ButteredCats closed 10 months ago

ButteredCats commented 10 months ago

Is there an existing issue for this?

Are you using the latest version of Intellectual?

Describe the bug

Images on an artist's page aren't proxied through the server, causing the client to directly reach out to Genius.

This also has the side effect of making these images violate the default content security policy, making them appear as a broken link unless the server host overrides the security policy.

Anything else?

Example with default content security policy: https://intellectual.insprill.net/artist?path=artists/Joji Example with edited content security policy: https://intellectual.catsarch.com/artist?path=artists/Joji

Insprill commented 10 months ago

We should switch to using the Markdown format instead of the HTML one. It unfortunately doesn't contain images, so it will fix this issue in that right, but it will be more reliable and secure than trusting whatever arbitrary HTML they send.

ButteredCats commented 10 months ago

I just gave a quick test on my staging server with replacing the URL through NGINX sub filters and that seems to work perfectly. Not sure if you have an NGINX setup to test it with yourself but this is all it took:

sub_filter_once off;
sub_filter "https://images.genius.com/" "https://$host/api/image?url=https://images.genius.com/";

and also setting proxy_set_header Accept-Encoding ""; so Intellectual doesn't gzip it's response.

The image links directly to genius are https://images.genius.com whereas all the already proxied ones are https%3A//images.genius.com so they're easy to differentiate with a simple rule.

Would it not be possible to do something similar to this before serving the page on Intellectual's side?

Insprill commented 10 months ago

Yes, it would be possible to do that, and I had a working version of that, but there are also many styling issues (#22) that'll be harder to fix. I'm not sure how many components can be the responses; if it's just a few styles we have to recreate, that's not bad, but I don't want it to turn into an endless chase of copying styles to fix broken elements.

ButteredCats commented 10 months ago

That's a good point. So far the only issues I've managed to come across is this one, #22, and I just noticed that the text below the picture in https://intellectual.catsarch.com/artist?path=/artists/Tom-cardy is also out of place to a degree. I took a look at some other artists I listen to as well but couldn't find any ones with problems. Given that, I hope it won't be an endless goose chase and it just revolves around the images/embeds Genius has.

I did a little testing with CSS and adding this to artist.css makes it so the image doesn't go beyond the container and also applies a bit of styling to the text that would normally be beside the image to make it look a little more cohesive and less forced in:

.embedly_preview-thumb img {
    width: 100%;
    height: auto;
}

.embedly_preview-description {
    margin-top: 10px;
}

.embedly_preview-dash, .embedly_preview-provider {
    display: none;
}

.gray_container {
    background: gray;
}

The gray_container class could definitely use a better background color, but none of the predefined ones from the github dark theme were a good fit.

Also the parentheses around the image and it's text in that link are also out of place on Genius itself so that's not Intellectual's fault.

Of course this approach also hinges on Genius not changing anything about how their classes are named or how their site is laid out.

This looks to be a decision between being pretty/fully featured vs being more resistant to breakage and keeping complexity down by using markdown like you said. Given that Genius is primarily a site for lyrics and not artist info, my personal belief is dropping the images and embeds inside of the artist's description for the reasons above isn't too big of a deal.

Insprill commented 10 months ago

I suppose we can try it out, and if it ends up being too much work, we can scrap it and switch to markdown. It'd be nice if we could recreate the card look that's on Tom Cardy's Genius page. Feel free to make a PR for this; otherwise, I'll get to it in a couple of days.

ButteredCats commented 10 months ago

Just opened #23 that recreates that card look for the embed. However, in doing that I noticed that it's all Genius links in the artist's description that don't get rewritten.

For example the embed on https://intellectual.catsarch.com/artist?path=/artists/Tom-cardy where the giant picture is still is a direct link to Genius. Not as big a deal as the images since it doesn't automatically cause the client to reach out to Genius but still worth noting.

I'm going to take a look and see if I can figure out a solution for that but given that I'm unfamiliar with rust and a relatively novice coder I'm unsure if I'll be able to do anything about it on my own.

Insprill commented 10 months ago

I had time tonight, so I merged #23 and added some dirty logic to rewrite the Genius links in artist descriptions to local Intellectual ones (fc27976f81d829ad9db28726e8b4057047e91a1b). I tested it with Joji's artist page since there are a lot of links there, and it's working well in my testing. If you know of other artists with links in their descriptions, can you test it and let me know how it works?

ButteredCats commented 10 months ago

Just tested Papa Roach, Panic at the Disco, Mother Mother, and The Offspring. First three work great! Every Genius link was converted successfully, even with the massive amount of links in Panic at the Disco's description.

However, for whatever reason The Offspring's links are formatted as https://https.genius.com. This is the case on Genius too, presumably because someone screwed up. Genius does have a wildcard for their domain though, so it's a valid and working link nonetheless. Seems like an edge case but I felt it was worth mentioning that weird stuff like that won't be converted. For the replace you could use this regex: https:\/\/(.*)genius\.com\/ to match all subdomains aswell as just https://genius.com/ itself which should solve weird cases like that.

Insprill commented 10 months ago

I had to tweak that regex a little to make it work, but I pushed 90f077d9a4f083a21458707fe60ee9619c9b9a9a which is now working for me with all the artists you listed.

ButteredCats commented 10 months ago

Tested some more artists and couldn't find anything out of place. Looks like we're all good!

I'll go ahead and close this.