FreshRSS / Extensions

A repository containing all the official FreshRSS extensions
GNU Affero General Public License v3.0

[xExtension-ImageProxy] Duplicate images in articles #206

Open denis-dysen opened 4 months ago

denis-dysen commented 4 months ago

With the proxy enabled, every article that contains an image has its first (or only) image duplicated, and the duplicate appears at the end of the article. Checking the image URLs shows that the duplicate at the end uses the original, non-proxied link, while all other image links are proxied.

Frenzie commented 4 months ago

I suspect you're talking about the same thing as https://github.com/FreshRSS/FreshRSS/issues/4999

math-GH commented 3 months ago

I investigated it a bit.

Here are some insights; the root cause is in the ImageProxy extension. It wraps the images in a new HTML document "envelope" that completely destroys the HTML:

<div class="text"> 
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><img>........</body></html>
<figure class="enclosure">

(check it with the "show source" in your browser, not with the inspect tool)

Recent versions of FreshRSS added a feature that prevents images already present in an article from being shown again as attachments, based on the image source. The ImageProxy extension replaces the src of the image with the proxy URL, so this mechanism cannot detect that the image is already shown, and the image is displayed again as an attachment with its original source.

If you change https://github.com/FreshRSS/Extensions/blob/189406e1603b63d60423c0f68798122215ce17cf/xExtension-ImageProxy/extension.php#L129 to $img->setAttribute('data-proxy-src', $newSrc); then there are no duplicated images anymore and data-proxy-src is set correctly.
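
The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not FreshRSS's or the extension's actual code: `imgSources()` and `isDuplicate()` are hypothetical helpers, and the proxy URL format is an assumption. It only illustrates why a comparison by `src` stops matching once `src` is rewritten, and why writing the proxy URL to `data-proxy-src` leaves that comparison intact.

```php
<?php
// Hypothetical helper: collect the src attribute of every <img> in an HTML fragment.
function imgSources(string $html): array {
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect feed HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }
    return $sources;
}

// Hypothetical stand-in for the duplicate check: an enclosure counts as
// "already shown" only if its URL equals the src of an image in the content.
function isDuplicate(string $contentHtml, string $enclosureUrl): bool {
    return in_array($enclosureUrl, imgSources($contentHtml), true);
}

$original = 'https://example.org/avatar.png';
// Assumed proxy URL shape; a real proxy may use a different format.
$proxied = 'https://proxy.example/?url=' . rawurlencode($original);

$untouched    = '<p><img src="' . $original . '"></p>';
$rewrittenSrc = '<p><img src="' . $proxied . '"></p>';
$dataAttr     = '<p><img src="' . $original . '" data-proxy-src="' . $proxied . '"></p>';

var_dump(isDuplicate($untouched, $original));    // bool(true)
var_dump(isDuplicate($rewrittenSrc, $original)); // bool(false): enclosure is shown again
var_dump(isDuplicate($dataAttr, $original));     // bool(true)
```

In the third case the check succeeds only because `src` still points at the original URL, so this illustrates the diagnosis rather than a fix.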

(I do not have any solution for it right now)

(P.S.: To have a better testing: enable the Proxy HTTPS checkbox in the extension settings)

Frenzie commented 3 months ago

> It wraps the images in a new HTML document "envelope" that completely destroys the HTML:

That's unavoidable. But it's supposed to do something like $dom->getElementsByTagName('body')->item(0).
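
The approach hinted at here can be sketched as follows. This is an illustration only, assuming PHP's bundled DOM extension; `proxify()` and the proxy prefix are hypothetical and do not reproduce the extension's code. `DOMDocument::loadHTML()` adds the doctype and the `<html><body>` wrapper on its own, so instead of serializing the whole document the sketch serializes only the children of `<body>`.

```php
<?php
// Hypothetical rewrite function: points every <img src> at an assumed proxy
// and returns the fragment without the document "envelope".
function proxify(string $fragment, string $proxyPrefix): string {
    if (trim($fragment) === '') {
        return $fragment; // loadHTML() rejects empty input
    }

    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect feed HTML
    $dom->loadHTML($fragment);
    libxml_clear_errors();

    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = $img->getAttribute('src');
        if ($src !== '') {
            $img->setAttribute('src', $proxyPrefix . rawurlencode($src));
        }
    }

    // Take the <body> element and serialize its children one by one,
    // which leaves out the doctype, <html> and <body> tags.
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return $fragment;
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

$result = proxify(
    '<p>Hello <img src="https://example.org/a.png"></p>',
    'https://proxy.example/?url='
);
echo $result, "\n";
```

Calling `$dom->saveHTML()` with no argument would instead return the full document, including the doctype and wrapper tags seen in the page source quoted earlier in this thread.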

jonsmy commented 2 months ago

Have this problem as well. In the default FreshRSS release feed I get two avatar images: one from my proxy, and another linking directly to GitHub. :/