Closed Themroc closed 4 years ago
Some feeds lack a
<channel> <image>
tag, but do provide a<channel><link>
, where usually a favicon can be found. Would be nice if that'd be used.
I believe we actually prefer the favicon of <link>
to <image>
:
Now, unfortunately, many pages still use the traditional .ico format, which i guess would cause lots of trouble. So, lets convert it to .png.
We are already doing so using https://github.com/lordelph/icofileloader:
Here
https://www.phpclasses.org/package/3906-PHP-Read-and-write-images-from-ICO-files.html
is a class to do that, but it comes with a special license.
We actually used floIcon before but had to switch to icofileloader due to floIcon’s non-free license and unavailability on composer.
Do you have a specific feed that does not have an icon displayed as expected?
Wow! Great, that that should have been solved already. However, this
https://www.photovoltaik.eu/gentner.dll/RSS?MID=30021&RSS_STYLEID=574806
feed shows up like this:
How to get output from
\F3::get('logger')->debug()
?
You can set the logger_level
in config.ini
. You can also change DEBUG
to 1
in common.php
to record HTTP request bodies.
Thanx! Gonna find out, where it gets stuck...
Ok, their homepage says
< link rel="SHORTCUT ICON" href="https://www.photovoltaik.eu/gentner/photovoltaik/images/favicon2013.ico" / >
but the log says
2019-08-06 16:58:12] selfoss.ERROR: failed to retrieve image http://www.photovoltaik.eu/favicon.ico, {"exception":"[object] (GuzzleHttp\Exception\ClientException(code: 404): Cl (truncated...)
Makes me guess the code doesn't actually read the homepage. A file_get_contents() and preg_match() or 2 should fix that...
Nope, all done already! Double wow! Only it doesn't ignore upper/lower case...
You are right, I must have broken it in 6e0494c229803674499c634a3e3b2a4906add78f
:) Looks like the fewer php extensions you use, the better off you are...
Now, i found this https://stackoverflow.com/questions/24183701/xpath-lowercase-is-there-xpath-function-to-do-this but dunno how to implement it in php... You like to do it, or should i try?
I found https://stackoverflow.com/questions/2893551/case-insensitive-matching-in-xpath and even more interesting https://medium.com/simplepie-ng/php-domdocument-xpath-1-0-case-insensitivity-and-performance-ad962b98e71c
If you could take a jab at it, I would be thankful.
By the way, I also noticed that the feed displays umlauts incorrectly. Apparently, https://github.com/fossar/guzzle-transcoder which we use does not handle */*+xml
MIME types. I am working on a fix for that.
Thanks for the links, really interesting. However, it seems to me as if performance doesn't really matter anymore once you use xpath for this. I've done some tests with the old parseShortcutIcon() (the one before 6e0494c), the current xpath-one and this
function parseShortcutIcon ($html) {
if (preg_match('!<link\b[^>]*\brel=("|\')(apple-touch-icon|shortcut icon|icon)\1[^>]*>!iU', $html, $links) < 1)
return false;
if (preg_match('!\shref=("|\')(.+)\1!iU', $links[0], $href) < 1)
return false;
return $href[2];
}
new one. With 200 runs on 15 pages i get:
old parseShortcutIcon(): Time: 0.971s 12 favicons ok
current parseShortcutIcon(): Time: 45.901s 14 favicons ok
new parseShortcutIcon(): Time: 0.144s 15 favicons ok
Pages were
$urls= [
'https://taz.de',
'https://www.sueddeutsche.de',
'https://www.neues-deutschland.de',
'https://www.klimareporter.de',
'https://fridaysforfuture.de',
'https://www.spiegel.de',
'https://medium.com',
'https://www.php.net',
'https://stackoverflow.com',
'https://github.com',
'http://www.chris-floyd.com',
'https://www.antiwar.com',
'https://www.wsws.org',
'https://caitlinjohnstone.com',
'https://www.photovoltaik.eu',
];
What do you think?
That was with PHP 7.0.33. PHP 5.6.30 gave the same outcome, but a bit different times:
Time: 5.673s Time: 45.919s Time: 0.341s
Hmm, that is quite significant performance regression, I guess we have to revert to the regex parsing :disappointed:
I am not sure about the single regex, though, as that approach will use the first matched (possibly a shortcut icon
), instead of the apple-touch-icon
that is likely a higher resolution. Maybe a compromise with one regex for apple-touch-icon
icon and a second one for (shortcut )?icon
would work?
For completeness, here are my results of the benchmark:
string(18) "parseShortcutIcon1"
float(0.80524301528931)
string(18) "parseShortcutIcon2"
float(22.608789920807)
string(18) "parseShortcutIcon3"
float(0.12911605834961)
AFAICT, those icons are displayed with only 16x16 Pixels anyway, so the 1st one should be sufficient. But it would be easy to use preg_match_all(). This one
function parseShortcutIcon2($html) {
$end= strripos($html, '</head>');
if ($end > 1)
$html= substr($html, 0, $end);
if (preg_match_all('!<link\b[^>]*\brel=("|\')(apple-touch-icon|shortcut icon|icon)\1[^>]*>!iU', $html, $links) < 1)
return null;
//TODO: sort array descending by sizes-attribute
foreach ($links[0] as $l) {
if (preg_match('!\shref=("|\')(.+)\1!iU', $l, $href))
return $href[2];
}
return null;
}
takes 2.373s (old one: 0.975s). Still way faster than xpath.
With php 5 it is
parseShortcutIcon0: Time: 5.651s parseShortcutIcon2: Time: 7.298s
Same league, i'd say.
A more elegant (and fully working) variant of the old one:
function parseShortcutIcon4($html) {
if ((preg_match('/<link\b[^>]*\brel=("|\')apple-touch-icon\1[^>]*>/Ui', $html, $match)
|| preg_match('/<link\b[^>]*\brel=("|\')shortcut icon\1[^>]*>/Ui', $html, $match)
|| preg_match('/<link\b[^>]*\brel=("|\')icon\1[^>]*>/Ui', $html, $match)
)
&& preg_match('/href=("|\')(.+)\1/Ui', $match[0], $result)
)
return $result[2];
return null;
}
gives:
** parseShortcutIcon4: Time: 0.540s Result for 'https://taz.de': '/apple-icon-57x57.png'. Result for 'https://www.sueddeutsche.de': 'https://www.sueddeutsche.de/pagelayout/assets/img/favicon/apple-touch-icon.png'. Result for 'https://www.neues-deutschland.de': '/apple-touch-icon.png'. Result for 'https://www.klimareporter.de': '/templates/klimareporter/favicon.ico'. Result for 'https://fridaysforfuture.de': 'https://fridaysforfuture.de/wp-content/uploads/2019/04/cropped-icon-32x32.png'. Result for 'https://www.spiegel.de': '/static/sys/v12/logo/favicon/touch-icon-iphone.png'. Result for 'https://medium.com': 'https://cdn-images-1.medium.com/fit/c/304/304/18I-HPL0bfoIzGied-dzOvA.png'. Result for 'https://www.php.net': 'https://www.php.net/favicon.ico'. Result for 'https://stackoverflow.com': 'https://cdn.sstatic.net/Sites/stackoverflow/img/favicon.ico?v=4f32ecc8f43d'. Result for 'https://github.com': 'https://github.githubassets.com/favicon.ico'. Result for 'http://www.chris-floyd.com': '/favicon.ico'. Result for 'https://www.antiwar.com': 'https://dgxhtav2e25a8.cloudfront.net/favicon.ico'. Result for 'https://www.wsws.org': '/img/apple-touch-icon.png'. Result for 'https://caitlinjohnstone.com': 'https://i1.wp.com/caitlinjohnstone.com/wp-content/uploads/2017/07/cropped-caitlinpic1.jpg?fit=32%2C32&ssl=1'. Result for 'https://www.photovoltaik.eu': 'https://www.photovoltaik.eu/gentner/photovoltaik/images/favicon2013.ico'.
Even better:
function parseShortcutIcon5($html) {
return (
(preg_match('/<link\b[^>]*\brel=("|\')apple-touch-icon\1[^>]*>/Ui', $html, $match)
|| preg_match('/<link\b[^>]*\brel=("|\')shortcut icon\1[^>]*>/Ui', $html, $match)
|| preg_match('/<link\b[^>]*\brel=("|\')icon\1[^>]*>/Ui', $html, $match) )
&& preg_match('/href=("|\')(.+)\1/Ui', $match[0], $result)
)
? $result[2]
: null;
}
Same performance as 4.
AFAICT, those icons are displayed with only 16x16 Pixels anyway
That depends on a client and a device. For example on HiDPI displays in web client (e.g. retina iPhone) 32×32 px would be used.
Same league, i'd say.
Yeah, I would say that in majority of cases, the you will have to search the whole page, as most of websites do not have all three variants.
Here are the stats for some extra variants, as well as the usage of each link type on the sample sites:
https://taz.de 9 0 4
https://www.sueddeutsche.de 1 1 2
https://www.neues-deutschland.de 9 1 5
https://www.klimareporter.de 0 1 0
https://fridaysforfuture.de 0 0 2
https://www.spiegel.de 7 1 0
https://medium.com 4 0 1
https://www.php.net 0 1 0
https://stackoverflow.com 0 1 0
https://github.com 0 0 1
http://www.chris-floyd.com 0 1 1
https://www.antiwar.com 0 1 0
https://www.wsws.org 1 0 1
https://caitlinjohnstone.com 0 0 0
https://www.photovoltaik.eu 0 1 0
parseShortcutIcon1: 0.7996039390564 s
parseShortcutIcon2: 23.398230075836 s
parseShortcutIcon3: 0.12495994567871 s
parseShortcutIcon4: 0.82246804237366 s
parseShortcutIcon5: 22.572862863541 s
parseShortcutIcon6: 0.6842520236969 s
parseShortcutIcon7: 0.53298807144165 s
parseShortcutIcon8: 0.53591418266296 s
It looks like we will need to use preg_match_all
after all, and select the best size manually:
<link rel="apple-touch-icon" sizes="180x180" href="https://www.sueddeutsche.de/pagelayout/assets/img/favicon/apple-touch-icon.png" />
<link rel="icon" type="image/png" sizes="32x32" href="https://www.sueddeutsche.de/pagelayout/assets/img/favicon/favicon-32x32.png" />
<link rel="icon" type="image/png" sizes="16x16" href="https://www.sueddeutsche.de/pagelayout/assets/img/favicon/favicon-16x16.png" />
<link rel="manifest" href="https://www.sueddeutsche.de/pagelayout/assets/manifest.json" />
<link rel="mask-icon" href="https://www.sueddeutsche.de/pagelayout/assets/img/favicon/safari-pinned-tab.svg" color="#ffffff" />
<link rel="shortcut icon" href="https://www.sueddeutsche.de/pagelayout/assets/img/favicon/favicon.ico" />
Probably an ideal solution from speed & correctness perspective would be SAX-like HTML parser.
$inHead = false;
$bestLink = null;
function handleSaxEvent($event) {
global $inHead, $bestLink;
if ($event->type === ELEMENT_START && $event->elemName === 'head') {
$inHead = true;
} else if ($inHead && $event->type === ELEMENT_START && $event->elemName === 'link') {
if ($event->args better that $bestLink) {
// update $bestLink
}
} else if ($event->type === ELEMENT_END && $event->elemName === 'head') {
return false; // stop processing
}
}
saxParser($body, 'handleSaxEvent');
Not sure if there is a fast PHP implementation, though. Does https://www.php.net/manual/en/class.xmlreader.php support HTML? Otherwise we would need to use https://packagist.org/packages/php-kit/html5-tools or https://packagist.org/packages/diggin/diggin-htmlsax.
For now, the preg_match_all
variant should be sufficient, unless someone wants to investigate these options.
I doubt that any real HTML parser (even a one in a c-library) will be much faster than parseShortcutIcon4. Also, there's a ton of old, weird and buggy pages out there, regex handles them all. Do those parsers? They only introduce unnecessary complexity, IMO.
So, i'll test some more websites with that preg_match_all(). Old and obscure ones preferred, with a few right-wingers and coal lobbyists for balance...
You want the biggest icon available, right?
My concern was primarily things like <!-- <link rel="icon"…
and CDATA
. You cannot handle it with pure regexes. But perhaps that is a corner case that we can get away with not handling.
Yeah, the biggest icon should be okay. We can come up with more advanced heuristics like “smallest icon ≥ 64×64, or biggest icon when no such is available” later, for now the parity with the current state is enough. We should probably filter out SVG icons ([@type=image/svg+xml]
) as we cannot currently handle them.
HTML comments should be easy, no? Something like
$html= preg_replace('/<!--.*-->/sU', '', $html)
only that the 's'-modifier in php never works...
Anyway, i've added a bunch of urls and made preg_match_all- version that picks the biggest icon if "sizes"-attribute is present. If not, the apple one is preferred. Assumption is that if "sizes" really contains what its name suggests, that the biggest one comes first. But luckily, until now, there was always just one size in there. Problems:
Sry, i'm too dumb to upload a php file here... So:
<?php
# Careful! This writes a lot of files to curdir
$urls= [
'https://taz.de',
'https://www.sueddeutsche.de',
'https://www.neues-deutschland.de',
'https://www.klimareporter.de',
'https://fridaysforfuture.de',
'https://www.spiegel.de',
'https://medium.com',
'https://www.php.net',
'https://stackoverflow.com',
/*
focin http item on https page again... too lazy to fix now...
'http://www.chris-floyd.com',
'http://en.people.cn',
'http://www.xinhuanet.com',
'http://www.daelim.co.kr',
'https://www.aboveandbeyond.nu',
*/
/*
read error
'https://www.almasdarnews.com',
'https://consortiumnews.com',
'https://rassemblementnational.fr',
'https://www.africanews.com',
'https://en.mercopress.com',
'https://www.news.com.au',
'https://www.kcna.kp',
'https://www.voynich.nu',
'https://nigerposte.ne',
'https://jiji.ng',
'https://kimcartoon.yt',
*/
/*
requires "continue to site" click or login
'https://www.npr.org',
'https://pfm.gifmis.gov.ng',
*/
'https://www.antiwar.com',
'https://www.wsws.org',
'https://caitlinjohnstone.com',
'https://www.photovoltaik.eu',
'https://www.jungewelt.de',
'https://orientalreview.org',
'https://www.tomdispatch.com',
'https://www.moonofalabama.org',
'https://weltnetz.tv',
'https://off-guardian.org',
'https://fair.org',
'https://www.dasmili.eu',
'https://www.presstv.com',
'https://www.rt.com',
'https://blog.fefe.de',
'https://uebermedien.de',
'https://taibbi.substack.com',
'https://theduran.com',
'https://www.theamericanconservative.com',
'https://thebrexitparty.com',
'https://www.thenews.com.pk',
'https://timesofindia.indiatimes.com',
'https://www.themoscowtimes.com',
'https://sana.sy',
'https://www.hurriyetdailynews.com',
'https://www.kurdistan24.net',
'https://www.bbc.com',
'https://www.aljazeera.com',
'https://edition.cnn.com',
'https://www.msn.com',
'https://www.telesurtv.net',
'https://www.nytimes.com',
'https://www.scmp.com',
'https://www.mexiconews.net',
'https://www.iheartradio.ca',
'https://www.globalresearch.ca',
'https://rpp.pe',
'https://trome.pe',
'https://7news.com.au',
'https://www.tvnz.co.nz',
'https://www.mufg.jp',
'https://www.tsrc.com.tw',
'https://www.javanews.al',
'https://www.legalinfo.mn',
'https://www.newtimes.co.rw',
'https://www.rica.rw',
'https://www.dlb.lk',
'https://svgcc.vc',
'https://macaudailytimes.com.mo',
'https://alytausgidas.lt',
'https://www.duwun.com.mm',
'https://www.rfs.edu.ps',
'https://www.pau.ac.pg',
'https://www.daraz.com.np',
'https://en.emprego.mmo.co.mz',
'https://www.lexpress.mu',
'https://pixelmon.ms',
'https://mailchi.mp',
'https://www.annonces.nc',
'https://www.nouvata.nc',
'https://rabot.nc',
'https://www.biz.nf',
'https://www.phanunpich.webs.nf',
'https://dailypost.ng',
'https://www.legit.ng',
'https://www.reaper.fm',
'https://handzonradio.fm',
'https://gradecalculator.mes.fm',
'https://www.spl.yt',
];
$pre= "<html>\n<head>\n</head>\n<body>\n<table border='1'>\n";
$post= "</table>\n</body>\n</html>\n";
$fmt= "<tr><td>%s</td><td align='center'>%s</td><td>%s</td></tr>\n";
$flag= 0;
foreach ($urls as $u) {
$f= preg_replace('![^a-z0-9]!i', '_', $u);
if (file_exists($f))
continue;
printf("Downloading '%s'... ", $u);
$flag++;
$h= file_get_contents($u);
printf("got %d bytes.\n", strlen($h));
file_put_contents($f, $h);
}
if ($flag)
die(sprintf("Init done, %d URLs. Now point your broser to this script.\n", count($urls)));
echo $pre;
foreach ($urls as $u) {
$f= preg_replace('![^a-z0-9]!i', '_', $u);
if (!file_exists($f))
die("where's da focin file?\n");
$html= file_get_contents($f);
$ue= parse_url($u);
$iurl= parseShortcutIcon($html);
$itxt= '';
if (empty($iurl)) {
$ue['path']= '/favicon.ico';
$itxt= 'GUESS: ';
} else {
$ie = parse_url($iurl);
if (!empty($ie['host']))
$ue['host']= $ie['host'];
if (!empty($ie['scheme']))
$ue['scheme']= $ie['scheme'];
if (!empty($ie['path']))
$ue['path']= preg_replace('!^/+!', '/', '/'.$ie['path']);
}
$iurl= $ue['scheme'] . '://' . $ue['host'] . $ue['path'];
printf($fmt, $u, '<img src="'.$iurl.'" style="max-width:256px; max-height:256px"></img>', $itxt.$iurl);
}
echo $post;
################################################################################
function parseShortcutIcon($html) {
$end= strripos($html, '</head>');
if ($end > 1)
$html= substr($html, 0, $end);
if (preg_match_all('!<link\b[^>]*\brel=("|\')(apple-touch-icon|shortcut icon|icon)\1[^>]*>!iU', $html, $links, PREG_SET_ORDER) < 1)
return null;
$icons= [];
foreach ($links as $link)
if (preg_match('!\shref=("|\')(.+)\1!iU', $link[0], $href))
$icons[]= [
$href[2],
preg_match('!\ssizes=("|\')([0-9\.]+).*\1!i', $link[0], $sizes)
? $sizes[2]
: ( strtolower($link[2])=='apple-touch-icon' ? 64 : 0 ),
];
if (!count($icons))
return null;
usort($icons, function ($a, $b) {
return $a[1]==$b[1]
? 0
: ( $a[1]<$b[1] ? 1 : -1 );
});
return $icons[0][0];
}
Looks like Guzzle has some support for reporting back the redirect-url. Can't find any docu, but what else should RedirectMiddleware\modifyRequest() and Psr7\modify_request() be good for?
Yes, it's easy. In \helpers\WebClient add something like
public static function reqRedir($url, $agentInfo = null) {
$http = self::getHttpClient();
$eurl = $url;
$opt = [
'on_stats' => function (TransferStats $stats) use (&$eurl) {
$eurl = $stats->getEffectiveUri();
},
];
$response = $http->get($url, $opt);
$data = (string) $response->getBody();
if ($response->getStatusCode() !== 200) {
throw new Exception(substr($data, 0, 512));
}
return [ $data, $eurl ];
}
Or expand the current request(). The $agentInfo is ignored anyway, so we can safely abuse the second parameter:
public static function request($url, $opts = null) {
$http = self::getHttpClient();
$agent = null;
$redir = 0;
$eurl = $url;
$opt = [];
if (is_array($opts)) {
if (@$opts['getRedirUrl']) {
$opt['on_stats'] = function (TransferStats $stats) use (&$eurl) { $eurl = $stats->getEffectiveUri(); };
$redir = 1;
}
#TODO: $agent = @$opts['agentInfo'];
}
# else if (!empty($opts))
# $agent = $opts;
$response = $http->get($url, $opt);
$data = (string) $response->getBody();
if ($response->getStatusCode() !== 200) {
throw new Exception(substr($data, 0, 512));
}
return $redir ? [ $data, $eurl ] : $data;
}
I find both ways equally ugly, but what can u do...
SVG shouldn't be too hard to implement: https://stackoverflow.com/questions/10289686/rendering-an-svg-file-to-a-png-or-jpeg-in-php#10289762 But only If ImageMagick is available.
If I recall correctly, the getEffectiveUrl()
is from old version Guzzle.
We handle the redirect history in the SimplePie Guzzle shim:
We probably want to copy it elsewhere still.
Boy do i hate php! Simple testcase:
<?php
parseShortcutIcons(file_get_contents('scmp.com.html'));
function parseShortcutIcons($html) {
printf("%7d bytes html original.\n", strlen($html));
$html= preg_replace('!<script\b[^>]*>.*</script>!siU', '', $html);
printf("%7d bytes after removing scripts.\n", strlen($html));
}
Works perfectly when file 'scmp.com.html' contains testdata like
bla bla bla
. But with the real homepage of the South China Morning Post, it outputs:
2049259 bytes html original. 0 bytes after removing scripts.
Tested with PHP 5.6.30-0+deb8u1 and PHP 7.0.33-0+deb9u3 on two different machines plus a hoster running PHP 5.2.12.
Same thing in perl looks like
use File::Slurp;
parseShortcutIcons(read_file('scmp.com.html'));
sub parseShortcutIcons {
my ($html)= @_;
printf("%7d bytes html original.\n", length($html));
$html=~ s!<script\b[^>]*?>.*?</script>!!sig;
printf("%7d bytes after removing scripts.\n", length($html));
}
It's the same regex, only perl has no "U"-modifier, one needs to append an "?" to make the ".*" ungreedy. And it needs a "g" to apply the regex globally, whereas preg_replace() always works globally. Output:
2049259 bytes html original. 331052 bytes after removing scripts.
Thus, a really functioning equivalent of perl's $html=~ s!<script\b[^>]*?>.*?</script>!!sig;
in php has to look something like
while (false !== $p1= stripos($html, '<script')) {
$block= substr($html, $p1);
$html= substr($html, 0, $p1);
if (false !== $p2= stripos($block, '</script>'))
$html.= substr($block, $p2+strlen('</script>'));
}
Had found it here: https://stackoverflow.com/questions/30682307/how-to-read-the-response-effective-url-in-guzzle-6-0/35443523#35443523. But the method from SimplePieFileGuzzle.php looks better, gonna try that one.
Nope :(
$response->getHeader(\GuzzleHttp\RedirectMiddleware::HISTORY_HEADER)
gives me an empty array. And i guess we have 6.3, as composer.json says
"guzzlehttp/guzzle": "^6.3",
But what i have now in helpers/WebClient.php doesn't look too bad, no?
/**
* get the final URL after a redirect took place (same as original URL if none happend)
*
* @return string URL
*/
public static function getEffectiveUrl() {
return self::$effectiveUrl;
}
/**
* Retrieve content from url
*
* @param string $url
* @param ?string $agentInfo Extra user agent info to use in the request
*
* @throws GuzzleHttp\Exception\RequestException When an error is encountered
* @throws Exception Unless 200 0K response is received
*
* @return string request data
*/
public static function request($url, $agentInfo = null) {
$http = self::getHttpClient();
$eurl = $url;
$opts = [
'on_stats' => function (TransferStats $stats) use (&$eurl) { $eurl = $stats->getEffectiveUri(); },
];
$response = $http->get($url, $opts);
$data = (string) $response->getBody();
self::$effectiveUrl = $eurl;
if ($response->getStatusCode() !== 200) {
throw new Exception(substr($data, 0, 512));
}
return $data;
}
I'm having trouble understanding the php code involved here, as well as what the end-goal is. (just not my kind of language)
I believe I've been troubleshooting Huginn for hours, for an issue that is selfoss, which I believe is related to the issue in this thread.
I can confirm that the current release of selfoss appears to completely ignore the icon I'm setting in my rss feed, and uses the favicon for the domain that I host Huginn.
My current project is a custom rss feed for youtube channels that (among other things) contain the channel thumbnail instead of the youtube favicon.
<atom:icon>https://yt3.ggpht.com/a/AATXAJzGUJdH8PJ5d34Uk6CYZmAMWtam2Cpk6tU_Qw=s512-c-k-c0xffffffff-no-rj-mo</atom:icon>
For testing purposes, I have LXC setup for ttrss, commafeed, and freshrss. (all of which I've tried in the past)
freshrss and ttrss grabbed the icon from link
, and funnily enough (since I run multiple rss things on the same subdomain) commafeed pulled the selfoss icon for some reason.
So realistically, the other readers that are semi-comparable don't seem to use this parameter.
On master, the feed logo is now preferred over favicons. Thanks Themroc for reporting and the pull request.
Though some sites (e.g. https://rss.golem.de/rss.php?tp=av&feed=RSS2.0) use rectangular site logos (https://www.golem.de/staticrl/images/golem-rss.png) so we should probably only use the logo if it is close to square.
Some feeds lack a
<channel> <image>
tag, but do provide a<channel><link>
, where usually a favicon can be found. Would be nice if that'd be used. Now, unfortunately, many pages still use the traditional .ico format, which i guess would cause lots of trouble. So, lets convert it to .png. Herehttps://www.phpclasses.org/package/3906-PHP-Read-and-write-images-from-ICO-files.html
is a class to do that, but it comes with a special license. I'd do the coding, but would you accept such a PR? The license reads: