fossar / selfoss

multipurpose rss reader, live stream, mashup, aggregation web application
https://selfoss.aditu.de
GNU General Public License v3.0

Dolphin feed fails with Argument must not be empty in SimplePie #1237

Closed by jtojnar 3 years ago

jtojnar commented 3 years ago

The following extracted code

<?php

require __DIR__ . '/src/common.php';

use Monolog\Handler\ErrorLogHandler;

// change logs to stderr
$handler = new ErrorLogHandler(ErrorLogHandler::OPERATING_SYSTEM, 'debug');
$handler->setFormatter($formatter);
$log->popHandler();
$log->pushHandler($handler);

$webClient = $dice->create(helpers\WebClient::class);
$simplepie = $dice->create(SimplePie::class);
$simplepie->set_curl_options([
    helpers\WebClient::class => $webClient
]);

$simplepie->set_file_class(helpers\SimplePieFileGuzzle::class);
$simplepie->set_autodiscovery_level(SIMPLEPIE_LOCATOR_AUTODISCOVERY | SIMPLEPIE_LOCATOR_LOCAL_EXTENSION | SIMPLEPIE_LOCATOR_LOCAL_BODY);
$simplepie->set_useragent($webClient->getUserAgent());

// load

@$simplepie->set_feed_url('https://dolphin-emu.org/blog/feeds/');
// fetch items
@$simplepie->init();

// on error retry with force_feed
if (@$simplepie->error()) {
    @$simplepie->set_autodiscovery_level(SIMPLEPIE_LOCATOR_NONE);
    @$simplepie->force_feed(true);
    @$simplepie->init();
}

// check for error
if (@$simplepie->error()) {
    throw new \Exception($simplepie->error());
}

var_dump([
    // save fetched items
    'items' => $simplepie->get_items(),
    'htmlUrl' => htmlspecialchars_decode($simplepie->get_link(), ENT_COMPAT), // SimplePie sanitizes URLs
    'spoutTitle' => $simplepie->get_title(),
]);

fails with

DOMDocument::loadHTML(): Argument #1 ($source) must not be empty {"exception":"[object] (ValueError(code: 0): DOMDocument::loadHTML(): Argument #1 ($source) must not be empty at /var/www/selfoss/vendor/simplepie/simplepie/library/SimplePie/Locator.php:83)
[stacktrace]
#0 /var/www/selfoss/vendor/simplepie/simplepie/library/SimplePie/Locator.php(83): DOMDocument->loadHTML('')
#1 [internal function]: SimplePie_Locator->__construct(Object(helpers\\SimplePieFileGuzzle), 10, 'Selfoss/2.19-SN...', 10, false, Array)
#2 /var/www/selfoss/vendor/simplepie/simplepie/library/SimplePie/Registry.php(183): ReflectionClass->newInstanceArgs(Array)
#3 /var/www/selfoss/vendor/simplepie/simplepie/library/SimplePie.php(1644): SimplePie_Registry->create('Locator', Array)
#4 /var/www/selfoss/vendor/simplepie/simplepie/library/SimplePie.php(1381): SimplePie->fetch_data(Object(SimplePie_Cache_File))
#5 /var/www/selfoss/workbench.php(28): SimplePie->init()
#6 {main}

jtojnar commented 3 years ago

Looks like their server correctly returns 302 Found when accessed by a “browser”

curl -v 'https://dolphin-emu.org/blog/feeds/' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:83.0) Gecko/20100101 Firefox/83.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8' -H 'Accept-Language: cs,en-US;q=0.7,en;q=0.3'

but 200 OK with an empty body when the Accept-Language header is omitted.

Either way, it should not crash ContentLoader.
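
For illustration, here is a minimal sketch of how a per-source update could be isolated so that one bad feed does not abort the whole run. It is not the actual selfoss code: `updateSandboxed` and `$parseEmptyBody` are hypothetical names, and it assumes PHP 8 with the DOM extension, where `DOMDocument::loadHTML('')` throws `ValueError` as in the trace above. The point it shows is that `ValueError` extends `\Error`, not `\Exception`, so a guard has to catch `\Throwable` (or `\Error`) to see it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: runs one source update and reports a failure
 * as a string instead of letting it propagate to the caller.
 *
 * @return ?string error description, or null on success
 */
function updateSandboxed(callable $update): ?string
{
    try {
        $update();

        return null;
    } catch (\Throwable $e) {
        // ValueError extends \Error, not \Exception, so catching
        // only \Exception would let it escape this block.
        return get_class($e) . ': ' . $e->getMessage();
    }
}

// Stand-in for what happens when the server sends 200 OK with an empty body:
// the empty string ends up being passed to DOMDocument::loadHTML().
$parseEmptyBody = function (): void {
    $document = new DOMDocument();
    $document->loadHTML('');
};

$error = updateSandboxed($parseEmptyBody);
echo $error ?? 'ok', PHP_EOL;
```

Under those assumptions the script should print a line starting with `ValueError:` rather than terminating with an uncaught error, and the caller can log the message and move on to the next source.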

jtojnar commented 3 years ago

Fixed the ContentLoader sandboxing in 73c60f13e9f25ab21def999569a6316e3ce2afa5.