aledeg / xExtension-RedditImage

A FreshRSS extension to process Reddit feeds
GNU Affero General Public License v3.0

Some entries don't load `metadata` properly #14

Open Rakambda opened 12 months ago

Rakambda commented 12 months ago

Sometimes, some feed entries don't display properly and are skipped by the plugin.

Example: https://www.reddit.com/user/neo3dofficial/submitted/.rss?sort=new

At some point this error happens (`NO METADATA` is a log line I added in the `isValid` method):

[Wed, 13 Sep 2023 18:55:16 +0200] [error] --- NO METADATA
[Wed, 13 Sep 2023 18:55:16 +0200] [error] --- RedditImage\Exception\InvalidContentException:   submitted by   <a href="https://www.reddit.com/user/neo3dofficial"> /u/neo3dofficial </a>   to   <a href="https://www.reddit.com/r/DigitalArt/"> r/DigitalArt </a> <br> <span><a href="https://i.redd.it/d88vk9o3ehgb1.jpg">[link]</a></span>   <span><a href="https://www.reddit.com/r/DigitalArt/comments/15jo38m/chroma_abstract_wallpaper_pack/">[comments]</a></span> in /app/www/extensions/xExtension-RedditImage/Content.php:29
Stack trace:
#0 /app/www/extensions/xExtension-RedditImage/Processor/BeforeInsertProcessor.php(51): RedditImage\Content->__construct()
#1 /app/www/lib/Minz/ExtensionManager.php(338): RedditImage\Processor\BeforeInsertProcessor->process()
#2 /app/www/lib/Minz/ExtensionManager.php(309): Minz_ExtensionManager::callOneToOne()
#3 /app/www/app/Controllers/feedController.php(483): Minz_ExtensionManager::callHook()
#4 /app/www/app/Controllers/feedController.php(653): FreshRSS_feed_Controller::actualizeFeed()
#5 /app/www/lib/Minz/Dispatcher.php(119): FreshRSS_feed_Controller->actualizeAction()
#6 /app/www/lib/Minz/Dispatcher.php(46): Minz_Dispatcher->launchAction()
#7 /app/www/lib/Minz/FrontController.php(58): Minz_Dispatcher->run()
#8 /app/www/p/i/index.php(57): Minz_FrontController->run()
#9 {main}

If I make the metadata regex a bit more tolerant with `#(?P<metadata>\s+submitted.*</span>)#`, the error is gone. The entry is then actually processed by the transformers and the image is inlined. Before, the exception caused the entry to skip the processors entirely.
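The effect of the relaxed pattern can be checked outside FreshRSS. This is a minimal sketch in Python rather than the extension's PHP; it relies on the `(?P<name>...)` named-group syntax working the same way in Python's `re` module as in PCRE. The sample string is the entry content from the first logged exception, with whitespace as it appears in the log (the original bytes may differ), and `extract_metadata` is a hypothetical helper name, not something from the extension.

```python
import re
from typing import Optional

# Relaxed pattern from the comment above; PHP's `#` delimiters are dropped.
METADATA_PATTERN = re.compile(r"(?P<metadata>\s+submitted.*</span>)")


def extract_metadata(content: str) -> Optional[str]:
    """Return the metadata fragment, or None when the pattern does not match."""
    match = METADATA_PATTERN.search(content)
    return match.group("metadata") if match else None


# Entry content copied from the first logged exception (whitespace as logged).
SAMPLE = (
    '  submitted by   <a href="https://www.reddit.com/user/neo3dofficial">'
    ' /u/neo3dofficial </a>   to   '
    '<a href="https://www.reddit.com/r/DigitalArt/"> r/DigitalArt </a> <br> '
    '<span><a href="https://i.redd.it/d88vk9o3ehgb1.jpg">[link]</a></span>   '
    '<span><a href="https://www.reddit.com/r/DigitalArt/comments/15jo38m/'
    'chroma_abstract_wallpaper_pack/">[comments]</a></span>'
)

if __name__ == "__main__":
    print(extract_metadata(SAMPLE) is not None)
```

Because `.*` is greedy, the capture runs from the whitespace before `submitted` to the last `</span>` on the same line.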

However, this doesn't seem to handle all cases.

Example (NSFW): https://www.reddit.com/user/throwmeaway896/submitted/.rss?sort=new

With this feed, even with the modified regex, some entries fail to match (though the image was already added by the `BeforeInsertProcessor`):

[Wed, 13 Sep 2023 19:01:09 +0200] [error] --- NO METADATA
[Wed, 13 Sep 2023 19:01:09 +0200] [error] --- RedditImage\Exception\InvalidContentException: <div class="reddit-image figure"><!--xExtension-RedditImage/1.1.1 | RedditImage\Processor\BeforeInsertProcessor | RedditImage\Transformer\Agnostic\ImageTransformer--><img src="https://i.redd.it/rtuzzy7h9nnb1.jpg" class="reddit-image"></div>
  submitted by   <a href="https://www.reddit.com/user/throwmeaway896"> /u/throwmeaway896 </a>   to   <a href="https://www.reddit.com/r/phgonewild/"> r/phgonewild </a> <br> <span><a href="https://i.redd.it/rtuzzy7h9nnb1.jpg">[link]</a></span>   <span><a href="https://www.reddit.com/r/phgonewild/comments/16fy4tb/what_if_nasa_kama_mo_ako_now/">[comments]</a></span> in /app/www/extensions/xExtension-RedditImage/Content.php:29
Stack trace:
#0 /app/www/extensions/xExtension-RedditImage/Processor/BeforeDisplayProcessor.php(43): RedditImage\Content->__construct()
#1 /app/www/lib/Minz/ExtensionManager.php(338): RedditImage\Processor\BeforeDisplayProcessor->process()
#2 /app/www/lib/Minz/ExtensionManager.php(309): Minz_ExtensionManager::callOneToOne()
#3 /app/www/app/views/index/normal.phtml(34): Minz_ExtensionManager::callHook()
#4 /app/www/lib/Minz/View.php(88): include('...')
#5 /app/www/lib/Minz/View.php(110): Minz_View->includeFile()
#6 /app/www/app/layout/layout.phtml(69): Minz_View->render()
#7 /app/www/lib/Minz/View.php(88): include('...')
#8 /app/www/lib/Minz/View.php(101): Minz_View->includeFile()
#9 /app/www/lib/Minz/View.php(68): Minz_View->buildLayout()
#10 /app/www/lib/Minz/Dispatcher.php(56): Minz_View->build()
#11 /app/www/lib/Minz/FrontController.php(58): Minz_Dispatcher->run()
#12 /app/www/p/i/index.php(57): Minz_FrontController->run()
#13 {main}

I have to say I don't really understand that one. Using an online checker, the regex seems to match: https://www.phpliveregex.com/p/JSm
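The online check can also be reproduced offline. The sketch below, again in Python as a stand-in for PCRE, runs the relaxed pattern against the second logged content with and without the "dot matches newline" flag. It does not explain the failure inside the extension; it only shows one thing worth ruling out. `diagnose` is a hypothetical helper, and the `WITH_INNER_BREAK` variant is a made-up input (a line break inserted after `<br>`), not something observed in the feed.

```python
import re

PATTERN = r"(?P<metadata>\s+submitted.*</span>)"


def diagnose(content: str) -> dict:
    """Report whether the pattern matches with and without DOTALL."""
    return {
        "default": re.search(PATTERN, content) is not None,
        "dotall": re.search(PATTERN, content, re.DOTALL) is not None,
    }


# Second logged content: image block, a line break, then the metadata.
LOGGED = (
    '<div class="reddit-image figure"><!--xExtension-RedditImage/1.1.1 | '
    'RedditImage\\Processor\\BeforeInsertProcessor | '
    'RedditImage\\Transformer\\Agnostic\\ImageTransformer-->'
    '<img src="https://i.redd.it/rtuzzy7h9nnb1.jpg" class="reddit-image"></div>\n'
    '  submitted by   <a href="https://www.reddit.com/user/throwmeaway896">'
    ' /u/throwmeaway896 </a>   to   '
    '<a href="https://www.reddit.com/r/phgonewild/"> r/phgonewild </a> <br> '
    '<span><a href="https://i.redd.it/rtuzzy7h9nnb1.jpg">[link]</a></span>   '
    '<span><a href="https://www.reddit.com/r/phgonewild/comments/16fy4tb/'
    'what_if_nasa_kama_mo_ako_now/">[comments]</a></span>'
)

# Hypothetical variant: a line break inside the metadata itself.
WITH_INNER_BREAK = LOGGED.replace(" <br> ", " <br>\n")

if __name__ == "__main__":
    print(diagnose(LOGGED))
    print(diagnose(WITH_INNER_BREAK))
```

The logged content matches in both modes, which agrees with the online checker. The hypothetical variant matches only with `re.DOTALL` (the `s` modifier in PCRE), so a line break inside the stored metadata would be one possible reason for a mismatch.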

aledeg commented 11 months ago

Thank you for the report. I'll look into that.

aledeg commented 11 months ago

I was trying to download the XML file from the RSS feed, but for some reason my wget command does not work anymore. Do you happen to have either a working wget/curl command or the files? The former is preferred, since it would let me reproduce the download myself. Thank you.

couchoud-t commented 11 months ago

Here you go. I just renamed them to .log, otherwise GitHub doesn't accept the .xml files.

neo3dofficial.log throwmeaway896.log - NSFW

aledeg commented 11 months ago

Thank you. Do you have a way of downloading them from the command line? I'd be interested, since I am hitting 403 errors when using wget.
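One thing that may be worth ruling out is the client's `User-Agent` header. Whether that is what triggers the 403 here is an assumption; nothing in this thread confirms it. The sketch below uses only Python's standard library, `USER_AGENT` is a hypothetical placeholder value, and `build_feed_request` and `fetch_feed` are illustrative names.

```python
import urllib.request

FEED_URL = "https://www.reddit.com/user/neo3dofficial/submitted/.rss?sort=new"

# Hypothetical identifier; replace it with something describing your own client.
USER_AGENT = "feed-debug-script/0.1"


def build_feed_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Build a GET request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_feed(url: str = FEED_URL) -> bytes:
    """Download the raw feed body; raises urllib.error.HTTPError on 4xx/5xx."""
    with urllib.request.urlopen(build_feed_request(url), timeout=30) as response:
        return response.read()
```

Calling `fetch_feed()` and writing the result to a file would give a reproducible download, if the request is accepted.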

couchoud-t commented 11 months ago

Just opened them in the browser and saved the page 😄

aledeg commented 11 months ago

You're lucky. When I do that, it enters a never-ending loop of downloading Atom files. I need to figure out what this is about.