fivefilters / ftr-site-config

Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
https://www.fivefilters.org/full-text-rss/

I can't set author for feeds from RSS-Bridge #982

Open HolgerAusB opened 2 years ago

HolgerAusB commented 2 years ago

I host RSS-Bridge and FTR locally. I use FeedMergeBridge.php to merge 5 sub-feeds of the same site into a single feed, so that FreshRSS can sort out duplicates. Between them sits Full-Text RSS (FTR), which fetches the full articles. This works great!

But FTR sets the author field dc:creator to the literal 'RSS-Bridge', and I can't find out how to get rid of that. I tried to work out how to do it on the RSS-Bridge side, but that is beyond me.

So I made configs in FTR both for the IP address of my RSS-Bridge and for the original site, where the full-text content comes from.

While debug mode shows that FTR found a correct match for the author, this is not passed to the outgoing feed. That match comes from the original site config. I added the following to the existing diepresse.com.txt:

author: //span[@class='author__name']
author: '-'

Some articles have no author, so the second line would catch that. I will change it to an empty string once I get around the problem. I uploaded both the feed from the bridge and the feed from FTR.

So how can I override this feed-author field from RSS-Bridge?

The examples:

Bridge and FTR are not exposed to the internet, so no link, sorry

fivefilters commented 2 years ago

We should really document this aspect of Full-Text RSS. Currently the author name extracted from the article is only used when the input feed doesn’t contain author information. If the input feed has author info, it’s always prioritised. If you have control over the input feed and can remove the author info from it, then the extracted author should be used in the feed Full-Text RSS produces.
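As a rough illustration of the precedence rule described above (this is not Full-Text RSS's actual code), the behaviour could be sketched as a small shell function. The name `pick_author` and the sample author names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the rule described above:
# an author supplied by the input feed always wins; the author
# extracted from the article is only used when the feed has none.
pick_author() {
    feed_author=$1
    extracted_author=$2
    if [ -n "$feed_author" ]; then
        printf '%s\n' "$feed_author"
    else
        printf '%s\n' "$extracted_author"
    fi
}

# Feed already carries an author, so the extracted one is ignored.
pick_author "RSS-Bridge" "Jane Doe"   # prints: RSS-Bridge

# Feed has no author, so the extracted one is used.
pick_author "" "Jane Doe"             # prints: Jane Doe
```

This is why a correct match in debug mode does not show up in the output feed: the input feed already supplies 'RSS-Bridge'.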

We’ll introduce a request parameter in a future version to override this behaviour, as we have with ‘use_extracted_title’, which you can use to tell Full-Text RSS that the extracted article title should replace any title in the source feed.
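For reference, a request using the existing 'use_extracted_title' parameter might be put together as in the sketch below. The endpoint is the example local address used elsewhere in this thread, `build_ftr_url` is a hypothetical helper, and the value `1` is an assumption.

```shell
#!/bin/sh
# Assumed local Full-Text RSS endpoint (example address, not a real host).
FTR_ENDPOINT="https://local.example.com/ftr/makefulltextfeed.php"

# build_ftr_url ENCODED_FEED_URL
# Prints a request URL that asks Full-Text RSS to replace feed titles
# with the extracted article titles. The argument must already be
# URL-encoded.
build_ftr_url() {
    printf '%s?url=%s&use_extracted_title=1\n' "$FTR_ENDPOINT" "$1"
}

build_ftr_url "https%3A%2F%2Fwww.derstandard.at%2Frss"
```

An author override, once available, would presumably be requested the same way, but its parameter name is not known yet.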

HolgerAusB commented 2 years ago

@fivefilters Unfortunately I can't control the author field in the input feed (the output from RSS-Bridge). See my question on their side.

Just to be sure, I need to emphasize that it is the feed author of RSS-Bridge, not the item authors.

HolgerAusB commented 5 months ago

My workaround is to use a cronjob that does the following:

Get the feeds via FiveFilters' Full-Text RSS:

curl -s --connect-timeout 180 -m 180 -o derstandard-newsroom.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss&max=6&links=preserve&exc=1"
curl -s --connect-timeout 180 -m 180 -o derstandard-inland.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Finland&max=10&links=preserve&exc=1"
curl -s --connect-timeout 180 -m 180 -o derstandard-web.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Fweb&max=3&links=preserve&exc=1"
curl -s --connect-timeout 180 -m 180 -o derstandard-kultur.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Fkultur&max=5&links=preserve&exc=1"
curl -s --connect-timeout 180 -m 180 -o derstandard-gesundheit.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Fgesundheit&max=2&links=preserve&exc=1"
curl -s --connect-timeout 180 -m 180 -o derstandard-lifestyle.rss "https://local.example.com/ftr/makefulltextfeed.php?url=https%3A%2F%2Fwww.derstandard.at%2Frss%2Flifestyle&max=2&links=preserve&exc=1"

Combine the feeds via a preconfigured RSS-Bridge (the bridge configuration contains the partial-feed addresses from above):

curl -s --connect-timeout 180 -m 180 -o derstandard-ftr.rss "http://ip-address:3000/?action=display&bridge=StandardAT&format=Atom"
sed -i 's/author>/origin>/g' derstandard-ftr.rss
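The sed step can also be written as a filter, which makes it easy to try on a sample before pointing it at the real feed. This is only a sketch; the function name `rename_author_tags` and the sample element are made up for illustration.

```shell
#!/bin/sh
# rename_author_tags: reads a feed on stdin and writes it to stdout with
# every "author>" rewritten to "origin>", the same substitution as the
# sed call above. Both opening and closing tags are affected.
rename_author_tags() {
    sed 's/author>/origin>/g'
}

# Try it on a sample element first:
printf '%s\n' '<author><name>RSS-Bridge</name></author>' | rename_author_tags
# prints: <origin><name>RSS-Bridge</name></origin>

# Applying it to the combined feed (file name taken from the command above):
# rename_author_tags < derstandard-ftr.rss > derstandard-ftr.tmp \
#     && mv derstandard-ftr.tmp derstandard-ftr.rss
```

Writing to a temporary file instead of using `sed -i` avoids the differences in `-i` syntax between GNU and BSD sed.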



Of course, having that override parameter would be more efficient.