RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

FeedFilter does not work with regex ^ (starting with) character #3985

Closed dangnhdev closed 7 months ago

dangnhdev commented 7 months ago

Describe the bug
When I use a regular expression that uses the ^ (starts with) character to filter a feed, the filter does not work. All entries are returned.

To Reproduce
Steps to reproduce the behavior:
1. Example feed: https://github.com/RSS-Bridge/rss-bridge/commits/master.atom
2. Try to use this filter: ^fix
3. Example Feed: https://rss-bridge.org/bridge01/?action=display&bridge=FilterBridge&url=https%3A%2F%2Fgithub.com%2FRSS-Bridge%2Frss-bridge%2Fcommits%2Fmaster.atom&filter=%5Efix&filter_type=block&target_title=on&length_limit=-1&format=Html

RSS-Bridge version: 2024-02-02

dvikan commented 7 months ago

i'm unable to reproduce. perhaps give a url to an example?

dangnhdev commented 7 months ago

Here is the URL which demonstrates the problem. The input feed is the GitHub commit feed of this repo. The filter is ^fix:
https://rss-bridge.org/bridge01/?action=display&bridge=FilterBridge&url=https%3A%2F%2Fgithub.com%2FRSS-Bridge%2Frss-bridge%2Fcommits%2Fmaster.atom&filter=%5Efix&filter_type=block&target_title=on&length_limit=-1&format=Html

If I use fix (without the ^ character), it works well: https://rss-bridge.org/bridge01/?action=display&bridge=FilterBridge&url=https%3A%2F%2Fgithub.com%2FRSS-Bridge%2Frss-bridge%2Fcommits%2Fmaster.atom&filter=fix&filter_type=block&target_title=on&length_limit=-1&format=Html
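The percent-encoded query string is hard to read by eye, so here is a minimal Python sketch that decodes the failing URL and confirms which filter and feed are actually being sent. It uses only the standard library; the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlsplit

# The failing FilterBridge URL, copied verbatim from this report.
url = (
    "https://rss-bridge.org/bridge01/?action=display&bridge=FilterBridge"
    "&url=https%3A%2F%2Fgithub.com%2FRSS-Bridge%2Frss-bridge%2Fcommits%2Fmaster.atom"
    "&filter=%5Efix&filter_type=block&target_title=on&length_limit=-1&format=Html"
)

# parse_qs returns a list per key; each key appears once here, so take item 0.
params = {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}

for key, value in params.items():
    print(f"{key} = {value}")
```

The decoded `filter` value is `^fix` (`%5E` is the encoded `^`), so the anchor does reach the bridge intact and URL encoding is not the cause.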

dvikan commented 7 months ago

there is whitespace before the commit message.

this explains why ^fix fails.

maybe use ^\s+fix as a workaround? (not tested. might fail due to multi-line)

unclear to me if this whitespacing is coming from github or rssbridge.

i guess we could trim the feed values in FilterBridge also, which would fix the current issue.
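The behaviour described above can be reproduced outside RSS-Bridge. The following is a rough Python sketch, not the FilterBridge code: `block_matching` is a hypothetical helper that mimics `filter_type=block`, and the second title is made up for illustration. It uses Python's `re` module, which may differ in details from the regex engine RSS-Bridge uses.

```python
import re


def block_matching(titles, pattern, trim=False):
    """Return the titles that do NOT match pattern (like filter_type=block).

    Hypothetical helper for illustration; optionally trims each title
    before matching, as proposed in the comment above.
    """
    regex = re.compile(pattern)
    kept = []
    for title in titles:
        candidate = title.strip() if trim else title
        if regex.search(candidate) is None:
            kept.append(title)
    return kept


titles = [
    # Title text with the leading newline and indentation seen in the feed.
    "\n        fix(pinterest): set enclosure so it emits mrss media:content prop (#3…\n    ",
    # Hypothetical second entry, only here so something survives the filter.
    "\n        docs: hypothetical second entry\n    ",
]

untrimmed = block_matching(titles, r"^fix")            # nothing blocked
workaround = block_matching(titles, r"^\s+fix")        # "fix" entry blocked
trimmed = block_matching(titles, r"^fix", trim=True)   # "fix" entry blocked

print(len(untrimmed), len(workaround), len(trimmed))
```

In this sketch `^fix` blocks nothing on the untrimmed titles, which matches the report that all entries are returned. In Python, `\s` also matches the newline, so `^\s+fix` works without any multi-line flag; whether the same holds inside RSS-Bridge is untested here.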

dvikan commented 7 months ago

here is directly from https://github.com/RSS-Bridge/rss-bridge/commits/master.atom:

  <entry>
    <id>tag:github.com,2008:Grit::Commit/598ee5b51eaba62dc672f9e9f6f96ac628e56263</id>
    <link type="text/html" rel="alternate" href="https://github.com/RSS-Bridge/rss-bridge/commit/598ee5b51eaba62dc672f9e9f6f96ac628e56263"/>
    <title>
        fix(pinterest): set enclosure so it emits mrss media:content prop (#3…
    </title>
    <updated>2024-02-14T15:02:54Z</updated>
    <media:thumbnail height="30" width="30" url="https://avatars.githubusercontent.com/u/546570?s=30&amp;v=4"/>
    <author>
      <name>dvikan</name>
      <uri>https://github.com/dvikan</uri>
    </author>
    <content type="html">
      &lt;pre style=&#39;white-space:pre-wrap;width:81ex&#39;&gt;fix(pinterest): set enclosure so it emits mrss media:content prop (#3980)&lt;/pre&gt;
    </content>
  </entry>

the <title> contains a newline and some spaces.
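To make the whitespace visible, a small Python sketch can parse a cut-down copy of the entry and print the title with `repr`. The `xmlns` attribute is added here as an assumption so the fragment parses on its own (the excerpt above omits the enclosing `<feed>` element), and only the `<title>` is kept.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Cut-down copy of the entry above; the xmlns declaration is an assumption
# added so this fragment is well-formed by itself.
entry_xml = """<entry xmlns="http://www.w3.org/2005/Atom">
    <title>
        fix(pinterest): set enclosure so it emits mrss media:content prop (#3…
    </title>
</entry>"""

entry = ET.fromstring(entry_xml)
raw_title = entry.find(f"{ATOM_NS}title").text

print(repr(raw_title))          # shows the leading newline and spaces
print(repr(raw_title.strip()))  # starts directly with "fix("
```

The parser returns the element text exactly as written, so any consumer that does not trim it sees a title beginning with a newline rather than with `fix`.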

dangnhdev commented 7 months ago

> unclear to me if this whitespacing is coming from github or rssbridge.

I'm sure this issue originates from GitHub. I verified this with another service and encountered the same error, where the title has spacing at the beginning.

> i guess we could trim the feed values in FilterBridge also, which would fix the current issue.

I suggest that we trim the title for clarity, since it's hard to notice spaces when viewing XML format.

In the meantime, I'm using a somewhat crude workaround: I pass the original feed through FilterBridge without any filter, then pass that RSS-Bridge link (which now correctly trims the title) to FilterBridge again and set the regex. 🤯
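The two-pass workaround can be sketched as URL construction in Python. The query parameters are the ones that appear in the URLs earlier in this thread; `filter_bridge_url` is a hypothetical helper, and using `format=Atom` plus an empty `filter` for the first pass are assumptions about how "without any filter" was done.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "https://rss-bridge.org/bridge01/"
GITHUB_FEED = "https://github.com/RSS-Bridge/rss-bridge/commits/master.atom"


def filter_bridge_url(feed_url, pattern, output_format):
    """Build a FilterBridge URL (hypothetical helper, parameters from this thread)."""
    params = {
        "action": "display",
        "bridge": "FilterBridge",
        "url": feed_url,
        "filter": pattern,
        "filter_type": "block",
        "target_title": "on",
        "length_limit": "-1",
        "format": output_format,
    }
    return BASE + "?" + urlencode(params)


# Single pass, as in the original report.
single_pass = filter_bridge_url(GITHUB_FEED, "^fix", "Html")

# Pass 1: no filter; assumed to emit a feed (format=Atom) with trimmed titles.
inner = filter_bridge_url(GITHUB_FEED, "", "Atom")
# Pass 2: feed the pass-1 URL back into FilterBridge with the real regex.
outer = filter_bridge_url(inner, "^fix", "Html")

# Sanity check: the nested URL decodes back to the pass-1 URL.
nested = parse_qs(urlsplit(outer).query)["url"][0]
print(outer)
```

The inner URL is percent-encoded a second time when it becomes the `url` parameter of the outer request, which is why the helper encodes rather than concatenating strings by hand.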

dvikan commented 7 months ago

thanks for reporting this issue. i created a fix.