cn-tools / cntools_FreshRssExtensions

This repository contains my unofficial FreshRSS extensions
MIT License

Documentation: FreshRSS FilterTitle: How is this different from the built-in "mark article as read" filter in FreshRSS? #13

Closed lamyergeier closed 6 months ago

lamyergeier commented 8 months ago

FreshRSS has a native feature to mark articles as read based on keywords: [Filtering articles](https://freshrss.github.io/FreshRSS/en/users/10_filter.html)

How is this extension different from that? Does it support regex?

github-actions[bot] commented 8 months ago

Welcome lamyergeier :tada:

Congrats on your first issue!

cn-tools commented 8 months ago

With this extension, a matching new feed entry is NOT added to your FreshRSS database / instance at all. The entry is fully blocked.

At the moment it supports string comparison only. Regex comparison may come in the future.
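
To illustrate what "string comparison only" could look like, here is a minimal sketch. It assumes a plain, case-sensitive substring check; the thread does not say how the extension compares, and the function name `titleContainsKeyword` is hypothetical, not taken from the extension.

```php
<?php
// Hypothetical helper, not the extension's actual code.
// Returns true when any keyword occurs as a plain substring of the title.
function titleContainsKeyword(string $title, array $keywords): bool
{
    foreach ($keywords as $keyword) {
        // Skip empty keywords; strpos() is a byte-wise, case-sensitive search.
        if ($keyword !== '' && strpos($title, $keyword) !== false) {
            return true;
        }
    }
    return false;
}

var_dump(titleContainsKeyword('Sponsored: new gadget', ['Sponsor'])); // bool(true)
var_dump(titleContainsKeyword('sponsored: new gadget', ['Sponsor'])); // bool(false)
```

A check like this cannot express alternatives such as `[sS]ponsor`, which is where regex support becomes useful.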

lamyergeier commented 8 months ago

Regex support can solve this issue:

[Feature Hide/delete articles with non latin script · Issue #6144 · FreshRSS/FreshRSS](https://github.com/FreshRSS/FreshRSS/issues/6144)

cn-tools commented 8 months ago

Maybe the future comes earlier than I thought a few hours ago 😆

cn-tools commented 8 months ago

@lamyergeier feel free to try the FilterTitle extension, which now supports regex.

lamyergeier commented 8 months ago

Can I write the patterns as follows? Do I need the / delimiters, as with sed?

/[sS]ponsor/
/[aA]dvertisement/
/[sS]horts?/

Is it possible to specify case insensitivity, like this?

/sponsor/i
/advertisement/i
/shorts?/i

What about a pattern containing a space?

/North Korea/

Could you suggest a regex to ignore non-Latin scripts (for example, Chinese, Japanese, Korean, Arabic, Thai, Hindi, Tamil, Kannada, and Telugu)?


I am not sure about the syntax.
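
The syntax questions above can be checked outside the extension with a short sketch. It assumes the extension hands each configured line unchanged to PHP's `preg_match()`, which the thread does not confirm; `matchesAny` is a hypothetical helper. In `preg_match()` itself, the delimiters are required, a trailing `i` makes the match case-insensitive, and a space inside the pattern matches a literal space.

```php
<?php
// Each pattern carries its own delimiters and flags, as preg_match() expects.
$patterns = ['/sponsor/i', '/advertisement/i', '/shorts?/i', '/North Korea/'];

// Hypothetical helper: true when at least one pattern matches the title.
function matchesAny(string $title, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        if (preg_match($pattern, $title) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(matchesAny('SPONSORED post', $patterns));         // bool(true)
var_dump(matchesAny('Talks with North Korea', $patterns)); // bool(true)
var_dump(matchesAny('north korea update', $patterns));     // bool(false)
var_dump(matchesAny('Weekly digest', $patterns));          // bool(false)
```

With the `i` flag, character classes such as `[sS]` become redundant, so `/sponsor/i` and `/[sS]ponsor/i` behave the same.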

cn-tools commented 8 months ago

I reworked the plugin in #17, so please update your installation.

You can now define how the check result is used: as block or as release.

In your case, I think you need release with the regex /\p{Latin}/i

lamyergeier commented 8 months ago

Could you please explain what block, release, and exam type mean?

For my examples above, I entered the following in the extension options (I am not sure what to choose for exam type):

/[sS]ponsor/i
/[aA]dvertisement/i
/[sS]horts?/i
/North Korea/
/\p{Latin}/i

cn-tools commented 8 months ago

I think I should write clearer text 🤪

The exam type determines how the result of the check is used. It lets you specify whether the keywords are applied to the title of the feed entry as a blacklist (block) or as a whitelist (release).

In your issue 6144 you say that you want only feed entries with Latin characters in the title. In this case I would choose release and use /\p{Latin}/i as the keyword.
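
The difference between the two exam types can be sketched as follows. This is an illustration under assumptions, not the extension's code: `keepEntry` and the string values `'block'` and `'release'` are hypothetical, and treating a single matching pattern as sufficient is a guess.

```php
<?php
// Hypothetical sketch of the two exam types; names are illustrative only.
// 'block'   (blacklist): drop the entry when any pattern matches the title.
// 'release' (whitelist): keep the entry only when some pattern matches.
function keepEntry(string $title, array $patterns, string $examType): bool
{
    $matched = false;
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $title) === 1) {
            $matched = true;
            break;
        }
    }
    return $examType === 'release' ? $matched : !$matched;
}

var_dump(keepEntry('Sponsored post', ['/sponsor/i'], 'block'));   // bool(false)
var_dump(keepEntry('Sponsored post', ['/sponsor/i'], 'release')); // bool(true)
var_dump(keepEntry('Weekly digest', ['/sponsor/i'], 'block'));    // bool(true)
```

The same pattern list therefore produces opposite results depending on the exam type, which is why mixing blacklist-style and whitelist-style patterns in one list does not work.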

lamyergeier commented 8 months ago

Maybe we should have separate blacklists and whitelists?

cn-tools commented 8 months ago

Sounds good.

I have published an update, version v0.0.3 of xExtension-FilterTitle.

lamyergeier commented 8 months ago

So if I add /\p{Latin}/i to the whitelist, feed entries with non-Latin characters will automatically be deleted?

Also, may I request an option to either delete or mark as read based on the filter?

cn-tools commented 8 months ago

Yes, if you add this expression to the whitelist, a new feed entry that does not match it will not be added to the database.

I will try to provide an option to have the new feed entry added to the database as read instead.

lamyergeier commented 8 months ago

> I will try to provide an option to set the new feed entry to be added to the database as read

Maybe it would be useful to offer this option separately for the whitelist and the blacklist, for more granular control. That is two options in total: one for the entire whitelist and one for the entire blacklist.

lamyergeier commented 8 months ago

Maybe it would also be useful to tag the filtered and read entries as FilterTitle, to indicate that they were marked as read automatically by the extension.

cn-tools commented 8 months ago

@lamyergeier a "mark as read" checkbox is available in the current version. I'm waiting for your response.

lamyergeier commented 8 months ago

@cn-tools I updated the extension and enabled this setting. I will confirm that it works once I see non-Latin entries marked as read.

lamyergeier commented 8 months ago

@cn-tools Issue: the checkbox selection does not persist in the GUI!

cn-tools commented 8 months ago

@lamyergeier ah sh*t - sorry

I used the wrong save name for the checkbox data.

Please update once again. You must set the checkbox again and save it.

Waiting for your answer.

lamyergeier commented 8 months ago

Titles with punctuation marks are getting ignored!

Solution: include every supported property code (see "Unicode character properties" in the PHP manual).

Example: (screenshot not preserved)
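
As a guess at what such a pattern might look like, the sketch below assumes an anchored whitelist pattern that requires every character of the title to be allowed. Both patterns are illustrative and are not the ones from the lost screenshot. The general categories `\p{P}` (punctuation), `\p{N}` (numbers), `\p{S}` (symbols) and `\p{Z}` (separators) are listed in the PHP manual.

```php
<?php
// Assumed anchored pattern: every character must be a Latin letter or whitespace.
$lettersOnly = '/^[\p{Latin}\s]+$/u';
// Widened class: also allow punctuation (P), numbers (N), symbols (S), separators (Z).
$withPunctuation = '/^[\p{Latin}\p{P}\p{N}\p{S}\p{Z}]+$/u';

$title = 'Hello, world! (2024)';
var_dump(preg_match($lettersOnly, $title));     // int(0)
var_dump(preg_match($withPunctuation, $title)); // int(1)
```

An unanchored pattern such as `/\p{Latin}/u` would not need the extra categories, since it only asks for at least one Latin letter somewhere in the title.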

cn-tools commented 8 months ago

Could you send me your regex, please? If you allow it, I would like to add it to the examples.

lamyergeier commented 8 months ago

Can you check if the whitelist is working? I did the following:

<?php
$String="தமிழ் அரிச்சுவடி";
if (preg_match("/\p{Latin}/i", $String)) {
    echo "A match was found in $String.\n";
} else {
    echo "A match was not found in $String.\n";
}

This returns:

A match was found in தமிழ் அரிச்சுவடி.

Maybe we could write unit tests for the regex. I don't know PHP, or else I would have contributed myself. The above is my first PHP code ever.
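
A likely explanation for the unexpected match is the missing `u` (UTF-8) pattern modifier. Without it, PCRE treats the subject as a sequence of single bytes rather than as UTF-8 characters. The sketch below assumes the source file is saved as UTF-8 and compares both variants; whether the extension itself adds `u` is not stated in the thread.

```php
<?php
$title = "தமிழ் அரிச்சுவடி";

// Without /u the subject is scanned byte by byte. The UTF-8 encoding of Tamil
// letters starts with byte 0xE0, which read as a single character is "à" (Latin).
var_dump(preg_match('/\p{Latin}/i', $title));  // int(1)

// With /u the subject is decoded as UTF-8 first, so only real Latin letters count.
var_dump(preg_match('/\p{Latin}/iu', $title)); // int(0)
var_dump(preg_match('/\p{Latin}/iu', 'Tamil script')); // int(1)
```

If the extension passes the configured pattern through as written, entering `/\p{Latin}/iu` instead of `/\p{Latin}/i` in the options may be enough to get the intended whitelist behaviour.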

cn-tools commented 7 months ago

Here is a regex like the one you are looking for: https://stackoverflow.com/a/70533736