feediron / ttrss_plugin-feediron

Evolution of ttrss_plugin-af_feedmod
https://discourse.tt-rss.org/t/plugin-update-feediron-v1-2-0/2018
MIT License

Fix Recursive fetch after new reformat option #202

Closed: monofox closed this 4 months ago

monofox commented 4 months ago

After the integration of feediron/ttrss_plugin-feediron#199, which added a feature to reformat the subsequent article links that are found, the recursive multipage handling mode fails.

This commit fixes the recursive loop.

Fixes feediron/ttrss_plugin-feediron#201


NOTICE!!!

All rule submissions should be done in the https://github.com/feediron/feediron-recipes repository.

Bugfix/Enhancement

Tests executed

Test 1

Configuration:

{
    "type": "xpath",
    "xpath": "div[contains(@class, 'article-content')]",
    "multipage": {
        "xpath": "nav[contains(@class, 'page-numbers')]\/span\/a[last()]",
        "append": true,
        "recursive": true
    },
    "modify": [
        {
            "type": "regex",
            "pattern": "\/<li.*? data-src=\"(.*?)\".*?>\\s*<figure.*?>.*?(?:<figcaption.*?<div class=\"caption\">(.*?)<\\\/div>.*?<\\\/figcaption>)?\\s*<\\\/figure>\\s*<\\\/li>\/s",
            "replace": "<figure><img src=\"$1\"\/><figcaption>$2<\/figcaption><\/figure>"
        }
    ],
    "cleanup": [
        "aside",
        "div[contains(@class, 'sidebar')]"
    ]
}

Test URL: https://arstechnica.com/gadgets/2024/05/all-the-ways-streaming-services-are-aggravating-their-subscribers-this-week/

Purpose: Ensure that recursive multipage handling works with reformat disabled.
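To see what the Test 1 `modify` rule does to a single gallery item, here is a rough Python translation of the PHP pattern (the `/.../s` delimiters become a pattern compiled with `re.S`, and `$1`/`$2` become `\1`/`\2`). The sample markup below is made up for illustration and is an assumption; the real Ars Technica markup may differ.

```python
import re

# Test 1 "modify" pattern, translated from PHP /.../s to Python with re.S
# (DOTALL), so ".*?" can also span line breaks inside the gallery markup.
FIGURE_RE = re.compile(
    r'<li.*? data-src="(.*?)".*?>\s*<figure.*?>.*?'
    r'(?:<figcaption.*?<div class="caption">(.*?)</div>.*?</figcaption>)?'
    r'\s*</figure>\s*</li>',
    re.S,
)
# Group 1 is the lazy-loaded image URL, group 2 the optional caption text.
REPLACEMENT = r'<figure><img src="\1"/><figcaption>\2</figcaption></figure>'


def rewrite_gallery(html: str) -> str:
    # On Python 3.5+, an unmatched group 2 (no caption) is replaced by "".
    return FIGURE_RE.sub(REPLACEMENT, html)


if __name__ == "__main__":
    # Hypothetical gallery item, not taken from the real test page.
    sample = ('<li class="item" data-src="a.jpg"><figure><img/>'
              '<figcaption><div class="caption">Cap</div></figcaption>'
              '</figure></li>')
    print(rewrite_gallery(sample))
```

Items without a caption still produce a `<figcaption>`, just an empty one, because the caption group is optional in the pattern.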

Test 2

Configuration:

{
    "type": "xpath",
    "xpath": "article",
    "tags": {
        "type": "xpath",
        "xpath": "meta[@name='keywords']",
        "split": ",",
        "modify": [
            {
                "type": "replace",
                "search": "\"\/>",
                "replace": ""
            }
        ]
    },
    "cleanup": [
        "amp-analytics",
        "amp-consent",
        "amp-pixel",
        "amp-ad",
        "header",
        "amp-font",
        "a[@class='link-to-top']",
        "div[contains(@class ,'amp-ad-container')]",
        "div[contains(@class ,'social-sticky')]",
        "footer",
        "aside[@id='job-market']",
        "aside[@class='aside__meta']",
        "ul[contains(@class, 'social-tools')]",
        "ol[@class='list-pages']",
        "div[@amp-access='NOT subscriber' and text() = 'Anzeige']"
    ],
    "multipage": {
        "xpath": "ol[@class='list-pages' and not(@id='atoc_line')]\/li\/a[text() != '\u203a']",
        "append": true,
        "reformat": true
    },
    "reformat": [
        {
            "type": "regex",
            "pattern": "\/\\.html$\/",
            "replace": ".amp.html"
        }
    ]
}

Test URL: https://www.golem.de/news/sony-ult-wear-im-vergleichstest-ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html

Purpose: Ensure that reformat works in non-recursive mode (all links are found and reformatted).
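The `reformat` rule in Tests 2 and 3 rewrites each found link to its AMP variant. A minimal Python sketch of that rule, assuming it behaves like a plain regex substitution of `/\.html$/` with `.amp.html` (feediron itself is PHP; `reformat_link` and `reformat_links` are hypothetical helper names):

```python
import re
from typing import List


def reformat_link(url: str) -> str:
    # Test 2 "reformat" rule: PHP pattern /\.html$/ replaced by ".amp.html".
    return re.sub(r"\.html$", ".amp.html", url)


def reformat_links(links: List[str]) -> List[str]:
    # Non-recursive mode: every link found on the first page is rewritten once.
    return [reformat_link(u) for u in links]


if __name__ == "__main__":
    print(reformat_link(
        "https://www.golem.de/news/sony-ult-wear-im-vergleichstest-"
        "ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html"
    ))
```

Note that the rule is not idempotent: applied to an already reformatted URL it yields `.amp.amp.html`, so in this sketch each found link is reformatted exactly once.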

Test 3

Configuration:

{
    "type": "xpath",
    "xpath": "article",
    "tags": {
        "type": "xpath",
        "xpath": "meta[@name='keywords']",
        "split": ",",
        "modify": [
            {
                "type": "replace",
                "search": "\"\/>",
                "replace": ""
            }
        ]
    },
    "cleanup": [
        "amp-analytics",
        "amp-consent",
        "amp-pixel",
        "amp-ad",
        "header",
        "amp-font",
        "a[@class='link-to-top']",
        "div[contains(@class ,'amp-ad-container')]",
        "div[contains(@class ,'social-sticky')]",
        "footer",
        "aside[@id='job-market']",
        "aside[@class='aside__meta']",
        "ul[contains(@class, 'social-tools')]",
        "ol[@class='list-pages']",
        "div[@amp-access='NOT subscriber' and text() = 'Anzeige']"
    ],
    "multipage": {
        "xpath": "ol[@class='list-pages' and not(@id='atoc_line')]\/li\/a[text() != '\u203a']",
        "append": true,
        "recursive": true,
        "reformat": true
    },
    "reformat": [
        {
            "type": "regex",
            "pattern": "\/\\.html$\/",
            "replace": ".amp.html"
        }
    ]
}

Test URL: https://www.golem.de/news/sony-ult-wear-im-vergleichstest-ein-erschwinglicher-kopfhoerer-der-begeistert-2405-184690.html

Purpose: Ensure that reformat works in recursive mode (where the same links are found again on every page).
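The scenario this test covers, where every page lists links to all pages, is exactly where a recursive fetch can loop. The Python sketch below shows one way to guarantee termination: track visited URLs after applying the reformat rule, so a link that reformats to an already collected page is never queued again. This is an illustration under stated assumptions, not feediron's PHP code; `find_links(url)` is a hypothetical callback standing in for "fetch the page and evaluate the multipage XPath", and the example.com URLs are made up.

```python
import re
from collections import deque
from typing import Callable, Iterable, List


def reformat(url: str) -> str:
    # Mirrors the Test 2/3 "reformat" rule: PHP /\.html$/ -> ".amp.html".
    return re.sub(r"\.html$", ".amp.html", url)


def collect_pages(
    start_url: str,
    find_links: Callable[[str], Iterable[str]],
    recursive: bool = True,
) -> List[str]:
    """Return start_url plus every multipage link, each fetched at most once."""
    # The start page counts as seen in raw and reformatted form; all other
    # links are compared after reformatting, so duplicates are skipped.
    seen = {start_url, reformat(start_url)}
    pages = [start_url]
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        for raw in find_links(url):
            link = reformat(raw)
            if link in seen:
                continue
            seen.add(link)
            pages.append(link)
            # Non-recursive mode only follows links found on the first page.
            if recursive:
                queue.append(link)
    return pages


if __name__ == "__main__":
    # Hypothetical three-page article whose pagination list shows all pages
    # on every page, like the Test 3 scenario.
    all_links = [f"https://example.com/article{s}.html" for s in ("", "-2", "-3")]
    print(collect_pages(all_links[0], lambda url: all_links))
```

With `recursive=False` the same function collects the links found on the first page without expanding them, which corresponds to the Test 2 setup.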

Fixes #201


dugite-code commented 4 months ago

Awesome, thanks for your work!