Expected Behavior

The arstechnica.com recipe is broken, or the site changed its layout: the extracted element is often just the first half of the content.

Recipe Code

{
    "name": "arstechnica.com",
    "url": "arstechnica.com",
    "stamp": 1470889961,
    "author": "cwmke",
    "match": "arstechnica.com",
    "config": {
        "type": "xpath",
        "xpath": "div[contains(@class, 'article-content')]",
        "multipage": {
            "xpath": "nav[contains(@class, 'page-numbers')]\/span\/a[last()]",
            "append": true,
            "recursive": true
        },  
        "modify": [
            {
                "type": "regex",
                "pattern": "\/<li.*? data-src=\"(.*?)\".*?>\\s*<figure.*?>.*?(?:<figcaption.*?<div class=\"caption\">(.*?)<\\\/div>.*?<\\\/figcaption>)?\\s*<\\\/figure>\\s*<\\\/li>\/s",
                "replace": "<figure><img src=\"$1\"\/><figcaption>$2<\/figcaption><\/figure>"
            }   
        ],  
        "cleanup": [
            "aside",
            "div[contains(@class, 'sidebar')]"
        ]   
    }   
}

Context

Ignore the modify regex; that is not the problem. I only have this one example article at hand, and it is not meant as a political statement or anything (I'm just curious what all this Impostor stuff is actually about):

https://arstechnica.com/gaming/2020/10/aocs-twitch-streaming-debut-attracts-over-435000-among-us-viewers/

Run that article through the filter, and you'll notice that the bottom half of the article is missing.

The article structure is roughly like so:

<article>
  <div> <div> <section class="article-guts"> <div class="article-content post-page"> </div> </section> </div> </div>
  <!-- some ad stuff in here -->
  <div> <div> <section class="article-guts"> rest of article in here </section> </div> </div>
</article>

The filter grabs the first article-content and runs with it. So I changed it to:

    "xpath": [
        "div[contains(@class, 'article-content')]",
        "(//section[@class='article-guts'])[1]"
    ],

I tried that because in Chrome I can select the element in the console using: $x("//section[@class='article-guts']")[1]. But in feediron, this results in all content getting dropped (followed by the fallback of displaying the full HTML).

I'm confused about how XPath works in general, how it works in feediron, and whether feediron would concatenate the results of two expressions. Running with just the single filter "section[@class='article-guts'][last()]" results in, you guessed it, the first article-guts content getting displayed, not the second or last one.
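For reference, the [last()] and [1] behavior described above can be reproduced outside feediron. This is only a sketch in Python using the standard library's xml.etree.ElementTree, which implements a limited XPath subset. The PAGE markup is a simplified, hypothetical stand-in for the real article, and nothing here says how feediron itself evaluates the expression. It shows that a positional predicate is applied per parent element, so both sections count as "last", while Chrome's $x(...)[1] is a zero-based JavaScript array index into the whole result.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical stand-in for the article structure sketched above.
PAGE = """
<article>
  <div><div><section class="article-guts">
    <div class="article-content post-page">first half</div>
  </section></div></div>
  <div class="ad">some ad stuff</div>
  <div><div><section class="article-guts">second half</section></div></div>
</article>
"""

root = ET.fromstring(PAGE)
path = ".//section[@class='article-guts']"

# A positional predicate is evaluated per parent element. Each section is the
# only <section> inside its own <div>, so both are "first" and both are "last",
# and neither is "second".
first_per_parent = root.findall(path + "[1]")
last_per_parent = root.findall(path + "[last()]")
second_per_parent = root.findall(path + "[2]")

# Indexing the whole result list is what $x(...)[1] does in the Chrome console
# (JavaScript arrays are 0-based). In XPath 1.0 the same idea is written
# (//section[@class='article-guts'])[2], which is 1-based; ElementTree does not
# support the parenthesized form, so list indexing stands in for it here.
all_guts = root.findall(path)
second_overall = all_guts[1]

print(len(first_per_parent), len(last_per_parent), len(second_per_parent))  # 2 2 0
print("".join(second_overall.itertext()).strip())  # second half
```

If that reading is right, the expression tried earlier, (//section[@class='article-guts'])[1], would select the first section in an XPath 1.0 engine, not the one Chrome returned for array index 1.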

Help? Does feediron extract both XPaths and concatenate them? How can I get it to extract both article-guts sections? Why does it think the forward slashes need to be escaped, and why does it rewrite them?
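On the escaped slashes: in JSON, \/ is an optional escape for /, so both spellings decode to the same string. PHP's json_encode writes \/ by default unless JSON_UNESCAPED_SLASHES is passed, which would explain the rewriting if feediron saves recipes that way (an assumption, not checked against its source). A small Python sketch of the equivalence, using a shortened xpath value for illustration:

```python
import json

# The same recipe fragment, with and without escaped forward slashes.
# The xpath value is shortened for illustration.
escaped = r'{"xpath": "nav\/span\/a[last()]"}'
plain = '{"xpath": "nav/span/a[last()]"}'

decoded_escaped = json.loads(escaped)
decoded_plain = json.loads(plain)

# Both decode to the identical XPath string, so the escaping is cosmetic.
print(decoded_escaped["xpath"])  # nav/span/a[last()]
print(decoded_escaped == decoded_plain)  # True
```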

feediron / feediron-recipes

Trouble with XPath expression, how to get the 2nd element? #7
