RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

XPathBridge - how to select multiple categories? #3999

Closed: drelephant closed this 3 months ago

drelephant commented 3 months ago

Disclaimer: I only learned about XPath today, so this may be a stupid question...

I'm trying to parse categories from a page that has the following HTML:

<span itemprop="applicationCategory">
<a href="https://sitename/software/">Software</a> » 
<a href="https://sitename/software/mac/">Mac OSX</a>
</span>

For the item category selector, I entered: .//span[@itemprop="applicationCategory"]//a

I believe that should select both entries, but only the first one is returned. Why?
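As a quick check that the selector itself is fine, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It supports only a limited XPath subset and is a stand-in here, not what XPathBridge uses internally; the wrapping <div> and the names ITEM_HTML, SELECTOR and matched_texts are made up for illustration. It shows the expression matches both <a> elements, which suggests the problem lies in how the result is used rather than in the selector.

```python
import xml.etree.ElementTree as ET

# The category markup from the question, wrapped in a hypothetical <div>
# so the relative ".//span" step has a parent element to search from.
ITEM_HTML = """<div>
<span itemprop="applicationCategory">
<a href="https://sitename/software/">Software</a> »
<a href="https://sitename/software/mac/">Mac OSX</a>
</span>
</div>"""

# Same selector as in the question, written in ElementTree's XPath subset.
SELECTOR = ".//span[@itemprop='applicationCategory']//a"


def matched_texts(html: str, selector: str) -> list[str]:
    """Return the text of every element the selector matches."""
    root = ET.fromstring(html)
    return [a.text for a in root.findall(selector)]


if __name__ == "__main__":
    print(matched_texts(ITEM_HTML, SELECTOR))  # ['Software', 'Mac OSX']
```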

If I simply enter .//span[@itemprop="applicationCategory"], all the text is shown, but the list contains only one entry, "Software » Mac OSX", instead of two.

How can I get it to parse and add both categories separately?

Edit: I noticed that if the categories are separated by a comma, multiple categories are created. It's only the "»" symbol that isn't recognized as a separator.
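The comma behavior described in the edit can be modeled roughly as below. This is a hypothetical sketch of the observed effect only, not XPathBridge's actual splitting code; split_categories and its separators parameter are invented for illustration.

```python
def split_categories(raw: str, separators: str = ",") -> list[str]:
    """Hypothetical model: split a category string on each separator
    character, trim whitespace, and drop empty pieces."""
    parts = [raw]
    for sep in separators:
        # Split every current piece on this separator character.
        parts = [piece for part in parts for piece in part.split(sep)]
    return [p.strip() for p in parts if p.strip()]


if __name__ == "__main__":
    print(split_categories("Software, Mac OSX"))         # ['Software', 'Mac OSX']
    print(split_categories("Software » Mac OSX"))        # ['Software » Mac OSX']
    print(split_categories("Software » Mac OSX", ",»"))  # ['Software', 'Mac OSX']
```

With only "," as a separator, "»" stays inside a single entry, matching what the question describes.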

dvikan commented 3 months ago

@Niehztog

Niehztog commented 3 months ago

Hello @drelephant,

Unfortunately, multiple categories are not yet supported by XPathBridge. Whenever your XPath expression returns multiple elements, only the first element is used and all others are omitted. Support for multiple elements/categories would be a new feature request.

You mentioned that a comma between the categories does in fact create multiple categories. I haven't tested it, but you could try replacing the other weird symbol with a comma, using an expression like:

concat(.//span[@itemprop="applicationCategory"]//a[1],",",.//span[@itemprop="applicationCategory"]//a[2])

I made a test script here. This will output the first two categories as one element separated by a comma. Let me know if that helps you. Otherwise we would have to make changes to XPathBridge.
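Python's standard-library ElementTree does not implement concat(), so this sketch emulates what the XPath 1.0 expression above evaluates to: each node-set argument is converted to the string-value of its first matching node (or an empty string if nothing matches), and the pieces are joined in order. The names ITEM_HTML, string_value and concat_categories, and the wrapping <div>, are hypothetical.

```python
import xml.etree.ElementTree as ET

# Category markup from the question inside a hypothetical wrapping <div>.
ITEM_HTML = """<div>
<span itemprop="applicationCategory">
<a href="https://sitename/software/">Software</a> »
<a href="https://sitename/software/mac/">Mac OSX</a>
</span>
</div>"""

BASE = ".//span[@itemprop='applicationCategory']"


def string_value(root: ET.Element, path: str) -> str:
    """Emulate XPath 1.0 string(): the string-value of the first
    matching node, or "" when the node-set is empty."""
    node = root.find(path)
    return "".join(node.itertext()) if node is not None else ""


def concat_categories(html: str) -> str:
    """Emulate concat(BASE//a[1], ",", BASE//a[2]) from the comment."""
    root = ET.fromstring(html)
    parts = [string_value(root, BASE + "//a[1]"), ",", string_value(root, BASE + "//a[2]")]
    return "".join(parts)


if __name__ == "__main__":
    print(concat_categories(ITEM_HTML))  # Software,Mac OSX
```

Note that the expression hard-codes two categories: an item with a single <a> would yield "Software,", and a third category would be dropped.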

dvikan commented 3 months ago

Thank you @Niehztog for taking the time to help users and maintain this bridge.