Closed: jangernert closed this 1 year ago
@jangernert do you have a link to an example article? I could not find one
@HolgerAusB for example this one: https://www.heise.de/news/Neue-Smartphones-Google-stellt-das-Pixel-8-und-das-Pixel-8-Pro-vor-9324936.html
I don't see any difference in wallabag 2.6.7 or Full-Text RSS between
strip: //div/section[@data-component='TeaserList']/ancestor::div
or
strip: //div/section[@data-component='TeaserList']/ancestor::div[1]
or even after removing this line completely.
So adding the '[1]' would not harm FTR or wallabag.
@jangernert Could you please check whether this line is still needed in your use case?
@HolgerAusB Just tested it: without the limitation to [1], libxml2 still matches all ancestor <div> elements for me.
I'm using libxml2 to process the heise.de config. And libxml2 seems to match EVERY <div> that is an ancestor of the matching <section>, which in turn eliminates most of the HTML document. Does Full-Text RSS do something to prevent this or would the same issue apply?
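For anyone following along, here is a rough sketch of the difference between the two expressions. It is not the real heise.de markup: the HTML snippet and the id values are made up for illustration. Python's standard-library ElementTree does not support the ancestor axis, so the helper `ancestor_divs` (a hypothetical name) emulates it with a parent map instead of calling libxml2.

```python
import xml.etree.ElementTree as ET

# Made-up markup (NOT the real heise.de page): a teaser section nested in three divs.
HTML = """
<html><body>
  <div id="page">
    <div id="content">
      <p>Article text</p>
      <div id="teasers">
        <section data-component="TeaserList"><a>More news</a></section>
      </div>
    </div>
  </div>
</body></html>
"""

root = ET.fromstring(HTML)

# ElementTree has no parent pointers, so build a child -> parent lookup.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_divs(node):
    """Yield the <div> ancestors of node, nearest first (hypothetical helper)."""
    node = parent_of.get(node)
    while node is not None:
        if node.tag == "div":
            yield node
        node = parent_of.get(node)


# Same starting point as the strip rule: //div/section[@data-component='TeaserList']
sections = root.findall(".//div/section[@data-component='TeaserList']")

# Emulates .../ancestor::div : every enclosing <div> is selected.
all_ids = [d.get("id") for s in sections for d in ancestor_divs(s)]

# Emulates .../ancestor::div[1] : only the nearest enclosing <div> is selected.
nearest_ids = []
for s in sections:
    nearest = next(ancestor_divs(s), None)
    if nearest is not None:
        nearest_ids.append(nearest.get("id"))

print(all_ids)      # ['teasers', 'content', 'page']
print(nearest_ids)  # ['teasers']
```

In this toy document, stripping everything in `all_ids` would also remove the outer "page" div and with it the article text, while stripping only `nearest_ids` removes just the teaser wrapper. That matches the behaviour described above for libxml2; whether Full-Text RSS or wallabag behave differently is not something this sketch can show.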