fivefilters / ftr-site-config

Site-specific article extraction rules to aid content extractors, feed readers, and 'read later' applications.
https://www.fivefilters.org/full-text-rss/

Add thenewdaily.com.au #1317

Closed: shtrom closed this 8 months ago

HolgerAusB commented 8 months ago

Hmm,

# get rid of the more links, very brittle
strip: //main/div[5]

That doesn't look like a reliable way to strip something. I will have a deeper look later.
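To illustrate why a positional rule like `//main/div[5]` is fragile, here is a minimal sketch using Python's standard-library ElementTree. The page markup and the class names (`more-links`, `banner`, and so on) are invented for the example and are not taken from thenewdaily.com.au. ElementTree only supports a subset of XPath and needs the relative `.//` prefix, whereas FTR uses a full XPath engine, so treat this as a demonstration of the idea rather than of FTR itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical selectors: the positional one mirrors the rule quoted above,
# the class-based one assumes the unwanted block carries a stable class name.
POSITIONAL = ".//main/div[5]"
BY_CLASS = ".//main/div[@class='more-links']"


def build_page(extra_banner: bool) -> ET.Element:
    """Build a toy page; markup and class names are invented for illustration."""
    blocks = ["<div class='headline'>Title</div>"]
    if extra_banner:
        # Simulates the site inserting one more block above the article.
        blocks.append("<div class='banner'>Promo</div>")
    blocks += [
        "<div class='byline'>Author</div>",
        "<div class='lead'>Lead</div>",
        "<div class='body'>Article text</div>",
        "<div class='more-links'>More stories</div>",
    ]
    return ET.fromstring(
        "<html><body><main>" + "".join(blocks) + "</main></body></html>"
    )


def strip(root: ET.Element, xpath: str) -> list:
    """Remove every match of xpath; return the classes of the divs left in main."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    for node in root.findall(xpath):
        parent_of[node].remove(node)
    return [div.get("class") for div in root.findall(".//main/div")]


# Original layout: the fifth div happens to be the "more links" block.
print(strip(build_page(False), POSITIONAL))
# One extra block shifts everything: the fifth div is now the article body.
print(strip(build_page(True), POSITIONAL))
# A class-based selector still removes only the intended block.
print(strip(build_page(True), BY_CLASS))
```

In the second case the positional rule deletes the article body and leaves the "more links" block in place, which is the failure mode a class- or id-based strip rule avoids, provided the site exposes a stable attribute to match on.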

HolgerAusB commented 8 months ago

...and it works with Wallabagger only. I don't get content with the wallabag UI or FTR because of Cloudflare's bot detection.

HolgerAusB commented 8 months ago

@shtrom what do you think about my changes? https://github.com/HolgerAusB/ftr-site-config/blob/thenewdaily.com.au/thenewdaily.com.au.txt

shtrom commented 8 months ago

On 27 January 2024 5:05:12 am AEDT, Holger wrote:

> @shtrom what do you think about my changes? https://github.com/HolgerAusB/ftr-site-config/blob/thenewdaily.com.au/thenewdaily.com.au.txt

No issue with them. I went for the fastest way to make it work.

I had no luck with Wallabagger, though. This worked for me only with server-side fetching. So I'm not sure that the comment at the top is correct.

-- Olivier Mehani @.***> Sent from my mobile, please excuse my brevity.

shtrom commented 8 months ago

On 27 January 2024 2:24:14 am AEDT, Holger wrote:

> ..and it is wallabagger only. I don't get content with wallabag UI or FTR due to cloudflare's bot detection

Ah, right. No, this worked for me with the user-agent switch. However, my Wallabag instance uses the same IPv4 address as I do, so this may have helped.

Does Wallabagger use site-configs, too?


HolgerAusB commented 8 months ago

That is very confusing. You are right, @shtrom. I get content with the wallabag UI on app.wallabag.it and with the hosted FTR at fivefilters.net, but I don't get content on my self-hosted versions :-( Not even after restarting my router to get a new external IPv4 address.

My FTR and wallabag instances have the same IPv4 address as my desktop browser, so maybe that IP range or my country lowers my Cloudflare score below the point where the content gets through. I even disabled my Pi-hole for this test, without success.

> Does Wallabagger use site-configs, too?

Wallabagger doesn't need any site-dependent config. It is an extension for desktop browsers. When the option is activated in its settings, Wallabagger pushes the whole page HTML, already rendered by the browser (Firefox, Chrome, etc.), to wallabag, and wallabag then uses its site-configs to extract the necessary parts.
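As a rough sketch of that flow: the client sends the already-rendered HTML along with the page URL, so the server does not have to fetch the page itself and never meets the bot check. The snippet below only builds such a request and does not send it. It assumes the wallabag API exposes an entries endpoint that accepts `url` and `content` fields with a bearer token; the host, token, article URL, and the helper name `build_entry_request` are placeholders invented here, not taken from Wallabagger's source.

```python
import json
import urllib.request


def build_entry_request(base_url, token, page_url, rendered_html):
    """Build (but do not send) a request carrying browser-rendered HTML.

    Assumption: the server accepts a JSON body with "url" and "content"
    on its entries endpoint and authenticates with a bearer token.
    """
    payload = {
        "url": page_url,           # where the article lives
        "content": rendered_html,  # HTML as the browser rendered it
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/entries.json",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + token,
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_entry_request(
    "https://wallabag.example/",
    "TOKEN",
    "https://example.org/article",
    "<html><body><p>Hi</p></body></html>",
)
print(req.get_method(), req.full_url)
```

Because the HTML arrives in the request body, extraction on the server side can still apply the matching site-config to it, which is why a working config remains useful even when server-side fetching is blocked.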

Edit: Thank you for testing and reporting. My changes are now committed.

shtrom commented 8 months ago

Yep. I'm using Wallabagger as a last resort when I can't work out a functional site-config, or when there's a JS loader of some sort.

But I'm curious about how it works, if it doesn't need site-configs.

I guess I'll have to go read the source, heh.

shtrom commented 8 months ago

Oh, never mind, you explained the last bit at the end.