einkoro closed this issue 3 years ago.
Also worth noting: this might be a problem for sites that modify the URL on load, like a lot of Medium sites on custom domains (CNAMEs) that set cookies and append query parameters. A good example is, or at least was, https://blog.hunter.io/
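One possible mitigation, sketched under the assumption that discovery compares the URL it requested with the URL the page ends up on: normalize both URLs before comparing them. The parameter names in `TRACKING_PARAMS` are assumptions for illustration, not a verified list of what Medium or any other site appends.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical set of query parameters treated as tracking noise; a real
# service would need to tune this list per site. utm_* is handled separately.
TRACKING_PARAMS = {"gi", "source", "fbclid", "gclid"}


def normalize_url(url: str) -> str:
    """Return a canonical form of `url` so a page whose URL was rewritten on
    load (extra tracking parameters, a fragment, different host casing)
    compares equal to the URL that was originally requested."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS and not key.startswith("utm_")
    ]
    kept.sort()  # parameter order should not affect the comparison
    path = parts.path or "/"
    # Drop the fragment entirely; it never reaches the server anyway.
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, urlencode(kept), "")
    )
```

Stripping only known noise, rather than every query parameter, keeps sites that legitimately use the query string (pagination, feed selection) working.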
Also worth noting: many sites, such as Apple, the BBC and CNN, publish feeds but no longer bother to add the autodiscovery markup in the head. These could be handled separately with a map of known feeds.
This would probably be best as a Cloudflare Worker API that's called when no feeds are discovered. Or, alternatively, called on every page, with its results compared against the discovered feeds?
Examples: https://www.apple.com/ca/rss/ https://www.bbc.co.uk/news/10628494 https://www.cnn.com/services/rss/
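A rough sketch of the known-feeds map covering both options: consult it only when autodiscovery finds nothing, or always merge it with the discovered feeds. `KNOWN_FEEDS`, `resolve_feeds` and the hostname keys are hypothetical; the values are the directory pages listed above rather than individual feed URLs.

```python
from urllib.parse import urlsplit

# Hypothetical map of hostnames to pages known to list feeds. These are the
# feed directory pages from the examples, not individual feed URLs.
KNOWN_FEEDS: dict[str, list[str]] = {
    "www.apple.com": ["https://www.apple.com/ca/rss/"],
    "www.bbc.co.uk": ["https://www.bbc.co.uk/news/10628494"],
    "www.cnn.com": ["https://www.cnn.com/services/rss/"],
}


def known_feeds_for(page_url: str) -> list[str]:
    """Look up known feed pages by the page's (lowercased) hostname."""
    host = (urlsplit(page_url).hostname or "").lower()
    return list(KNOWN_FEEDS.get(host, []))


def resolve_feeds(page_url: str, discovered: list[str], merge_always: bool = False) -> list[str]:
    """Fallback mode (default): use the map only when autodiscovery found
    nothing. Merge mode: always consult the map and append its entries,
    skipping duplicates and preserving order."""
    if discovered and not merge_always:
        return list(discovered)
    merged = list(discovered)
    for url in known_feeds_for(page_url):
        if url not in merged:
            merged.append(url)
    return merged
```

The fallback mode means the map lookup (or a remote API call standing in for it) only happens on pages where nothing was found, which keeps request volume down.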
Relevant services, APIs and projects: https://feedsearch.dev/ https://developer.feedly.com/v3/search/ https://github.com/DBeath/feedsearch-crawler https://github.com/ggkovacs/rss-finder
It might even be a better idea to spin this off as a new extension / web service rather than bolt it onto the existing plugin, given the cost of running such a service. A yearly or monthly subscription model would make more sense for such an extension.
How would we handle private feeds such as GitHub's?
Crawl links with text or href values containing rss, atom, rdf, json, or feed.
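A minimal sketch of that heuristic using only Python's standard `html.parser`; `FeedLinkCollector` and `find_candidate_feed_links` are hypothetical names, and the results are only candidates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_HINTS = ("rss", "atom", "rdf", "json", "feed")


class FeedLinkCollector(HTMLParser):
    """Collect <a> links whose href or visible text contains a feed hint."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.candidates: list[str] = []
        self._href = None  # href of the <a> currently open, if any
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            haystack = (self._href + " " + "".join(self._text)).lower()
            if any(hint in haystack for hint in FEED_HINTS):
                url = urljoin(self.base_url, self._href)  # resolve relative links
                if url not in self.candidates:
                    self.candidates.append(url)
            self._href = None


def find_candidate_feed_links(html: str, base_url: str) -> list[str]:
    parser = FeedLinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.candidates
```

Plain substring matching is noisy ("feed" also matches "feedback", "json" matches API endpoints), so each candidate would still need to be fetched and checked for an actual RSS, Atom, RDF or JSON Feed document.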
If feed parsing were moved from the injected client-side script to a server-side API, we wouldn't need the full-access entitlement, which users apparently find scary, judging by App Store reviews complaining about security risks.
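A sketch of the server-side half, assuming a hypothetical endpoint that takes the page as a `?url=` query parameter. `fetch_html` is injected so the parsing can be exercised without network access; the handler only reads `<link rel="alternate">` autodiscovery tags with the common RSS, Atom, RDF and JSON Feed MIME types.

```python
import json
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import parse_qs, urljoin

FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/feed+json",
}


class AlternateLinkParser(HTMLParser):
    """Collect <link rel="alternate"> tags that advertise a feed type."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.feeds: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()  # rel may hold several tokens
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            self.feeds.append({"url": urljoin(self.base_url, a["href"]), "title": a.get("title", "")})


def handle_discover(query_string: str, fetch_html: Callable[[str], str]) -> tuple[int, str]:
    """Hypothetical API handler: returns (HTTP status, JSON body)."""
    page_url = parse_qs(query_string).get("url", [""])[0]
    if not page_url.startswith(("http://", "https://")):
        return 400, json.dumps({"error": "missing or invalid url parameter"})
    parser = AlternateLinkParser(page_url)
    parser.feed(fetch_html(page_url))
    parser.close()
    return 200, json.dumps({"url": page_url, "feeds": parser.feeds})
```

One consequence worth keeping in mind: a server-side fetch does not carry the user's cookies, so pages that require a login would not expose their feeds this way.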
This could likely run within the AWS Lambda free tier. Additionally, Cloudflare Workers could cache results at the edge to reduce costs, or the entire thing could be handled at the edge by Cloudflare Workers, with much better latency (no cold starts) and much more predictable costs than Lambda.
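To illustrate why caching cuts costs: repeated lookups for the same page within a time window only trigger one real discovery. This is a minimal in-memory stand-in with a per-entry TTL; Cloudflare Workers expose their own cache interface, which this sketch does not model, and `TTLCache` is a hypothetical name.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """In-memory cache where each entry expires `ttl_seconds` after it was stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], List[str]]) -> List[str]:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh: skip the expensive discovery call
        value = compute(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

Keying the cache on a normalized page URL would keep tracking-parameter variants of the same page from each triggering their own discovery.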
Pros:
Cons:
Machine learning might be viable for sniffing out feeds that are linked on the page but have no alternate link tags, or for generating feeds from page content when no feeds are available.