element-hq / element-integration-manager

Element Integration Manager related issues

Feeds bot: Failed to create connection: Could not read feed from URL #62

Closed MarkWieczorek closed 1 year ago

MarkWieczorek commented 1 year ago

Describe the bug: Adding a feed to a room gives the error:

Failed to create connection: Could not read feed from URL: Request failed with status code 403

To Reproduce: Steps to reproduce the behavior:

  1. Go to a room that already has the new RSS feed bot.
  2. Try to add the feed "https://www.lpi.usra.edu/planetary_news/feed/".

room: !GeZUmZzjfKIeFnffoJ:matrix.org

Expected behavior: I have migrated about 100 RSS feeds in the last week. I have had multiple problems and errors, but this is the first time I have run into this one.


Twi1ightSparkle commented 1 year ago

Can confirm

{"errcode":"HS_BAD_VALUE","error":"Feed URL doesn't appear valid"}

It also doesn't work with other Hookshot deployments.

Twi1ightSparkle commented 1 year ago

Looks like https://github.com/matrix-org/matrix-hookshot/issues/548. Actually, that might also be an issue, but we're also being rejected with a 403 Forbidden (as you mentioned above):

ERROR 16:15:01:976 [FeedReader] Unable to read feed: Error fetching feed https://lpi.usra.edu/planetary_news/feed: Request failed with status code 403

Also, bypassing the redirect on lpi.usra.edu/planetary_news/feed and adding the redirect target www.lpi.usra.edu/planetary_news/feed directly doesn't help:

ERROR 16:18:36:053 [FeedReader] Unable to read feed: Error fetching feed https://www.lpi.usra.edu/planetary_news/feed: Request failed with status code 403

Half-Shot commented 1 year ago

This is Cloudflare being awful and blocking automations/scripts from reading the feed. You can test this at home by running curl -I 'https://www.lpi.usra.edu/planetary_news/feed/' and seeing the 403 pop up.

Basically, the owner needs to relax the protections on that URL so that Cloudflare doesn't stop automations from reading it (it's an RSS feed, why would you block reading it?). I don't think Hookshot can do much about it aside from trying to forge its identity as a browser, and I suspect Cloudflare are smarter than us at blocking those attempts.
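
For anyone who wants to run the same check from a script rather than curl, here is a minimal Python sketch. It is not part of Hookshot. The function names, the User-Agent string, and the "403 plus server: cloudflare" heuristic are all assumptions made for illustration, and the heuristic cannot tell a Cloudflare-generated block apart from a 403 that the origin returned through Cloudflare.

```python
import urllib.error
import urllib.request


def looks_like_cloudflare_block(status, headers):
    """Heuristic (an assumption): a 403 whose Server header names Cloudflare."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return status == 403 and "cloudflare" in lowered.get("server", "").lower()


def probe(url, user_agent="feed-probe-sketch/0.1"):
    """Send a HEAD request (what `curl -I` does) and return (status, headers)."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, dict(response.headers.items())
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers live on the exception.
        return err.code, dict(err.headers.items())
```

Passing the result of probe("https://www.lpi.usra.edu/planetary_news/feed/") to looks_like_cloudflare_block would show whether the rejection matches the pattern described above. The result may differ by network, time, and User-Agent.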

MarkWieczorek commented 1 year ago

  1. On my system, curl -I 'https://www.lpi.usra.edu/planetary_news/feed/' does not return any errors. I get:
    HTTP/2 403
    date: Wed, 29 Mar 2023 13:09:20 GMT
    content-type: text/html; charset=UTF-8
    cache-control: max-age=15
    expires: Wed, 29 Mar 2023 13:09:35 GMT
    x-frame-options: SAMEORIGIN
    server: cloudflare
    cf-ray: 7af858c59e4ddd23-LHR
  2. This feed works fine with the Slack RSS bot.
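
A note on reading that output: without --fail, curl exits successfully even when the server answers with an HTTP error, so the "HTTP/2 403" status line is itself the rejection. The sketch below parses the headers pasted above and pulls out the status and server fields. The helper name parse_head_output is hypothetical and exists only for this illustration.

```python
def parse_head_output(text):
    """Parse the text printed by `curl -I` into (status_code, headers)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    # The status line looks like "HTTP/2 403" or "HTTP/1.1 403 Forbidden".
    status = int(lines[0].split()[1])
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as dates contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status, headers


# Header dump copied from the comment above.
SAMPLE = """\
HTTP/2 403
date: Wed, 29 Mar 2023 13:09:20 GMT
content-type: text/html; charset=UTF-8
cache-control: max-age=15
expires: Wed, 29 Mar 2023 13:09:35 GMT
x-frame-options: SAMEORIGIN
server: cloudflare
cf-ray: 7af858c59e4ddd23-LHR
"""

status, headers = parse_head_output(SAMPLE)
is_http_error = status >= 400  # any 4xx or 5xx status counts as a failed fetch
```

Here status is 403 and headers["server"] is "cloudflare", which is consistent with the request being refused before any feed content was served.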