Open ndunn219 opened 2 years ago
From time to time we see an error here, but I'm at a loss as to how to debug.
I'm wondering if superfeedr.com is a reliable dependency here? 🤔
I tried to approve another feed yesterday and it also failed - is that the error you saw? Could you share a stack trace?
Sentry issue: DJANGOPROJECT-J8
SubscriptionError: ('Error during request to hub https://push.superfeedr.com/ for topic https://carrick.eu/feeds/django.atom.xml: We could not verify your callback Error: certificate has expired', <Subscription: Subscription object (1010)>, <Response [422]>)
File "django-admin.py", line 21, in <module>
management.execute_from_command_line()
File "django/core/management/__init__.py", line 419, in execute_from_command_line
utility.execute()
File "django/core/management/__init__.py", line 413, in execute
self.fetch_command(subcommand).run_from_argv(self.argv)
File "django/core/management/base.py", line 354, in run_from_argv
self.execute(*args, **cmd_options)
File "django/core/management/base.py", line 398, in execute
output = self.handle(*args, **options)
File "aggregator/management/commands/update_subscriptions.py", line 33, in handle
subscription.unsubscribe()
File "django_push/subscriber/models.py", line 94, in unsubscribe
return self.send_request(mode='unsubscribe')
File "django_push/subscriber/models.py", line 135, in send_request
raise SubscriptionError(
We could not verify your callback Error: certificate has expired
Now is that them saying our certificate has expired... 🤔
Error is being raised in Django-push here:
Hitting the callback URL (the URL on our site) there doesn't raise the SSL error — our certificate is fine AFAICS. 🤔
(We don't have a lot of feeds. Fetching them each once a day ourselves might be better than a flaky dependency...)
Superfeedr looks like a useful service since it can normalize between the different feed types. But perhaps there is something up with the requests django-push is making, or perhaps our credentials have expired? Can you double check that?
@adamchainz I think this is Superfeedr reporting an error when trying to send a request to our callback URL. (Not us making a request to them.)
AFAICS they're reporting a certificate issue, but when I test the same thing manually there isn't one.
Yeah that is what it is, I just wanted to be sure. I found their support is on GitHub and that there's this issue open since November covering the same problem. No reply from them, so I guess they aren't maintained.
To be sure, I've emailed the support address, but I would guess it's time to try to migrate off indeed. That is not a trivial amount of work though.
Might be good for the DjangoCon sprints if we don't make it before.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
Again, this isn't stale. Rather, we need to work out how to replace the Superfeedr service.
(A cron job with feedparser may be enough here 🤔)
Would you be able to help me with the certificate issues?
@julien51 — I can't be sure but I think this is essentially bit-rot in the upstream service. I think all we can do here is implement a task to fetch the feeds ourselves.
Just checking if anyone who’s been looking into this so far can provide an update on where this is at, and what they’d suggest we do? Is there documentation about how this feed aggregation works, so we can hand this over to someone new looking for ways to contribute to Django?
I’ve always found it bizarre how Django’s community blog aggregator only seemed to showcase the work of very few people. I thought it was a strange decision / legacy thing, rather than just software being broken, until talking to @adamchainz a few days ago.
I don't believe anyone has looked at it. This issue is all the progress I know of.
Hey @thibaudcolas, yes... I've got a folder somewhere with a few feedparser noodles in it, but nothing more than that I'm afraid. 🤹
This would be an awesome project for someone to pick up — it's a real pain/deficit not having the aggregator working.
@carltongibson 🤹😄 are you ok to share those even if it’s just notes?
From my perspective I see four options to getting this fixed in 2024:
What do you think of those options? Any other ones we should consider? Any specific person you’d think isn’t in this thread and might be able to help?
@thibaudcolas, sure.
Feedparser is the standard Python tool here: https://feedparser.readthedocs.io/en/latest/index.html
Basic usage is fairly straight-forward:
>>> import feedparser
>>> d = feedparser.parse('https://noumenal.es/posts/feed/')
>>> d['feed']['title']
"Carlton's latest posts."
>>> d.entries[0].title
'The longest year'
The relevant models are in the aggregator app: Feed and FeedItem.
https://github.com/django/djangoproject.com/blob/main/aggregator/models.py
The task would be to loop the approved Feed objects, fetch the data, and create the appropriate FeedItem objects for each one. I guess:
- A function that does it for one Feed.
- Then looping that in a management command.
Feedparser supports ETag and Last-Modified, so (maybe 🤔) accepting a --since flag would be useful. We could then track last-run
(like in a txt file would do — it doesn't have to be clever).
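As a rough illustration of the "function that does it for one Feed" idea, here is a minimal sketch of a conditional fetch. The names `fetch_feed` and `conditional_kwargs`, and the injectable `parse` argument, are hypothetical and not part of the aggregator app. The `etag` and `modified` keyword arguments and the 304 status are how feedparser exposes conditional requests.

```python
from typing import Any, Callable, Optional


def conditional_kwargs(etag: Optional[str], modified: Optional[str]) -> dict:
    """Build keyword arguments for a conditional feedparser.parse() call."""
    kwargs = {}
    if etag:
        kwargs["etag"] = etag
    if modified:
        kwargs["modified"] = modified
    return kwargs


def fetch_feed(
    url: str,
    etag: Optional[str] = None,
    modified: Optional[str] = None,
    parse: Optional[Callable[..., Any]] = None,
):
    """Fetch one feed and return (entries, etag, modified).

    `parse` defaults to feedparser.parse. It is injectable (an assumption
    made for this sketch) so the logic can be exercised without network access.
    """
    if parse is None:
        import feedparser  # third-party: pip install feedparser

        parse = feedparser.parse

    result = parse(url, **conditional_kwargs(etag, modified))

    if result.get("status") == 304:
        # The server says nothing changed: no entries, keep the old validators.
        return [], etag, modified

    # Not every server sends validators, so fall back to the previous ones.
    return (
        list(result.get("entries", [])),
        result.get("etag", etag),
        result.get("modified", modified),
    )
```

The returned `etag` and `modified` values are what would need to be stored between runs, wherever that ends up living.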
I would then run that with a systemd timer unit every few hours, but there are a billion options, so I'd defer to whatever Ops fancies.
That seems to me like it should be doable by us. If running it were an issue (it's a small number of feeds fetched periodically...), I'd add an API endpoint to create the FeedItems and I'd happily run it myself (as I imagine would many others).
Of the options you mention, 4 stands out to me as an obvious one. Anyone up for it? 😜 I would happily input on a PR (or whatever is needed).
I don't think it's a massive project. It does need someone to clear the decks for it, to give it time. No doubt, a prototype would be simple enough, but then there'll be errors that come up, and need fixing. (I'd suggest adding a "We can't fetch this" error option to Feed.approval_status...)
That's more or less all I've got. 🎄
Thanks for all the information Carlton. And yes, this would be great for someone to pick up, if Djangonaut Space can help with that it would be awesome.
(like in a txt file would do — it doesn't have to be clever).
I'd suggest a model instead. The change can be committed in a transaction along with the fetched items. And also the tracking data wouldn't be lost if we rebuild the server. This could be a single instance model.
Ok, yes that's a neat idea. 💡 (It's just a fetching optimisation... you still have to check the feed items don't already exist, because feed dates aren't always reliable, and servers don't always support Last-Modified &co, but yes 👍)
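The "check the feed items don't already exist" step could look something like the sketch below. It assumes entries are dict-like (as feedparser entries are) and that an item can be identified by its `id`, falling back to its `link`. The helper names `entry_key` and `new_entries` are hypothetical, and how the existing keys are loaded from FeedItem is left out.

```python
from typing import Iterable, Mapping, Optional


def entry_key(entry: Mapping) -> Optional[str]:
    """Identify an entry by its id (GUID), falling back to its link."""
    return entry.get("id") or entry.get("link")


def new_entries(entries: Iterable[Mapping], existing_keys: Iterable[str]) -> list:
    """Return only the entries whose key has not been seen before.

    Dates are deliberately ignored here, since feed dates aren't always
    reliable. Entries with neither id nor link are skipped.
    """
    seen = set(existing_keys)
    fresh = []
    for entry in entries:
        key = entry_key(entry)
        if key is None or key in seen:
            continue
        seen.add(key)  # also guards against duplicates within one fetch
        fresh.append(entry)
    return fresh
```

With this in place, ETag/Last-Modified tracking stays a pure optimisation: a feed that ignores conditional requests just gets filtered here instead.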
Based on the suggestions mentioned above, I would like to work on it. 😊
👀 Let me go through the entire thread and post a plan in a comment before I push any code for it.
Hey folks 👋🏾
I went through the thread today, set up the project locally to test out the "Add your feed" part, and played around with feedparser. Carlton's notes and Adam's blog were really helpful.
The task would be to loop the approved Feed objects, fetch the data, and create the appropriate FeedItem objects for each one. I guess:
A function that does it for one Feed.
I think this part can be done by creating a single-instance model (let's call it Subscription for now) that uses feedparser to get data like title, link, summary etc. and creates FeedItem objects. 👇🏾 Update this with the new instance model, along with the other places that handle subscribe() and unsubscribe().
I think I have a mental model of the problem till this part.
- Then looping that in a management command.
I am not sure I fully understood what you meant here. Is the idea to have a custom management command so that someone can run it programmatically? That way it can periodically fetch updated feeds and also update the feed items with the latest changes?
This is what I got till now, please correct me if I am going in the wrong direction.
In the meantime let me dig some more and at least have some small code changes on my fork so that I have something tangible to back up my thoughts here. 😄
@Pradhvan sounds great!
The management command... Yes, literally it needs to be run periodically, fetching feeds for updates. (What you described is right, so is there a clarification you need?)
You could imagine doing that by hand, but we'll schedule it using cron or a systemd timer unit or such. Having a command gives the hook to connect that.
Make sense?
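For the looping part, here is a minimal sketch of what the body of such a command might do, assuming a per-feed callable already exists. `update_feeds` and `update_one` are hypothetical names. In the real project this would be called from a management command's handle() method, which cron or a systemd timer then runs.

```python
import logging
from typing import Callable, Dict, Iterable, Optional

logger = logging.getLogger(__name__)


def update_feeds(
    feed_urls: Iterable[str],
    update_one: Callable[[str], None],
) -> Dict[str, Optional[str]]:
    """Call update_one(url) for every feed and map each url to an error or None.

    One broken feed must not stop the rest, so each call is isolated.
    """
    results: Dict[str, Optional[str]] = {}
    for url in feed_urls:
        try:
            update_one(url)
        except Exception as exc:  # deliberately broad: feeds fail in many ways
            logger.warning("Could not update %s: %s", url, exc)
            results[url] = str(exc)
        else:
            results[url] = None
    return results
```

The per-feed error strings are the kind of thing that could feed a "We can't fetch this" status, if one were added.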
The management command... Yes, literally it needs to be run periodically, fetching feeds for updates.
oh okay, got that. A hook to perform action in a scheduled manner.
(What you described is right, so is there a clarification you need?)
yup, just wanted to confirm that I am going in the right direction. 😄
I think I have a much clearer picture now. Let me work on the plan discussed on my fork and come back with a draft PR/demo or something like that.
Thanks @carltongibson for the quick response. 🤗
@Pradhvan are you still looking into this? If you're not, I will try to find time to give it a go in the coming week.
I closed #1299 because feeds appear to be fixed and working again. Noting it here in case this issue is resolved too.
There seems to be an issue adding feeds to the Django Community Blog. My feed (https://www.webucator.com/articles/django/feed/) was approved in October by @adamchainz, but hasn't been added. Adam thought there might be an issue with https://superfeedr.com/, which he said was sending out some obscure certificate errors.