django / djangoproject.com

Source code to djangoproject.com
https://www.djangoproject.com/
BSD 3-Clause "New" or "Revised" License

Cannot add feeds to Django Community Blog #1137

Open ndunn219 opened 2 years ago

ndunn219 commented 2 years ago

There seems to be an issue adding feeds to the Django Community Blog. My feed (https://www.webucator.com/articles/django/feed/) was approved in October by @adamchainz, but hasn't been added. Adam thought there might be an issue with https://superfeedr.com/, which he said was sending out some obscure certificate errors.

carltongibson commented 2 years ago

From time to time we see an error here, but I'm at a loss as to how to debug.

I'm wondering if superfeedr.com is a reliable dependency here? 🤔

adamchainz commented 2 years ago

I tried to approve another feed yesterday and it also failed - is that the error you saw? Could you share a stack trace?

sentry-io[bot] commented 2 years ago

Sentry issue: DJANGOPROJECT-J8

carltongibson commented 2 years ago
SubscriptionError: ('Error during request to hub https://push.superfeedr.com/ for topic https://carrick.eu/feeds/django.atom.xml: We could not verify your callback Error: certificate has expired', <Subscription: Subscription object (1010)>, <Response [422]>)
  File "django-admin.py", line 21, in <module>
    management.execute_from_command_line()
  File "django/core/management/__init__.py", line 419, in execute_from_command_line
    utility.execute()
  File "django/core/management/__init__.py", line 413, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "django/core/management/base.py", line 354, in run_from_argv
    self.execute(*args, **cmd_options)
  File "django/core/management/base.py", line 398, in execute
    output = self.handle(*args, **options)
  File "aggregator/management/commands/update_subscriptions.py", line 33, in handle
    subscription.unsubscribe()
  File "django_push/subscriber/models.py", line 94, in unsubscribe
    return self.send_request(mode='unsubscribe')
  File "django_push/subscriber/models.py", line 135, in send_request
    raise SubscriptionError(

We could not verify your callback Error: certificate has expired

Now, is that them saying our certificate has expired? 🤔

carltongibson commented 2 years ago

The error is being raised in django-push here:

https://github.com/brutasse/django-push/blob/ee83ff553aa4bdf2ff969f5443971450cdd25eb5/django_push/subscriber/models.py#L136

Hitting the callback URL (the URL on our site) there doesn't raise the SSL error — our certificate is fine AFAICS. 🤔
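For anyone wanting to repeat that manual check, here is a minimal standard-library sketch. It opens a verified TLS connection to a host and reports how many days remain on the leaf certificate. It only shows what *this* machine sees with its own trust store; the hub's client may use a different CA bundle, so a clean result here doesn't rule out a problem on Superfeedr's side.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Convert a certificate's notAfter string (e.g. 'Jan  5 09:34:43 2018 GMT')
    into days remaining; a negative result means it has already expired."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Return the peer certificate's notAfter field.

    The default context verifies the chain against the local trust store,
    so an expired or untrusted certificate raises ssl.SSLCertVerificationError
    here instead of returning a value.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            cert = tls.getpeercert()
    return cert["notAfter"]
```

Usage, with the callback URL's host name (the one here is just an example): `days_until_expiry(fetch_not_after("www.djangoproject.com"))`.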

carltongibson commented 2 years ago

(We don't have a lot of feeds. Fetching them each once a day ourselves might be better than a flaky dependency...)

adamchainz commented 2 years ago

Superfeedr looks like a useful service, since it can normalize between the different feed types. But perhaps there is something up with the requests django-push is making, or perhaps our credentials have expired? Can you double-check that?

carltongibson commented 2 years ago

@adamchainz I think this is Superfeedr reporting an error when trying to send a request to our callback URL. (Not us making a request to them.)

AFAICS they’re reporting a certificate issue, but when I test the same thing manually, there isn’t one.

adamchainz commented 2 years ago

Yeah, that is what it is; I just wanted to be sure. I found that their support is on GitHub, and this issue covering the same problem has been open since November. There's been no reply from them, so I guess the service isn't maintained.

To be sure, I've emailed the support address, but I would guess it's time to try to migrate off indeed. That is not a trivial amount of work, though.

carltongibson commented 2 years ago

Might be good for the DjangoCon sprints if we don't make it before.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

carltongibson commented 2 years ago

Again, this isn't stale. Rather, we need to work out how to replace the Superfeedr service.

(A cron job with feedparser may be enough here 🤔)

julien51 commented 1 year ago

Would you be able to help me with the certificate issues?

carltongibson commented 1 year ago

@julien51 — I can't be sure but I think this is essentially bit-rot in the upstream service. I think all we can do here is implement a task to fetch the feeds ourselves.

thibaudcolas commented 10 months ago

Just checking if anyone who’s been looking into this so far can provide an update on where this is at, and what they’d suggest we do? Is there documentation about how this feed aggregation works, so we can hand this over to someone new looking for ways to contribute to Django?

I’ve always found it bizarre how Django’s community blog aggregator only seemed to showcase the work of very few people. I thought it was a strange decision / legacy thing, rather than just software being broken, until talking to @adamchainz a few days ago.

adamchainz commented 10 months ago

I don't believe anyone has looked at it. This issue is all the progress I know of.

carltongibson commented 10 months ago

Hey @thibaudcolas — Yes... I've got a folder somewhere with a few feedparser noodles in it, but nothing more than that I'm afraid. 🤹

This would be an awesome project for someone to pick up — it's a real pain/deficit not having the aggregator working.

thibaudcolas commented 10 months ago

@carltongibson 🤹😄 are you ok to share those even if it’s just notes?

From my perspective I see four options to getting this fixed in 2024:

  1. Have the proposed website working group take ownership of this and find a volunteer maintainer as part of that group to take it on.
  2. Retire the feed(s) and replace them with something that’s lower maintenance, depending on the findings of the Django website UX project led by @pauloxnet
  3. Propose this as a Google Summer of Code project idea in 2024, or as part of a wider "Django website improvements" project.
  4. Find someone interested in helping with the website as part of Djangonaut Space or similar efforts to mentor new contributors

What do you think of those options? Any other ones we should consider? Any specific person you’d think isn’t in this thread and might be able to help?

carltongibson commented 10 months ago

@thibaudcolas, sure.

Feedparser is the standard Python tool here: https://feedparser.readthedocs.io/en/latest/index.html

Basic usage is fairly straightforward:

>>> import feedparser 
>>> d = feedparser.parse('https://noumenal.es/posts/feed/')
>>> d['feed']['title']
"Carlton's latest posts."
>>> d.entries[0].title
'The longest year'

The relevant models, Feed and FeedItem, are in the aggregator app:

https://github.com/django/djangoproject.com/blob/main/aggregator/models.py
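As a rough idea of the mapping step, here is a sketch that turns one parsed entry into the values a FeedItem would need. It assumes FeedItem has fields along the lines of title, link, summary, date_modified and guid (check aggregator/models.py for the real names), and it works on anything with dict-style `.get()`, which feedparser's entry objects support. Falling back to the link when an entry has no id, and to "now" when it has no usable date, are assumptions rather than settled behaviour.

```python
import calendar
import datetime
from typing import Any, Mapping, Optional


def entry_date(entry: Mapping[str, Any]) -> Optional[datetime.datetime]:
    """feedparser exposes parsed dates as UTC time.struct_time in *_parsed keys."""
    for key in ("updated_parsed", "published_parsed"):
        parsed = entry.get(key)
        if parsed:
            # timegm treats the tuple as UTC, unlike time.mktime.
            return datetime.datetime.fromtimestamp(
                calendar.timegm(parsed), tz=datetime.timezone.utc
            )
    return None


def entry_to_item_fields(entry: Mapping[str, Any], now: datetime.datetime) -> dict:
    """Map one parsed entry to keyword arguments for a (hypothetical) FeedItem."""
    link = entry.get("link", "")
    return {
        "title": entry.get("title", "(untitled)"),
        "link": link,
        "summary": entry.get("summary", ""),
        # Prefer the entry's own id; fall back to the link (assumption).
        "guid": entry.get("id") or link,
        # Feed dates are not always present or reliable; default to now.
        "date_modified": entry_date(entry) or now,
    }
```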

The task would be to loop the approved Feed objects, fetch the data, and create the appropriate FeedItem objects for each one. I guess:

  • A function that does it for one Feed.
  • Then looping that in a management command.
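A rough sketch of that split: a per-feed function, plus a loop that skips anything not approved. To keep it self-contained, Feed is a stand-in dataclass that holds its items in a dict, and `parse` is any callable that takes a URL and returns a mapping with an "entries" list (feedparser.parse results can be read that way). In the real command these would be the aggregator models and feedparser.parse. The "A" status code is an assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Mapping

Parse = Callable[[str], Mapping[str, Any]]

APPROVED = "A"  # assumed approval_status code for approved feeds


@dataclass
class Feed:
    """Stand-in for aggregator.models.Feed (only the fields used here)."""
    feed_url: str
    approval_status: str = APPROVED
    items: Dict[str, dict] = field(default_factory=dict)  # guid -> item fields


def update_feed(feed: Feed, parse: Parse) -> int:
    """Fetch one feed and store entries not seen before; return how many were added."""
    result = parse(feed.feed_url)
    added = 0
    for entry in result.get("entries", []):
        guid = entry.get("id") or entry.get("link")
        if not guid or guid in feed.items:
            continue  # skip entries we can't identify or already have
        feed.items[guid] = {"title": entry.get("title", ""), "link": entry.get("link", "")}
        added += 1
    return added


def update_all(feeds: Iterable[Feed], parse: Parse) -> Dict[str, int]:
    """Loop over approved feeds only, as a management command's handle() might."""
    return {
        feed.feed_url: update_feed(feed, parse)
        for feed in feeds
        if feed.approval_status == APPROVED
    }
```

Keying on the entry id (falling back to the link) makes re-running harmless, which matters for something that will run repeatedly on a schedule.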

Feedparser supports ETag and Last-Modified, so (maybe 🤔) accepting a --since flag would be useful. We could then track the last run (like in a txt file would do — it doesn't have to be clever).
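feedparser's conditional fetching works by passing the etag and modified values from the previous result back into the next parse call; a 304 status means nothing changed. This sketch keeps those values per feed in a plain dict (where that state should live is discussed further down the thread) and takes `parse` as a parameter so it can run without the network; in practice it would be feedparser.parse. It is only an optimisation: servers don't always send these headers, so de-duplicating entries is still needed.

```python
from typing import Any, Callable, Dict, List, Mapping, Optional

State = Dict[str, Optional[str]]


def fetch_if_changed(
    url: str,
    state: Dict[str, State],
    parse: Callable[..., Mapping[str, Any]],
) -> List[Mapping[str, Any]]:
    """Return entries for url, or [] if the server says nothing changed (HTTP 304).

    `state` maps each URL to the etag/modified values from its last fetch and is
    updated in place. `parse` is expected to behave like feedparser.parse.
    """
    previous = state.get(url, {})
    result = parse(url, etag=previous.get("etag"), modified=previous.get("modified"))
    if result.get("status") == 304:
        return []  # not modified since last time; keep the stored validators
    # Remember whatever validators the server sent (None if it sent none).
    state[url] = {"etag": result.get("etag"), "modified": result.get("modified")}
    return list(result.get("entries", []))
```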

I would then run that with a systemd timer unit every few hours, but there are a billion options, so I'd defer to whatever Ops fancies.

That seems to me like it should be doable for us. If running it were an issue (it's a small number of feeds fetched periodically...), I'd add an API endpoint to create the FeedItems, and I'd happily run it myself (as I imagine would many others).


Of the options you mention, 4 stands out to me as an obvious one. Anyone up for it? 😜 I would happily input on a PR (or whatever is needed).

I don't think it's a massive project. It does need someone to clear the decks for it, to give it time. No doubt a prototype would be simple enough, but then errors will come up that need fixing. (I'd suggest adding a "We can't fetch this" error option to Feed.approval_status...)
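One way that error option could be wired in, sketched with a stand-in object: a feed that fails to update gets flagged, and the loop carries on with the rest. The "F" code is hypothetical (it would be a new choice on Feed.approval_status), and so is catching every exception; real code would probably narrow that, log the error, and save the feed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

FETCH_FAILED = "F"  # hypothetical new approval_status choice: "We can't fetch this"


@dataclass
class Feed:
    """Stand-in for aggregator.models.Feed."""
    feed_url: str
    approval_status: str = "A"  # assumed code for approved


def update_all(feeds: Iterable[Feed], update_feed: Callable[[Feed], None]) -> List[str]:
    """Update each feed; flag the ones that fail and return their URLs."""
    failed = []
    for feed in feeds:
        try:
            update_feed(feed)
        except Exception:  # broad on purpose in this sketch; narrow it in real code
            feed.approval_status = FETCH_FAILED
            failed.append(feed.feed_url)
    return failed
```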

That's more or less all I've got. 🎄

adamchainz commented 10 months ago

Thanks for all the information, Carlton. And yes, this would be great for someone to pick up; if Djangonaut Space can help with that, it would be awesome.

(like in a txt file would do — it doesn't have to be clever).

I'd suggest a model instead. The change can be committed in a transaction along with the fetched items. And also the tracking data wouldn't be lost if we rebuild the server. This could be a single instance model.
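To illustrate why that's attractive, here is the same idea in plain sqlite3 rather than the Django ORM: a table constrained to a single row holds the last-run time, and it is written in the same transaction as the new items, so either both land or neither does. Table and column names are made up for the example; in the project this would be a Django model updated inside `transaction.atomic()`.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS feed_item (guid TEXT PRIMARY KEY, title TEXT);
-- CHECK (id = 1) makes this a single-instance table.
CREATE TABLE IF NOT EXISTS fetch_state (
    id INTEGER PRIMARY KEY CHECK (id = 1),
    last_run TEXT NOT NULL
);
"""


def save_run(conn: sqlite3.Connection, items: Iterable[Tuple[str, str]], run_at: str) -> None:
    """Insert new items and record the run time atomically."""
    with conn:  # commits on success, rolls back if anything raises
        # Existing guids are ignored, so re-fetched entries are not duplicated.
        conn.executemany("INSERT OR IGNORE INTO feed_item (guid, title) VALUES (?, ?)", items)
        # Upsert the single state row.
        conn.execute("INSERT OR REPLACE INTO fetch_state (id, last_run) VALUES (1, ?)", (run_at,))


def last_run(conn: sqlite3.Connection) -> Optional[str]:
    row = conn.execute("SELECT last_run FROM fetch_state WHERE id = 1").fetchone()
    return row[0] if row else None
```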

carltongibson commented 10 months ago

Ok, yes that's a neat idea. 💡 (It's just a fetching optimisation... you still have to check the feed items don't already exist, because feed dates aren't always reliable, and servers don't always support Last-Modified &co, but yes 👍)

Pradhvan commented 9 months ago

Based on the suggestions above, I would like to work on it. 😊

👀 Let me go through the entire thread and write up a plan in a comment before I push any code for it.

Pradhvan commented 9 months ago

Hey folks 👋🏾

I went through the thread today, set up the project locally to test the "Add your feed" part, and played around with feedparser. Carlton's notes and Adam's blog were really helpful.

The task would be to loop the approved Feed objects, fetch the data, and create the appropriate FeedItem objects for each one. I guess:

A function that does it for one Feed.

I gather this part can be done by creating a single-instance model (let's call it Subscription for now) that uses feedparser to get data like title, link, summary, etc., and creates FeedItem objects. 👇🏾 I'd update this with the new model, along with the other places that handle subscribe() and unsubscribe():

https://github.com/django/djangoproject.com/blob/23d86df716672a2940071897caee8c1cde5b5c75/aggregator/models.py#L64

I think I have a mental model of the problem up to this part.

  • Then looping that in a management command.

I'm not sure I fully understood what you meant here. Is the idea to have a custom management command so that it can be run programmatically? That way it can periodically fetch updated feeds and also update the feed items with the latest changes?

This is what I got till now, please correct me if I am going in the wrong direction.

In the meantime, let me dig some more and at least make some small code changes on my fork, so that I have something tangible to back up my thoughts here. 😄

carltongibson commented 9 months ago

@Pradhvan sounds great!

The management command... Yes, literally it needs to be run periodically, fetching feeds for updates. (What you described is right, so is there a clarification you need?)

You could imagine doing that by hand, but we'll schedule it using cron or a systemd timer unit or such. Having a command gives the hook to connect that.
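A Django command's `add_arguments()` receives an argparse parser, so the optional --since flag floated earlier in the thread could be declared the same way as in this standalone argparse sketch. The flag name comes from the thread; the ISO-date format, the "no limit" default and the `main()` wrapper are assumptions, and the print stands in for the real fetching work.

```python
import argparse
import datetime
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Fetch approved feeds and store new items.")
    parser.add_argument(
        "--since",
        type=datetime.date.fromisoformat,  # e.g. --since 2024-01-31; bad input -> argparse error
        default=None,
        help="Only consider entries dated on or after this day (default: no limit).",
    )
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Entry point a cron job or systemd timer could call; returns an exit code."""
    args = build_parser().parse_args(argv)
    since = args.since.isoformat() if args.since else "the beginning"
    print(f"Would fetch feeds for entries since {since}")  # placeholder for the real work
    return 0
```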

Make sense?

Pradhvan commented 9 months ago

The management command... Yes, literally it needs to be run periodically, fetching feeds for updates.

Oh okay, got that: a hook to perform an action on a schedule.

(What you described is right, so is there a clarification you need?)

yup, just wanted to confirm that I am going in the right direction. 😄

I think I have a much clearer picture now. Let me work on the plan discussed on my fork and come back with a draft PR/demo or something like that.

Thanks @carltongibson for the quick response. 🤗

nanuxbe commented 2 weeks ago

@Pradhvan are you still looking into this? If you're not, I will try to find time to give it a go in the coming week.

jefftriplett commented 2 weeks ago

I closed #1299 because feeds appear to be fixed and working again. Noting it here in case this issue is resolved too.