chris1610 / pbpython

Code, Notebooks and Examples from Practical Business Python
https://pbpython.com
BSD 3-Clause "New" or "Revised" License
1.99k stars, 987 forks

The very first post from the RSS feed is not available on the website #29

Open retifrav opened 3 years ago

retifrav commented 3 years ago

This issue is about the website, not the Python code. I didn't find a better way to report it, so I've put it here.

In the RSS feed, the very first (oldest) entry links to the "Introduction to the site" page, which returns 403 Access Denied.

So either something went wrong and this page now has a different URL on the website, or you removed it intentionally but forgot to also remove it from the RSS feed. A minor issue really, just wanted to let you know.

Also, it might be a good idea not to put the entire content of each article into the RSS feed (it could be just the first couple of paragraphs instead). Although, if you intended it like this, that is fine of course.

chris1610 commented 3 years ago

Thanks for reporting the issue. I do see that the URL is wrong, so there is an issue.

The one question I have is how you are seeing it in the RSS feed? When I look at the feed, the oldest item I see is from June 2020. I purposely changed a config option a while back to only show 10 articles in the RSS feed so I'm not sure how you are seeing this article. When you click the link in this issue, do you see all articles or just the previous 10?

retifrav commented 3 years ago

Yeah, I'm a bit puzzled by this too. If I fetch that feed with cURL or Wget, or just open it in a web browser, then the oldest entry I get is the one from 2020-06-02. But my RSS client (NetNewsWire) and Feedly somehow manage to get all the entries, including the very first one from 2014-09-18.

Here's a screenshot:

[Screenshot: feed]

So I'm quite curious about how exactly this happens. Could it be that you have some other feed, which RSS clients discover and use instead? I did try exporting the feed URL from my client, and it is the same one, except that it says http://, whereas your server seems to redirect to https:// right away.
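The direct check described above (fetch the feed yourself and see how many entries it really contains) could be sketched roughly like this. It is only an illustration using the standard library: the function name `summarize_feed` and the sample document are made up for the example, and it assumes the feed is either RSS 2.0 or Atom.

```python
import xml.etree.ElementTree as ET

# Namespace used by Atom feeds; RSS 2.0 elements are not namespaced.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def summarize_feed(xml_text):
    """Return (entry_count, titles) for an RSS 2.0 or Atom document.

    Hypothetical helper for illustration; titles are returned in
    document order, so the last one is usually the oldest entry.
    """
    root = ET.fromstring(xml_text)
    # Atom: <feed><entry>...</entry></feed>
    entries = root.findall(f"{ATOM_NS}entry")
    title_tag = f"{ATOM_NS}title"
    if not entries:
        # RSS 2.0: <rss><channel><item>...</item></channel></rss>
        entries = root.findall("./channel/item")
        title_tag = "title"
    titles = [entry.findtext(title_tag, default="") for entry in entries]
    return len(entries), titles


# Made-up sample feed, standing in for the real one.
SAMPLE_RSS = """<rss version="2.0"><channel><title>Example</title>
<item><title>Newer post</title></item>
<item><title>Older post</title></item>
</channel></rss>"""

count, titles = summarize_feed(SAMPLE_RSS)
print(count, titles[-1])
```

To run it against a live feed, the XML text could be obtained with `urllib.request.urlopen(feed_url).read()`, where `feed_url` is whatever URL the RSS client reports. If the count is 10 there but the client shows more, the extra entries are not coming from the feed itself.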

retifrav commented 3 years ago

Ooh, I have a guess. I first added your feed via Feedly, and it looks like Feedly "caches" the feeds that users add to it, so it serves not the actual feeds but their "cached" variants. That would explain how I got all the entries from your website (apparently Feedly has been collecting them since the beginning of time).

Now I tried adding your feed directly to my client, without Feedly, and that way I got only the last 10 entries.

I am still not sure if it is actually so, and I didn't know that Feedly has(?) such functionality, but that's the only explanation I can come up with.
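To illustrate the guess, here is a toy model of it. This is purely an assumption about how an aggregator might behave, not a description of Feedly's actual implementation; `publisher_feed` and `merge_into_cache` are hypothetical names. The publisher only ever exposes the latest 10 posts, while the aggregator keeps everything it has seen on earlier polls.

```python
def publisher_feed(all_posts, limit=10):
    """What the site serves: only the most recent `limit` posts."""
    return all_posts[-limit:]


def merge_into_cache(cache, fetched_entries):
    """Aggregator side: remember every entry ever seen, keyed by its id."""
    for entry_id, title in fetched_entries:
        cache.setdefault(entry_id, title)
    return cache


posts = []   # everything the site has published, oldest first
cache = {}   # what the aggregator has accumulated over time

# The site publishes 15 posts; the aggregator polls after each one.
for n in range(1, 16):
    posts.append((f"post-{n}", f"Post {n}"))
    merge_into_cache(cache, publisher_feed(posts))

direct = publisher_feed(posts)
print(len(direct), direct[0][0])     # a fresh client sees 10 entries, oldest is post-6
print(len(cache), "post-1" in cache)  # the aggregator still holds all 15
```

Under this model, a client reading through the aggregator would still see the very first post, while a client fetching the feed directly would not.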

chris1610 commented 3 years ago

Ah. That makes sense and is helpful. Thanks for taking time to dive into this in more detail.

Ideally I should just put a redirect on my site to point to the right place. However, I'm hosting on S3 and I can't seem to get the redirect to work correctly. I will need to play with it some more to fix it.
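For anyone following along, one way such a redirect might be expressed on S3 static website hosting is a routing rule in the bucket's website configuration. The sketch below only builds the configuration dictionary; the object keys are hypothetical placeholders (the real old and new paths are not given in this thread), and it assumes the site is served from the S3 website endpoint, since routing rules do not apply to the REST endpoint.

```python
def build_routing_rule(old_key_prefix, new_key, http_code="301"):
    """Build one S3 website routing rule that redirects an old key to a new one."""
    return {
        "Condition": {"KeyPrefixEquals": old_key_prefix},
        "Redirect": {
            "ReplaceKeyWith": new_key,
            "HttpRedirectCode": http_code,
        },
    }


def build_website_config(rules, index_suffix="index.html"):
    """Wrap routing rules in a website configuration dictionary."""
    return {
        "IndexDocument": {"Suffix": index_suffix},
        "RoutingRules": list(rules),
    }


# Hypothetical keys, for illustration only.
rule = build_routing_rule("old/intro.html", "new/intro.html")
config = build_website_config([rule])
print(config)
```

A dictionary of this shape is what boto3's `put_bucket_website` accepts as its `WebsiteConfiguration` argument. Whether this matches the setup of this particular site is an assumption.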