Closed: tomislav closed this 5 years ago
I quickly checked the site, which unfortunately returns no content if JavaScript is disabled. That makes it unusable for RSS-Bridge. However, they do provide an API that can be used by individuals and open source projects: https://newsapi.org/s/national-geographic-api
They require an attribution link for their content, which is reasonable and actually a desired outcome for RSS-Bridge as well. Generally, their terms sound reasonable to me. I don't have time to go further into it, but it looks feasible to make a bridge using their API (which, it seems, is not limited to National Geographic).
I'm actually impressed :open_mouth:
All the content that is displayed on their front page is embedded in the HTML as a JavaScript array/dictionary. Maybe it could be scraped with a regex?
You are right, it does contain the JSON data. Not sure how I missed that before. I went ahead and made a small bridge from the contents I could find, see #1065. Let me know if this is what you wanted. There are other endpoints from which contents can possibly be extracted (like the one I linked in the PR).
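For reference, a minimal sketch of the regex idea discussed above: pull a JSON object out of a script tag and decode it. The variable name `window.__DATA__`, the sample HTML, and the helper name `extractEmbeddedJson` are assumptions for illustration only; the real page and the bridge in #1065 may look different.

```php
<?php
// Hypothetical page snippet. The variable name "window.__DATA__" is an
// assumption for illustration, not the name used on the real site.
$html = '<html><script>window.__DATA__ = {"stories":[{"title":"A","url":"/a"}]};</script></html>';

/**
 * Extract a JSON object assigned to a JavaScript variable inside HTML.
 * Returns the decoded array, or null if nothing usable is found.
 */
function extractEmbeddedJson(string $html, string $variable): ?array
{
    // Lazy match up to the first "}" that is followed by ";" and </script>.
    $pattern = '/' . preg_quote($variable, '/') . '\s*=\s*(\{.*?\})\s*;\s*<\/script>/s';
    if (preg_match($pattern, $html, $matches) !== 1) {
        return null;
    }
    $data = json_decode($matches[1], true);
    return is_array($data) ? $data : null;
}

$data = extractEmbeddedJson($html, 'window.__DATA__');
```

A regex like this is fragile if the site changes how the data is embedded, so returning null on any mismatch keeps the failure easy to detect.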
Thanks! Looks good to me.
About other endpoints, I think people would be most interested in getting a feed of articles published in the magazine. https://www.nationalgeographic.com/magazine/
I changed the bridge to build a feed from articles in the magazine. Please take a look. How about including full articles? Currently the items in the feed have no content, because there is no content on the original page. Technically it's possible to collect each article, but that takes extra time on each request. Let me know what you think about that.
IMHO, there should still be a "latest stories" bridge. That's where most of the articles and daily news are posted. Built off https://www.nationalgeographic.com/latest-stories/
But it would be nice to have an additional "magazine only" bridge, for people who are interested only in the big stories.
I don't know if this requires two separate bridges?
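About the full articles, I poked around with the web inspector and it seems doable; only the images would have to be extracted from the tags and rewritten so they work in RSS readers. Not sure how much of a hassle that is, but this is great as is.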
Thanks for the feedback. I'll see if I can find some time this week to get it done.
> I don't know if this requires two separate bridges?
It's doable in a single bridge, using contexts: https://github.com/RSS-Bridge/rss-bridge/wiki/const-PARAMETERS#level-1---context
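As a rough illustration of the contexts approach from the linked wiki page, a `PARAMETERS` constant with one context per feed type could look like the sketch below. The context names and the `full` parameter key are hypothetical and not taken from the merged bridge.

```php
<?php
// Sketch only: context names and the "full" key are assumptions.
// In a real bridge this constant lives inside the bridge class.
const PARAMETERS = array(
    'Latest stories' => array(
        'full' => array(
            'name' => 'Include full article',
            'type' => 'checkbox'
        )
    ),
    'Magazine' => array(
        'full' => array(
            'name' => 'Include full article',
            'type' => 'checkbox'
        )
    )
);
```

Each top-level key is shown to the user as a separate context, so one bridge can serve both feeds.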
> About the full articles, I poked around with the web inspector and it seems doable; only the images would have to be extracted from the tags and rewritten so they work in RSS readers. Not sure how much of a hassle that is, but this is great as is.
I suppose you mean images have relative links, right? (haven't checked yet)
This is easily solvable using defaultLinkTo.
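To illustrate what resolving relative links involves, here is a simplified stand-in written in plain PHP. It is not RSS-Bridge's own implementation; the helper names `absolutizeUrl` and `absolutizeImageSources` are hypothetical, and `../` segments are deliberately not handled.

```php
<?php
/**
 * Resolve a possibly relative URL against a base URL.
 * Simplified: handles absolute, protocol-relative, root-relative and
 * plain relative paths, but not "../" segments.
 */
function absolutizeUrl(string $url, string $base): string
{
    $parts = parse_url($base);
    $origin = $parts['scheme'] . '://' . $parts['host'];

    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $url)) {
        return $url; // already absolute (http:, https:, data:, ...)
    }
    if (strpos($url, '//') === 0) {
        return $parts['scheme'] . ':' . $url; // protocol-relative
    }
    if (strpos($url, '/') === 0) {
        return $origin . $url; // root-relative
    }
    // Relative to the directory part of the base path.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $url;
}

/** Rewrite every src="..." attribute in an HTML fragment to an absolute URL. */
function absolutizeImageSources(string $html, string $base): string
{
    return preg_replace_callback(
        '/\bsrc="([^"]*)"/i',
        function (array $m) use ($base): string {
            return 'src="' . absolutizeUrl($m[1], $base) . '"';
        },
        $html
    );
}
```

For example, `absolutizeImageSources('<img src="/a.jpg">', 'https://example.com/')` gives `<img src="https://example.com/a.jpg">`.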
I've added most features. You can now select the topic from a drop-down list and choose to include the full article as well (which can take a while and may not work if the timeout is set too low on your server). Images in the article are not included, however.
Also, there is no timestamp included in the raw data, so feeds will have to rely on titles.
Let me know if this now works for you.
Thanks. I just tried it out and it works perfectly. I'll let you know in a few days if there are any issues.
I presume they load images with JavaScript? Bummer.
> I presume they load images with JavaScript?
To be honest I haven't checked yet. Lead images are simply provided in the JSON data. For full articles the current filter only covers text. I'll take another look, maybe images can be extracted the same way.
That was easier than I thought. Try the latest version, it includes images for full articles.
Thanks, I'll try it.
One thing that I noticed is that I'm getting duplicated articles in my RSS reader (Feedbin). Are the uids on the articles changing when they update the page? I've tried commenting out the uid assignment line so it relies on URIs, to see if it makes a difference.
I can confirm I'm no longer getting duplicates after I removed the following line:
$item['uid'] = $story['id'];
Otherwise, it's working great. I appreciate it a lot.
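One way to picture why relying on the URI avoids duplicates: an identifier derived from the article URI stays the same as long as the URI does, even if the site's internal id changes. The helper below is a hypothetical sketch, not what RSS-Bridge does internally; dropping the query string is an assumption that only holds when the path alone identifies the article.

```php
<?php
/**
 * Hypothetical helper: derive a feed item identifier from its URI.
 * Query string and fragment are dropped so tracking parameters do not
 * produce a new identifier for the same article.
 */
function stableUidFromUri(string $uri): string
{
    $parts = parse_url($uri);
    $host = strtolower($parts['host'] ?? '');
    $path = rtrim($parts['path'] ?? '', '/');
    return sha1($host . $path);
}
```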
Some "Maybe/Someday" things that I wanted to write down for reference:
Great, I'm glad this is working for you!
I removed the uid and included image captions. What do you mean by "hero" images and carousels?
There are carousels mentioned in the JSON data, but from what I can tell they are placed below the content and not above it. Maybe I'm looking at the wrong content. It would be great if you could share a screenshot to illustrate what you mean.
Hero image (circled red) https://www.nationalgeographic.com/animals/2019/03/leopards-coexist-hindu-community-india/
also see: https://www.nationalgeographic.com/environment/2019/03/sabetta-yamal-largest-gas-field/
Hero carousel https://www.nationalgeographic.com/travel/lists/food-and-drink/worlds-best-food-cities/
Carousel in article https://www.nationalgeographic.com/environment/2019/03/whale-dies-88-pounds-plastic-philippines/
Thanks for the screenshots. Hero images were already included as enclosures. I just added support for hero carousels at top (added to enclosures) and in the article (added to contents).
Find the latest version at https://github.com/RSS-Bridge/rss-bridge/pull/1065
Does that work for you?
Thank you. I don’t think any RSS reader displays enclosures, so hero images and carousels should probably go directly into the content (top) with their corresponding captions.
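A minimal sketch of that suggestion, placing images and their captions at the top of the item content. The helper name `prependImages` and the `url`/`caption` array shape are assumptions for illustration, not the structure of the site's JSON or the code in the pull request.

```php
<?php
/**
 * Hypothetical helper: prepend images (with captions) to item content.
 * $images is a list of ['url' => ..., 'caption' => ...] arrays; this
 * shape is an assumption made for the example.
 */
function prependImages(string $content, array $images): string
{
    $html = '';
    foreach ($images as $image) {
        $html .= '<figure><img src="' . htmlspecialchars($image['url']) . '" />';
        if (!empty($image['caption'])) {
            $html .= '<figcaption>' . htmlspecialchars($image['caption']) . '</figcaption>';
        }
        $html .= '</figure>';
    }
    return $html . $content;
}
```

A carousel would simply pass several entries in `$images`, which end up as consecutive figures above the article text.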
This was added, so I'm going to merge this now. Please open a new issue if further changes are necessary.
Bridge request
Sadly, National Geographic doesn't have an RSS feed anymore. The bridge should get the most recent articles published on National Geographic.
Also, it would be nice to be able to specify a category you're interested in, e.g. "magazine" only.
General information
Host URI for the bridge (i.e. https://github.com): https://www.nationalgeographic.com
Which information would you like to see?
Get a feed of the most recent articles published on National Geographic.
Title, lead image, description
Which of the following parameters do you expect?
Options