RSS-Bridge / rss-bridge

The RSS feed for websites missing it
https://rss-bridge.org/bridge01/
The Unlicense

Arte+7 empty again #524

Closed mro closed 7 years ago

mro commented 7 years ago

similar to #244 ? Rev c375ddd6ab5ec7a6

Can somebody confirm?

mro commented 7 years ago

Looks like they did a comprehensive relaunch, grr.

mitsukarenai commented 7 years ago

Yeah, they rolled out a new website, the URL schemes aren't the same, and I don't see the latest available videos on each category page, just collections, playlists, series... that's neat for the consumer, but really unfortunate for us :( Moreover, while subcategories list all available videos in what seems to be descending order, there are way too many subcategories and they list barely more than name, URL, thumbnail and duration...

This is not just a loose screw: we are facing a complete bridge rewrite here, if it's even possible to retrieve relevant data to make a feed. That's way beyond my skill :(

mro commented 7 years ago

Indeed. They're not very keen on OpenData.

An approach may be, to just take today +1 / -7 days and grab http://www.arte.tv/de/guide/ and http://www.arte.tv/guide/api/api/program/de/scheduled/17-05-02 respectively.
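A minimal sketch of that date-window idea, in Python for illustration only (the bridge itself is not written in Python). It only builds the per-day schedule URLs and does not fetch anything. The function name `schedule_urls` is hypothetical, and reading the trailing `17-05-02` as YY-MM-DD is an assumption that would need checking against the live endpoint.

```python
from datetime import date, timedelta

# Endpoint taken from the comment above; the date suffix is ASSUMED to be YY-MM-DD.
SCHEDULE_URL = "http://www.arte.tv/guide/api/api/program/{lang}/scheduled/{day}"


def schedule_urls(today, lang="de", days_back=7, days_ahead=1):
    """Return one schedule URL per day, from today - days_back to today + days_ahead."""
    urls = []
    for offset in range(-days_back, days_ahead + 1):
        day = today + timedelta(days=offset)
        urls.append(SCHEDULE_URL.format(lang=lang, day=day.strftime("%y-%m-%d")))
    return urls


if __name__ == "__main__":
    for url in schedule_urls(date.today()):
        print(url)
```

Each URL would then be fetched and its programs merged into one feed, sorted by broadcast time.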

Frenzie commented 7 years ago

The website is just flat-out worse. If I search for a program, such as Karambolage, it tells me neither when it was broadcast nor the blurb. I strongly disagree that this is "neat for the consumer" as I can't easily discover any of Arte's awesome programming through their website anymore. There's nothing to be found but a bunch of stupid, useless thumbnails.

/rant

mro commented 7 years ago

well, visitors weren't stakeholders I guess. I bet the term coffeetable was used a lot in the marketing meetings.

logmanoriginal commented 7 years ago

Just had a look :face_with_head_bandage:

They certainly don't go for simplicity... and of course no feeds, because that would be too easy.

They do provide quite an extensive list of categories and related shows on the main page, though without much detail: http://www.arte.tv/fr/

I extracted the JSON data and beautified it here

Maybe we can extract feeds from that?

Example:

{
    "id" : "5927e2ed96403",
    "kind" : "SHOW",
    "programId" : "047867-001-A",
    "language" : "fr",
    "url" : "http:\u002F\u002Fwww.arte.tv\u002Ffr\u002Fvideos\u002F047867-001-A\u002Ffemme-de-viking-1-2",
    "title" : "Femme de Viking (1\u002F2)",
    "subtitle" : "La fuite de Sigrun",
    "images" : [...],
    "publicationBegin" : "2017-05-26T08:10:00Z",
    "publicationEnd" : "2017-06-01T03:00:00Z",
    "markings" : [],
    "geoblocking" : null,
    "creationDate" : "2017-05-26T08:10:21Z",
    "lastModified" : "2017-05-26T08:30:52Z",
    "stickers" : [],
    "warning" : null,
    "duration" : 52,
    "childrenCount" : null
}
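As a rough illustration of how such an object could become a feed entry, here is a hedged Python sketch (illustration only, not the bridge's actual code). It uses only fields visible in the example above. The output keys `uri`, `title`, `timestamp` and `uid` and the function name `show_to_feed_item` are assumptions chosen for this sketch.

```python
import json
from datetime import datetime, timezone


def show_to_feed_item(show):
    """Map one "SHOW" object to a minimal feed item dict (key names are assumptions)."""
    title = show["title"]
    if show.get("subtitle"):
        title = f'{title}: {show["subtitle"]}'
    published = datetime.strptime(
        show["publicationBegin"], "%Y-%m-%dT%H:%M:%SZ"
    ).replace(tzinfo=timezone.utc)
    return {
        "uri": show["url"],
        "title": title,
        "timestamp": int(published.timestamp()),
        "uid": show["programId"],
    }


# Trimmed copy of the example object; json.loads decodes the \u002F escapes to "/".
raw = r'''{
    "kind": "SHOW",
    "programId": "047867-001-A",
    "url": "http:\u002F\u002Fwww.arte.tv\u002Ffr\u002Fvideos\u002F047867-001-A\u002Ffemme-de-viking-1-2",
    "title": "Femme de Viking (1\u002F2)",
    "subtitle": "La fuite de Sigrun",
    "publicationBegin": "2017-05-26T08:10:00Z"
}'''
item = show_to_feed_item(json.loads(raw))
print(item["title"], item["uri"])
```

The object carries no description or category, so a feed built this way would be limited to title, link and date unless further requests are made.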

> If I search for a program, such as Karambolage, it tells me neither when it was broadcast nor the blurb.

http://sites.arte.tv/karambolage/fr/voir-et-revoir-les-emissions-karambolage

Not sure how I got there but they freaking use separate domains for certain programs :confused:

Frenzie commented 7 years ago

It's better in that it at least displays the date, although without the title and blurb/subtitle it's still meh. I would like to give them a shout-out for using proper pages instead of endlessly scrolling ones like on npo.nl, which make it completely impossible to find a broadcast from five years ago, or even one. The only exception there is when the series has a dedicated website with an overview, such as here. That reminds me that I wanted to look into writing an NPO bridge. :-P

Anyway, it would certainly help if I could somehow manage to end up on the kind of page you gave me without bookmarking it. :rofl:

PS For my use case making a feed out of those kinds of pages (however you may find them) would be more useful than a feed that gives you everything.

mro commented 7 years ago

@Frenzie what's your use case?

Mine is using the feed as a news feed, so I can skim the headlines of past programs and grab them via youtube-dl or MediathekView if needed. High volume isn't much of an issue in my case. A category blacklist would be nice, however.

So I'm for everything but within a time window.
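A small Python sketch of what "everything within a time window, minus a category blacklist" could look like (illustration only). It assumes items shaped like the JSON example earlier in the thread, with a `publicationBegin` timestamp. The `category` key is purely hypothetical, since that example carries no category field, and `within_window` is a made-up name.

```python
from datetime import datetime, timedelta, timezone


def within_window(items, now, days=7, blocked_categories=frozenset()):
    """Keep items published in the last `days` days whose category is not blocked."""
    earliest = now - timedelta(days=days)
    kept = []
    for item in items:
        begin = datetime.strptime(
            item["publicationBegin"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        if begin < earliest or begin > now:
            continue  # outside the time window
        if item.get("category") in blocked_categories:  # "category" is a HYPOTHETICAL key
            continue
        kept.append(item)
    return kept


sample = [
    {"title": "recent", "publicationBegin": "2017-05-26T08:10:00Z", "category": "docs"},
    {"title": "old", "publicationBegin": "2017-04-01T08:10:00Z", "category": "docs"},
    {"title": "blocked", "publicationBegin": "2017-05-27T08:10:00Z", "category": "sports"},
]
now = datetime(2017, 5, 29, tzinfo=timezone.utc)
result = within_window(sample, now, days=7, blocked_categories={"sports"})
print([item["title"] for item in result])
```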

Frenzie commented 7 years ago

@mro Basically the same except I'd never heard of MediathekView and volume's definitely an issue for me. I might glance at the website occasionally to see if there's anything of interest but I don't want to see everything all the time except for a few very specific programs (i.e., Karambolage, Le Dessous des cartes and I guess I'd consider keeping abreast of new releases of that NDR show Xenius even if I have no interest in watching them all; German Arte also had an interesting series on rivers around the world).

tl;dr I never used the existing ArteBridge functionality because it didn't serve my needs. But it didn't bother me enough to implement it myself. Just putting it out there if someone decides to rewrite it from scratch. ;-)

PS A program like MediathekView sounds somewhat redundant for, e.g., BR where all the programs readily provide download links.

PPS It's a pity that the JSON above doesn't indicate that the French version of that Femme de Viking program is dubbed from the original Die Frauen der Wikinger without loading yet another JSON.

logmanoriginal commented 7 years ago

[Screenshot: firefox_screenshot_2017-05-29t18-16-46 254z]

You'll never guess where this leads to... spoiler :rofl:

They should link here instead. Actually this is more accurate :laughing:


Seriously now, we can make something out of the pages. As long as the data is available (and they don't guard against bots) anything is possible. Also the API request mentioned by @mro is very interesting as it doesn't require registration and uses less bandwidth.

> An approach may be, to just take today +1 / -7 days and grab http://www.arte.tv/de/guide/ and http://www.arte.tv/guide/api/api/program/de/scheduled/17-05-02 respectively.

How did you figure that out? :open_mouth:

Frenzie commented 7 years ago

Although that link is broken, a quick search (using a search engine to search on their site...) shows that they do in fact have newsfeeds, but (I think) only in French and German. However, in those languages they don't actually link to anything from the main page, just the social media!

http://www.arte.tv/sites/services/flux-rss/
http://www.arte.tv/sites/de/services/rss-feeds/

Not all of them are functional, but the basic +7 seems to be.

http://www.arte.tv/papi/tvguide-flow/feeds/videos/fr.xml?type=ARTE_PLUS_SEVEN&player=true
http://www.arte.tv/papi/tvguide-flow/feeds/videos/de.xml?type=ARTE_PLUS_SEVEN&player=true

[Edit: when you remove the &player=true it also gives you download links for all the video files pertaining to the particular language you're looking at.]

[Edit 2: when you remove all arguments you get an enormous list of programs going back to 2015 rather than just a few months. Probably best not to do that wrt drawing attention. :-P]
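For reference, a hedged Python sketch (illustration only) of building those +7 feed URLs with and without the `player` argument. The base URL and parameter names come from the links above. The function name `plus_seven_feed_url` is made up, and nothing here verifies that the endpoint still responds.

```python
from urllib.parse import urlencode

# Base URL as posted above; only "fr" and "de" were reported to work.
FEED_BASE = "http://www.arte.tv/papi/tvguide-flow/feeds/videos/{lang}.xml"


def plus_seven_feed_url(lang="fr", player=True):
    """Build the ARTE_PLUS_SEVEN feed URL; player=False drops the player argument."""
    params = {"type": "ARTE_PLUS_SEVEN"}
    if player:
        params["player"] = "true"
    return FEED_BASE.format(lang=lang) + "?" + urlencode(params)


print(plus_seven_feed_url("fr"))
print(plus_seven_feed_url("de", player=False))
```

The `type` argument is always kept here on purpose, since dropping all arguments returns the very large listing mentioned above.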

Edit: also, they seem to have a secret API?

Quoting from https://www.drupal.org/project/arte_opa

The documentation of this API is available here : https://api.arte.tv/api/oauth/user/documentation

Configuration: After requesting access keys from ARTE, you can configure OPA access settings at /admin/config/services/opa/config.

It doesn't specify how one would go about requesting access keys.

teromene commented 7 years ago

The API command to get the list of the videos is probably https://api.arte.tv/api/opa/v3/videos?sort=broadcastBegin&limit=10

The params you can give to the API are:

In order to access the API, you just need to add the header Authorization: Bearer Nzc1Yjc1ZjJkYjk1NWFhN2I2MWEwMmRlMzAzNjI5NmU3NWU3ODg4ODJjOWMxNTMxYzEzZGRjYjg2ZGE4MmIwOA

I have also seen the token Bearer MWZmZjk5NjE1ODgxM2E0MTI2NzY4MzQ5MTZkOWVkYTA1M2U4YjM3NDM2MjEwMDllODRhMjIzZjQwNjBiNGYxYw but I think it is only used to generate menu elements (internally called the EmacEndpoint API).

For filtering categories, you need to add &category.code= plus the category (same for subcategories). It is possible to chain categories by separating them with commas. Category IDs :

To get the subcategories, it is possible to fetch https://api.arte.tv/api/opa/v3/subcategories?category.code= + category code
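Putting the pieces above together, a minimal Python sketch (illustration only) of preparing such a request with the standard library. It builds the request without sending it. The helper names are hypothetical, the category codes `AAA` and `BBB` are placeholders rather than real Arte codes, and the token is passed in as an argument instead of being hard-coded.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.arte.tv/api/opa/v3"


def videos_request(token, categories=(), limit=10):
    """Prepare (not send) a videos request with the Bearer token and optional categories."""
    params = {"sort": "broadcastBegin", "limit": limit}
    if categories:
        # Several categories are chained with commas, as described above.
        params["category.code"] = ",".join(categories)
    url = f"{API_BASE}/videos?{urlencode(params, safe=',')}"
    return Request(url, headers={"Authorization": f"Bearer {token}"})


def subcategories_url(category):
    """URL listing the subcategories of one category code."""
    return f"{API_BASE}/subcategories?{urlencode({'category.code': category})}"


# "AAA" and "BBB" are PLACEHOLDER codes, "TOKEN" a placeholder token.
req = videos_request("TOKEN", categories=("AAA", "BBB"))
print(req.full_url)
print(subcategories_url("AAA"))
```

Passing the result to `urllib.request.urlopen` would perform the call. Whether the API still accepts these parameters is not something this sketch checks.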

I don't really use this bridge, so I think it would be better to let someone who actually uses it implement the function you need. If you need any info on the APIs, just ask :smile:

teromene commented 7 years ago

Should be fixed by cba65d6d087f14b2ca2fda995745ac8f5f79310d.