add-ons / plugin.video.vrt.nu

Kodi add-on to watch content from VRT MAX
https://www.facebook.com/groups/kodivlaanderen
GNU General Public License v3.0

Use VRT NU API instead of scraping #57

Closed pietje666 closed 5 years ago

pietje666 commented 5 years ago

As the title says, it would be a lot nicer to use an API instead of scraping the VRT.NU site. It would make the add-on more stable and most likely faster.

The problem is that we (or at least I) do not know which API to use, nor how to call it. So any help is welcome!

mediaminister commented 5 years ago

I did a little research.

I found out that VRT uses some kind of Elasticsearch. I managed to trigger an error message that shows some info about the backend: https://search.vrt.be/search?facets[programUrl]=test&facets[programUrl]=test

I'm not sure which Elasticsearch version or variant VRT uses. I looked at the Elasticsearch Uri Search documentation and found out that some parameters work but others don't.

There are two endpoints: a suggest API to get short program info: https://search.vrt.be/suggest?q=waes and a search API to get all program info: https://search.vrt.be/search?q=waes
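A minimal sketch of how the two endpoints could be called from Python. The names `endpoint_url` and `fetch_json` are hypothetical, and the assumption that the response body is UTF-8 JSON has not been verified here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_HOST = 'https://search.vrt.be'


def endpoint_url(endpoint, **params):
    """Build a URL for the 'suggest' (short info) or 'search' (full info) endpoint."""
    if endpoint not in ('suggest', 'search'):
        raise ValueError('unknown endpoint: %s' % endpoint)
    return '%s/%s?%s' % (SEARCH_HOST, endpoint, urlencode(params))


def fetch_json(url):
    """Download and decode a response (assumption: the body is UTF-8 JSON)."""
    with urlopen(url) as response:
        return json.loads(response.read().decode('utf-8'))


# Only the URLs are built here; no request is sent.
print(endpoint_url('suggest', q='waes'))
print(endpoint_url('search', q='waes'))
```

Calling `fetch_json(endpoint_url('suggest', q='waes'))` would perform the actual request.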

I rewrote the show_videos function using this api as an example: https://github.com/mediaminister/plugin.video.vrt.nu/blob/c4ff617335ddb2322fcc9c1c7f1d57171166153a/plugin.video.vrt.nu/resources/lib/vrtplayer/vrtplayer.py#L95

The possibilities with this api seem endless, but I have to examine this further.

pietje666 commented 5 years ago

I wonder how they do the categories / A-Z.

mediaminister commented 5 years ago

I get 241 items with the following request: https://search.vrt.be/suggest?facets[transcodingStatus]=AVAILABLE

I think this corresponds to the a-z list.

I have not yet found a good query for the categories list.

pietje666 commented 5 years ago

Looks good for A-Z

pietje666 commented 5 years ago

For the categories this seems to be working (I guess the categories are hardcoded): https://search.vrt.be/search?size=150&facets[categories]=docu It needs to be combined with the available facet though. And then we need to select the program and make the results distinct. Any Elasticsearch experts here?

mediaminister commented 5 years ago

Other solution:
https://search.vrt.be/suggest?&facets[categories]=docu
And then:
https://search.vrt.be/search?size=150&facets[programName]=ademloos
or
https://search.vrt.be/search?size=150&facets[programUrl]=//www.vrt.be/vrtnu/a-z/ademloos/
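The two-step lookup above (programs of a category through suggest, then episodes of one program through search) could be sketched like this. The function names are hypothetical, and the brackets and slashes are left unescaped to match the URLs quoted in this thread.

```python
from urllib.parse import urlencode

BASE = 'https://search.vrt.be'


def facet_query(**facets):
    """Encode keyword arguments as facets[key]=value pairs."""
    pairs = [('facets[%s]' % key, value) for key, value in facets.items()]
    return urlencode(pairs, safe='[]/')


def programs_in_category_url(category):
    """Step 1: the suggest endpoint lists the programs of one category."""
    return '%s/suggest?%s' % (BASE, facet_query(categories=category))


def episodes_of_program_url(program_name, size=150):
    """Step 2: the search endpoint lists the episodes of one program."""
    return '%s/search?size=%d&%s' % (BASE, size, facet_query(programName=program_name))


print(programs_in_category_url('docu'))
print(episodes_of_program_url('ademloos'))
```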

I just don't know yet how to get the initial categories list.

pietje666 commented 5 years ago

I think it almost never changes, so we can hardcode it.

pietje666 commented 5 years ago

I want to do something like this: https://search.vrt.be/search?size=150&facets[categories]=docu&select=program&distinct I think it must be possible in some way.

mediaminister commented 5 years ago

I tried a lot of different things but did not find a method to get a list of categories. I think this public api is limited. Take a look at this document: https://www.vrt.be/etc/clientlibs/components/search.js

I publish my experiments with this api on: https://github.com/mediaminister/plugin.video.vrt.nu/tree/api

mediaminister commented 5 years ago

Summary of my api experiment (https://github.com/mediaminister/plugin.video.vrt.nu/tree/api):

I rewrote the following functions using the api

I also extended the metadatacreator class to provide more info in videoplayer view: episode number, season number, tvshowtitle, etc...

I also used the screenshot api (https://vrtnu-api.vrt.be/screenshots/een.jpg) to provide a picture with the livestreams, but the picture doesn't refresh, so this is useless for now.

Besides the categories list and the urltostream class, everything now works through the api. I currently have no more spare time to refactor my code to use the vrtapihelper class. Maybe in the coming weeks, but feel free to rewrite my code. Most of the work has already been done.

I noticed that publicationId and videoId are also present in this api, so we don't need web scraping anymore in the urltostream class. Maybe we can finalize this issue by the end of this month.

pietje666 commented 5 years ago

I will take what I need, thanks. I will also refactor the code to use season facet filtering. Right now I do it manually in the code, but it would indeed be better to do it directly in the search string!

But those fixes are for a later stage. Right now I will already push a new version!

mediaminister commented 5 years ago

I luckily found some time to create a pull request in which I have implemented all the functions mentioned in this issue: https://github.com/pietje666/plugin.video.vrt.nu/pull/64

dagwieers commented 5 years ago

@mediaminister I just tested your new release and it works fine. I like the new addition of Most recent, but I was wondering if it would be possible to also add a Channels section where programs are listed by channel (Ketnet Junior, Ketnet, Sporza, etc...).

That would be nice for adding to the favourites in the profile of our children.

mediaminister commented 5 years ago

This is possible. For instance, the 50 most recent episodes from Ketnet: https://vrtnu-api.vrt.be/search?i=video&size=50&facets[transcodingStatus]=AVAILABLE&facets[brands]=ketnet
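A rough sketch of how a Channels listing could build one such query per brand. The `BRANDS` list is a hard-coded assumption that uses only brand values mentioned in this thread, since no API call for listing brands has been found; `recent_episodes_url` is a hypothetical name.

```python
from urllib.parse import urlencode

API = 'https://vrtnu-api.vrt.be/search'

# Assumption: hard-coded brand identifiers, not retrieved from the API.
BRANDS = ['een', 'ketnet', 'radio1', 'stubru']


def recent_episodes_url(brand, size=50):
    """URL for the most recent available episodes of one brand."""
    params = [
        ('i', 'video'),
        ('size', size),
        ('facets[transcodingStatus]', 'AVAILABLE'),
        ('facets[brands]', brand),
    ]
    return API + '?' + urlencode(params, safe='[]')


for brand in BRANDS:
    print(recent_episodes_url(brand))
```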

I just have to find out how I can dynamically get all vrt brand names with one http request. I didn't find an efficient way of getting these brand names from the api. So, I might have to scrape them from the vrt nu website.

pietje666 commented 5 years ago

More features = more maintenance :( The first thing to do is actually to make the code Python 2-3 compatible (otherwise no more updates from the Kodi team). I have already done some work on that, but it's not working yet. Maybe I'll first review your pull request and then perform the conversion again.

mediaminister commented 5 years ago

I agree, I'm not coding at the moment. Maybe we can open another issue to discuss the difficulties with python 2-3 compatibility. If I'm not mistaken python 2 is supported until the end of this year.

pietje666 commented 5 years ago

https://github.com/xbmc/repo-plugins/pull/2331 => this add-on isn't Python 3 compatible. Future updates have to be compatible with Python 3.

dagwieers commented 5 years ago

@pietje666 I have some standard Travis integration for Kodi addons, can I add it to this project?

It does include PEP8-related stuff, so it would require a general cleanup as well. Unfortunately, we can't do any proper code-testing since the xbmc imports fail anyhow. I would prefer this PR is merged before this gets added.

dagwieers commented 5 years ago

BTW I have a lot of experience with python2/3 issues, and it's not that hard to make it work.
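As an illustration only (not code from this add-on), a common pattern for code that has to run on both Python 2 and Python 3 is a guarded import plus a small text helper. `to_unicode` is a hypothetical helper name.

```python
try:  # Python 3
    from urllib.parse import urlencode
except ImportError:  # Python 2
    from urllib import urlencode


def to_unicode(text, encoding='utf-8'):
    """Return text as a unicode string on both Python 2 and Python 3."""
    return text.decode(encoding) if isinstance(text, bytes) else text


print(urlencode({'q': to_unicode(b'waes')}))
```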

pietje666 commented 5 years ago

> @pietje666 I have some standard Travis integration for Kodi addons, can I add it to this project?
>
> It does include PEP8-related stuff, so it would require a general cleanup as well. Unfortunately, we can't do any proper code-testing since the xbmc imports fail anyhow. I would prefer this PR is merged before this gets added.

Everything can be unit tested except the kodiwrapper.py / addon.py class. About the Pep 8 fine by me as long as it does not enforce the max line length rule.

dagwieers commented 5 years ago

> About the Pep 8 fine by me as long as it does not enforce the max line length rule.

Perfect, I have it ready.

pietje666 commented 5 years ago

Will the script execute automatically when something gets pushed or a pull request is opened? Or how does it work?

dagwieers commented 5 years ago

You need to enable your repository on travis-ci.org, and from that moment any PR will show the CI test status. At the moment it will likely fail because not all pylint tests were fixed. If you enable it, I will make a PR that will become green because it fixes those.

dagwieers commented 5 years ago

@mediaminister I started to document the VRT.NU API based on the comments in this PR. You can find it here: https://github.com/pietje666/plugin.video.vrt.nu/wiki/VRT.NU-API

I would appreciate it if you could amend this Wiki page with new findings and improvements. I am sure it helps other people to understand what our code is doing in the backend.

mediaminister commented 5 years ago

Okay, thanks for setting up a wiki page. I'll add more information soon.

EDIT: I can't edit the wiki, so I'm placing the explanation of some parameters here:

VRT API parameters

(first limited list, there are definitely more parameters, but undocumented for now. Check https://www.vrt.be/etc/clientlibs/components/search.js to investigate)

i
value: video or corporate
index: selects which index to search. To search VRT NU this value is always video; to search the corporate vrt.be website, use corporate.

q
query: keywords for your search query.

size
value: integer between 1 and 150
size: number of results per page. The maximum value is 150. Defaults to 10.

from
value: integer between 1 and ...
from: the starting index of the hits to return. Combined with the size parameter you can browse all results. Defaults to 0.

order
value: desc or asc
order: sort search results descending or ascending. Defaults to desc.

facets[key]=[values]
value: keys and values from the api facets
Narrow down results by providing keys and values. It's possible to combine multiple "facets" in one query.
examples:
facets[brands]=[radio1,stubru]
facets[brands]=[een]&facets[formattedBroadcastShortDate]=18/02
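The parameters listed above could be combined in a small URL builder such as the sketch below. `build_search_url` is a hypothetical helper; the limits and defaults are the ones listed in this comment, and the bracketed multi-value notation follows the examples given here.

```python
from urllib.parse import urlencode

API = 'https://vrtnu-api.vrt.be/search'
MAX_SIZE = 150


def build_search_url(q=None, index='video', size=10, start=0, order='desc', facets=None):
    """Build a search URL from the parameters i, q, size, from, order and facets."""
    if index not in ('video', 'corporate'):
        raise ValueError('i must be video or corporate')
    if not 1 <= size <= MAX_SIZE:
        raise ValueError('size must be between 1 and %d' % MAX_SIZE)
    if order not in ('asc', 'desc'):
        raise ValueError('order must be asc or desc')
    params = [('i', index), ('size', size), ('from', start), ('order', order)]
    if q:
        params.append(('q', q))
    for key, value in sorted((facets or {}).items()):
        if isinstance(value, (list, tuple)):
            # Several values for one facet use the bracketed [a,b] notation.
            value = '[%s]' % ','.join(value)
        params.append(('facets[%s]' % key, value))
    return API + '?' + urlencode(params, safe='[],/')


print(build_search_url(size=50, facets={'brands': ['radio1', 'stubru']}))
```

Rejecting out-of-range values before the request is sent keeps mistakes local instead of depending on how the API reacts to them.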

mediaminister commented 5 years ago

While looking for possibilities to phase out web scraping completely, I found something new: You can use dot notation in facets to navigate a level deeper in the json data: https://vrtnu-api.vrt.be/search?i=video&size=1&facets[programTags.title]=Kinderen
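To illustrate what the dotted facet key refers to, the sketch below builds the same kind of URL and resolves the same path in a sample item. The item structure (a `programTags` list of objects with a `title`) is an assumption inferred from the facet name, and both function names are hypothetical.

```python
from urllib.parse import urlencode

API = 'https://vrtnu-api.vrt.be/search'


def nested_facet_url(path, value, size=1):
    """Filter on a nested field by using dot notation in the facet key."""
    params = [('i', 'video'), ('size', size), ('facets[%s]' % path, value)]
    return API + '?' + urlencode(params, safe='[]')


def dig(data, dotted_key):
    """Collect the values found at dotted_key, descending through dicts and lists."""
    nodes = [data]
    for part in dotted_key.split('.'):
        next_nodes = []
        for node in nodes:
            children = node if isinstance(node, list) else [node]
            for child in children:
                if isinstance(child, dict) and part in child:
                    next_nodes.append(child[part])
        nodes = next_nodes
    return nodes


# Assumed item shape, for illustration only.
sample_item = {'programTags': [{'title': 'Kinderen'}]}
print(nested_facet_url('programTags.title', 'Kinderen'))
print(dig(sample_item, 'programTags.title'))
```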

mediaminister commented 5 years ago

Phasing out web scraping completely is not around the corner. I found out that the category images at https://www.vrt.be/vrtnu/categorieen/ are not available through https://vrtnu-api.vrt.be/search?i=video

dagwieers commented 5 years ago

But the category images are an optional feature, so as long as the addon does not fail if this no longer works, I think we are fine.

dagwieers commented 5 years ago

@pietje666 Can we open the Wiki for outside access, or at least give @mediaminister the right privileges to update the Wiki?

dagwieers commented 5 years ago

@mediaminister What I was looking for in the interface was something equivalent to this:

This could allow to list categories, channels/brands, etc...

mediaminister commented 5 years ago

The api frontend is not Elasticsearch; VRT wrote its own limited frontend. Check this error message: https://vrtnu-api.vrt.be/search?i=video&facets[x][x]

I can get a list of categories by running through 500 video items, but it's slow and needs a local cache and it's useless without the images.
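A sketch of that approach, under stated assumptions: each video item is taken to be a dict with a `categories` list (inferred from the `facets[categories]` key), and `page_offsets`, `collect_categories` and `cached` are hypothetical helpers. With the maximum page size of 150, 500 items take four requests.

```python
import json
import os
import time


def page_offsets(total, size=150):
    """Offsets for the 'from' parameter needed to walk through `total` items."""
    return list(range(0, total, size))


def collect_categories(items):
    """Distinct category names found in a list of video items."""
    categories = set()
    for item in items:
        categories.update(item.get('categories') or [])
    return sorted(categories)


def cached(path, build, max_age=86400):
    """Return the cached JSON at `path` if it is fresh, else rebuild and store it."""
    try:
        if time.time() - os.path.getmtime(path) < max_age:
            with open(path) as handle:
                return json.load(handle)
    except (OSError, ValueError):
        pass  # missing or corrupt cache file: rebuild below
    data = build()
    with open(path, 'w') as handle:
        json.dump(data, handle)
    return data


print(page_offsets(500))
```

The `build` callback would fetch the pages and pass the combined items to `collect_categories`, so the slow part runs at most once per `max_age` seconds.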

Maybe rewriting the web scraping code to make it more failsafe is a better idea.

dagwieers commented 5 years ago

Personally I wouldn't mind if we hard-coded the known list of categories, translated them, and used this:

  1. as a fallback mechanism for when the scraping fails
  2. for translating to other languages

PS It would be nice if the translation-mechanism in Kodi would not depend on numbers, but on strings instead, it would make our code a lot more readable.
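A minimal sketch of such a fallback, assuming a hard-coded list in which only `docu` comes from this thread and the numeric string id is made up. `get_categories` and `category_label` are hypothetical names; in Kodi, the `localize` argument would typically be `xbmcaddon.Addon().getLocalizedString`.

```python
# Assumption: illustrative entry only; the string id 30070 is invented.
FALLBACK_CATEGORIES = [
    {'id': 'docu', 'msgctxt': 30070},
]


def get_categories(scrape):
    """Return the scraped categories, or the hard-coded list when scraping fails."""
    try:
        categories = scrape()
    except Exception:  # any scraping error triggers the fallback
        categories = None
    return categories or FALLBACK_CATEGORIES


def category_label(category, localize):
    """Translate via a numeric string id, falling back to the raw category id."""
    string_id = category.get('msgctxt')
    label = localize(string_id) if string_id else ''
    return label or category['id']


def broken_scraper():
    raise IOError('page layout changed')


print(get_categories(broken_scraper))
```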

dagwieers commented 5 years ago

I wrote a fallback mechanism for categories and included a web-scraping test to find discrepancies between online and internal representation. You can find it in #161
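The discrepancy check can be pictured as a set comparison. This is a sketch of the idea, not the code from #161; `find_discrepancies` is a hypothetical name and the sample values other than `docu` are placeholders.

```python
def find_discrepancies(online, internal):
    """Compare scraped category ids with the hard-coded ones."""
    online_ids, internal_ids = set(online), set(internal)
    return {
        'missing_internally': sorted(online_ids - internal_ids),
        'gone_online': sorted(internal_ids - online_ids),
    }


# Placeholder data: 'new-category' and 'old-category' are not real category ids.
print(find_discrepancies(['docu', 'new-category'], ['docu', 'old-category']))
```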

The only thing that still requires web scraping is the livestream implementation.

dagwieers commented 5 years ago

Apart from #161 I don't think there is a lot we can do.