Any ability to add more sources?

SSj-Saturn commented 8 months ago

This is fantastic, and thank you for making this.

On the configuration page, would it be possible to add more sources (e.g. TVDB), with fields for the endpoint, relevant server types, API, etc.?

C5H12O5 commented 8 months ago

Thank you for your support. The easiest way to add additional data sources is by adding a JSON configuration file in the 'scrapeflows' folder, following the format of the existing implemented data sources. Implementing such functionality on the web configuration page seems overly complex and unnecessary for a simple plugin like this.

TheMizuchi commented 6 months ago

Hey, I've tried to understand what I need to change in the JSON files to add a new source, but I don't quite understand some points:

In the first 'http' component, do you request a page or media?
After that, what do you collect? Can you explain this to me: "collect": { "source": "metadata", "into": { "ids": "['xp_texts', './results//id']" } }
For all collected elements, what is "['xp_text', './title']"?

Btw it's just super cool that you made this plugin ^^

C5H12O5 commented 6 months ago

@TheMizuchi

The first key of each child node under the steps object in the JSON file is the name of the function to use. Each name maps one-to-one to a Python file under the /scraper/functions path.

key	file
http	/scraper/functions/request.py
collect	/scraper/functions/collect.py
loop	/scraper/functions/loop.py
retval	/scraper/functions/retval.py

So, an http key means executing the request.py function once, a collect key means executing the collect.py function once, and so on.
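To make the key-to-function mapping concrete, here is a minimal sketch of how such a dispatcher could look. It is not the plugin's actual code: the function signatures, the `FUNCTIONS` table, `run_steps`, the `trace` entry, and the modelling of steps as a list of single-key objects are all assumptions made for illustration.

```python
# Hypothetical stand-ins for the modules under /scraper/functions;
# the real plugin's signatures and behavior may differ.
def request(args, context):
    # Pretend to send a request and store a response in the context.
    context["response"] = "response for " + args["url"]


def collect(args, context):
    # Pretend to process a source from the context and store the result.
    context["collected"] = context.get(args["source"])


# Step key -> function, mirroring the key/file table above (assumed layout).
FUNCTIONS = {"http": request, "collect": collect}


def run_steps(steps, context):
    """Run each step in order; the first key of a step selects the function."""
    for step in steps:
        key = next(iter(step))  # first key of the step object
        FUNCTIONS[key](step[key], context)
        context.setdefault("trace", []).append(key)
    return context


steps = [
    {"http": {"url": "https://example.invalid/search"}},
    {"collect": {"source": "response"}},
]
ctx = run_steps(steps, {})
print(ctx["trace"])
```

Each step runs exactly one function, and all steps communicate only through the shared context dictionary.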

request.py does only one thing: it sends a request and puts the response into the context.

collect.py is a little more complicated. It gets the specified source from the context, processes it, and finally puts the result back into the context. Expressions like ['xp_texts', './results//id'] are the processing part; this one means using the XPath expression ./results//id to find all matching subelements. (xp_ means XPath, re_ means regular expression.)
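As a rough illustration of how such an expression string could be taken apart, the sketch below parses it with `ast.literal_eval` and reads the prefix. The helper name `parse_expression` and the returned tuple layout are hypothetical; the plugin may parse these strings differently.

```python
import ast


def parse_expression(expr):
    """Split "['xp_texts', './results//id']" into (engine, name, pattern)."""
    # The string is a Python-style list literal, so literal_eval can read it.
    name, pattern = ast.literal_eval(expr)
    prefix = name.split("_", 1)[0]
    # xp_ -> XPath, re_ -> regular expression, as described above.
    engine = {"xp": "xpath", "re": "regex"}.get(prefix)
    if engine is None:
        raise ValueError(f"unknown expression prefix: {name!r}")
    return engine, name, pattern


parsed = parse_expression("['xp_texts', './results//id']")
print(parsed)
```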

For example:

source: context["metadata"] = { "results": [ { "id": 1, ... }, { "id": 2, ... } ] }
expression: { "collect": { "source": "metadata", "into": { "ids": "['xp_texts', './results//id']" } } }
result: context["ids"] = [ "1", "2" ]

PS: I use XPath to process JSON by converting the JSON object into XML format since Python doesn't natively support JSONPath.
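The example above can be reproduced with a small sketch using only the standard library. The JSON-to-XML conversion here (`to_xml`) and this `xp_texts` are hypothetical reimplementations, assuming that list items become repeated elements named after their key; the plugin's real conversion may differ.

```python
import xml.etree.ElementTree as ET


def to_xml(tag, value):
    """Convert a JSON-like value to an Element (assumed conversion rules)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            # A list becomes repeated sibling elements with the same tag.
            items = child if isinstance(child, list) else [child]
            for item in items:
                elem.append(to_xml(key, item))
    elif value is not None:
        elem.text = str(value)
    return elem


def xp_texts(source, path):
    """Return the text of every element matching the XPath expression."""
    root = to_xml("root", source)
    return [e.text for e in root.findall(path)]


context = {
    "metadata": {"results": [{"id": 1, "title": "A"}, {"id": 2, "title": "B"}]}
}
context["ids"] = xp_texts(context["metadata"], "./results//id")
print(context["ids"])
```

The values come back as strings because XML element text is always text, which matches the "1" and "2" in the result above.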

C5H12O5 / syno-videoinfo-plugin

Any ability to add more sources? #2