Open SSj-Saturn opened 8 months ago
Thank you for your support. The easiest way to add additional data sources is by adding a JSON configuration file in the 'scrapeflows' folder, following the format of the existing implemented data sources. Implementing such functionality on the web configuration page seems overly complex and unnecessary for a simple plugin like this.
Hey, I've tried to understand what I need to change in the JSON files to add a new source, but I don't quite understand some points:
"collect": { "source": "metadata", "into": { "ids": "['xp_texts', './results//id']" } }
"['xp_text', './title']"?
Btw, it's just super cool that you made this plugin ^^
@TheMizuchi
The first key of each child node under the `steps` object in the JSON file is the name of the function to use. These names correspond one-to-one to the Python files under the `/scraper/functions` path.
key | file |
---|---|
http | /scraper/functions/request.py |
collect | /scraper/functions/collect.py |
loop | /scraper/functions/loop.py |
retval | /scraper/functions/retval.py |
So an `http` key means executing the `request.py` function once, a `collect` key means executing the `collect.py` function once, and so on.
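To make the key-to-function mapping concrete, here is a minimal sketch of how such a dispatcher could work. It is not the plugin's actual code: the `FUNCTIONS` table, `run_steps`, the stub step bodies, and the argument names `url` and `into` on the `http` step are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

Context = Dict[str, Any]
StepFunction = Callable[[Dict[str, Any], Context], None]


def http_step(args: Dict[str, Any], context: Context) -> None:
    # Stand-in for request.py. A real step would send the request;
    # this stub only records a placeholder response in the context.
    # The "url" and "into" argument names are hypothetical.
    context[args.get("into", "response")] = f"<response for {args['url']}>"


def retval_step(args: Dict[str, Any], context: Context) -> None:
    # Stand-in for retval.py: mark one context entry as the final result.
    context["__retval__"] = context.get(args["source"])


# Hypothetical lookup table: step key -> function, mirroring the table above.
FUNCTIONS: Dict[str, StepFunction] = {
    "http": http_step,
    "retval": retval_step,
}


def run_steps(steps: List[Dict[str, Any]], context: Context) -> Context:
    for step in steps:
        name = next(iter(step))  # the first key names the function to run
        FUNCTIONS[name](step[name], context)
    return context


result = run_steps(
    [
        {"http": {"url": "https://example.invalid/search"}},
        {"retval": {"source": "response"}},
    ],
    {},
)
```

Each step reads from and writes to the same shared context, which is how a later step can use what an earlier one produced.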
`request.py` only does one thing: it sends a request and puts the response into the context.
`collect.py` is a little more complicated. It gets the specified source from the context, processes it, and puts the result back into the context. An expression like `['xp_texts', './results//id']` is the processing part: it means using the XPath expression `./results//id` to find all matching subelements. (The `xp_` prefix means XPath; `re_` means regular expression.)
For example:
context["metadata"] = { "results": [ { "id": 1, ... }, { "id": 2, ... } ] }
{ "collect": { "source": "metadata", "into": { "ids": "['xp_texts', './results//id']" } } }
context["ids"] = [ "1", "2" ]
PS: I use XPath to process JSON by converting the JSON object into XML format since Python doesn't natively support JSONPath.
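The example above can be reproduced with a short sketch that converts the JSON object to XML and then runs the XPath expression over it. This is only an illustration under stated assumptions, not the plugin's implementation: the helper `to_xml`, the `item` tag used for list entries, parsing the expression with `ast.literal_eval`, and the reading of `xp_text` as "text of the first match" are all guesses.

```python
import ast
import xml.etree.ElementTree as ET
from typing import Any, Dict


def to_xml(value: Any, tag: str = "root") -> ET.Element:
    # Hypothetical JSON -> XML conversion: dict keys become child tags,
    # list entries become <item> children, scalars become element text.
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            elem.append(to_xml(child, key))
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml(item, "item"))
    else:
        elem.text = str(value)
    return elem


def collect(args: Dict[str, Any], context: Dict[str, Any]) -> None:
    root = to_xml(context[args["source"]])
    for name, expression in args["into"].items():
        # Assumption: the expression string is a Python-style list literal.
        kind, path = ast.literal_eval(expression)
        if kind == "xp_texts":
            # Text of every element matched by the XPath expression.
            context[name] = [match.text for match in root.findall(path)]
        elif kind == "xp_text":
            # Assumption: the singular form returns only the first match.
            context[name] = root.findtext(path)
        else:
            raise ValueError(f"unsupported expression type: {kind}")


context = {"metadata": {"results": [{"id": 1}, {"id": 2}]}}
collect(
    {"source": "metadata", "into": {"ids": "['xp_texts', './results//id']"}},
    context,
)
print(context["ids"])
```

Because values pass through XML text nodes, the numeric ids come back as strings, which matches the `[ "1", "2" ]` result shown above.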
This is fantastic, and thank you for making this.
On the configuration page, would it be possible to add more sources (e.g. TVDB), in a way that allows us to add the endpoint, relevant server types, API, etc.?