Closed PeterCiuffetti closed 3 years ago
I estimate this at about 4 days of work. Most of that goes into developing a local store of info that the different plugins can consult when they need the org and collection information on the other side of this API, and into configuring the distribution capability into the solutions.
The API is ready for you, @PeterCiuffetti :)
You have three available filters:
Example request:
https://policycommons.net/api/collections/?sourcing=coherencebot&bucket=extra
{
  "count": 62,
  "next": "https://policycommons.net/api/collections/?bucket=extra&page=2&sourcing=coherencebot",
  "previous": null,
  "results": [
    {
      "id": 1206,
      "title": "Publications",
      "slug": "publications",
      "url": "https://oceana.org/publications/",
      "sourcing": "coherencebot",
      "bucket": "extra",
      "org": {
        "slug": "oceana",
        "name": "Oceana",
        "acronym": null
      }
    },
    {
      "id": 1207,
      "title": "Reports",
      "slug": "reports",
      "url": "https://eciu.net/analysis/reports",
      "sourcing": "coherencebot",
      "bucket": "extra",
      "org": {
        "slug": "energy-and-climate-intelligence-unit",
        "name": "Energy and Climate Intelligence Unit",
        "acronym": "ECIU"
      }
    },
    [...]
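For anyone building a local lookup from this response, here is a minimal sketch of grouping the returned collections by organisation. The function name `index_by_org` is hypothetical and not part of the API; the field names (`results`, `org`, `slug`, `url`) are taken from the example response above.

```python
from collections import defaultdict


def index_by_org(results):
    """Group collection URLs by organisation slug (hypothetical helper)."""
    by_org = defaultdict(list)
    for item in results:
        # Each result carries a nested "org" object with its own slug.
        by_org[item["org"]["slug"]].append(item["url"])
    return dict(by_org)


# Two records copied from the example response, trimmed to the fields used.
sample = [
    {"url": "https://oceana.org/publications/", "org": {"slug": "oceana"}},
    {
        "url": "https://eciu.net/analysis/reports",
        "org": {"slug": "energy-and-climate-intelligence-unit"},
    },
]
index = index_by_org(sample)
```

An organisation with several collections would end up with several URLs under the same slug.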
The authentication is done via x-api-key in the header, as per usual.
I've organised all of the CoherenceBot collections already ingested so far into two buckets:
You can retrieve them by using a combination of filters, e.g. ?sourcing=coherencebot&bucket=coherencebot-batch-1. Mind you, this API endpoint returns all collections in the Commons, not only the ones for CoherenceBot. Hence, it's important to specify the sourcing attribute correctly.
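A minimal sketch of building such a request in Python with only the standard library. The helper names `collections_url` and `build_request` are hypothetical; the endpoint, the `sourcing` and `bucket` parameters, and the `x-api-key` header come from the comments above, and the key value is a placeholder.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://policycommons.net/api/collections/"


def collections_url(sourcing, bucket=None):
    """Build a filtered collections URL (hypothetical helper)."""
    # Always send sourcing: the endpoint otherwise returns every
    # collection in the Commons, not just the CoherenceBot ones.
    params = {"sourcing": sourcing}
    if bucket is not None:
        params["bucket"] = bucket
    return BASE_URL + "?" + urlencode(params)


def build_request(url, api_key):
    """Attach the API key header; nothing is sent until the request is opened."""
    return Request(url, headers={"x-api-key": api_key})


url = collections_url("coherencebot", "extra")
request = build_request(url, "...key...")
```

Passing `request` to `urllib.request.urlopen` would perform the actual call.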
I believe that a good starting point is the extra bucket. It has 62 collections today, none of which has been ingested by CoherenceBot yet, but all of them have been properly connected to an organisation and vetted by either Toby or myself.
And finally, the admin, should you need to use it, is available here:
https://policycommons.net/admin/artifacts/collection/
You can use similar filters on the right-hand side.
Just playing with this a bit to familiarize myself with it...
This request:
curl -v -H 'x-api-key: ...key...' 'https://policycommons.net/api/collections/?sourcing=coherencebot&bucket=extra'
indicated count: 74 but returned only 10 results. Are there paginator params to get the next page(s)?
...seeing now the "next" and "previous" URLs at the top of the response. So, previous question is answered.
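A rough sketch of walking all pages by following the "next" URL until it is null. The function names `fetch_page` and `iter_collections` are hypothetical, and this is only one way to do it; the `next` and `results` fields and the `x-api-key` header are taken from the thread above.

```python
import json
from urllib.request import Request, urlopen


def fetch_page(url, api_key):
    """Fetch one page of results and decode the JSON body."""
    req = Request(url, headers={"x-api-key": api_key})
    with urlopen(req) as resp:
        return json.load(resp)


def iter_collections(first_url, api_key, fetch=fetch_page):
    """Yield every collection, following "next" links page by page.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    url = first_url
    while url:
        page = fetch(url, api_key)
        yield from page["results"]
        # "next" is null (None) on the last page, which ends the loop.
        url = page.get("next")
```

Usage would look like `for c in iter_collections(first_url, api_key): ...`, where `first_url` is the filtered URL from the curl command above.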
This is now finished, tested and deployed on all three clusters.
The solution uses a custom injector class called FeedInjector.
It is invoked from crontab (the local crontab of user hadoop), currently set up to run every hour, at the top of the hour.
It uses the params ?sourcing=coherencebot&cluster=
Overview
Assumptions
Complications
Proposed Implementation