dcppc / centillion

Centillion is the Data Commons search engine. One centillion is 3.03 log-times better than a googol.
https://dcppc.github.io/centillion
MIT License

Index Disqus/Hypothesis comments #3

Closed charlesreid1 closed 6 years ago

charlesreid1 commented 6 years ago

Source of idea: CTB (dib-lab/copper#188)

Disqus API: https://disqus.com/api/docs/threads/list/

Hypothesis API: http://h.readthedocs.io/en/latest/api/

Also see concern expressed in #1 re how to better modularize for different types of document collections.

charlesreid1 commented 6 years ago

Updates:

Hypothesis API is a bit confusing. I would expect that you could pass it a URL and get back the annotations, but it is not that straightforward. There is an API client library (python-hypothesis) that's recently been brought up to date with Python 3, so hopefully that will help.

The Disqus API is sensible. It provides an API key and access token and allows creating OAuth applications. The only problem is that API calls are rate limited to around 1,000 per hour. Not yet sure whether that will be a problem for us. Planning to create disqus_util.py to encapsulate the API, similar to what we've done with the Google Drive and Groups.io APIs.
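To get a feel for what the rate limit means in practice: at roughly 1,000 calls per hour, calls would need to be spaced about 3.6 seconds apart on average. Below is a minimal sketch of a throttle that disqus_util.py could use. The `Throttle` class and its parameters are hypothetical names for illustration, not existing centillion code, and even spacing is stricter than the limit itself, which may allow bursts.

```python
import time


class Throttle:
    """Space out calls so no more than max_per_hour happen in any hour.

    Hypothetical helper; the 1000/hr default is the approximate Disqus
    limit mentioned above, not a confirmed figure.
    """

    def __init__(self, max_per_hour=1000, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 3600.0 / max_per_hour  # seconds between calls
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until min_interval has passed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `throttle.wait()` before each API request would keep a long scrape under the limit. The `clock` and `sleep` arguments exist only so the class can be tested without actually sleeping.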

charlesreid1 commented 6 years ago

nice disqus API examples in repo https://github.com/charlesreid1/DISQUS-API-Recipes/blob/master/widgets/php/latest_comments.php

nice example: pass in forum and thread, get comments out: https://github.com/charlesreid1/DISQUS-API-Recipes/blob/master/widgets/php/latest_comments.php

charlesreid1 commented 6 years ago

Have registered a disqus application and obtained API keys. Am now able to get a list of all threads in the dcppc-internal discussion forum. Good progress so far.
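For reference, here is a rough sketch of listing every thread in a forum. It assumes the `threads/list` endpoint from the Disqus docs linked above, called with `api_key`, `forum`, `limit`, and `cursor` parameters, and a JSON response with `response` and `cursor` fields. The function names are made up for illustration, and the response shape should be checked against real output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://disqus.com/api/3.0"


def threads_list_url(api_key, forum, cursor=None, limit=100):
    """Build a threads/list URL for one page of results."""
    params = {"api_key": api_key, "forum": forum, "limit": limit}
    if cursor:
        params["cursor"] = cursor
    return f"{API_ROOT}/threads/list.json?{urlencode(params)}"


def iter_threads(api_key, forum, fetch=None):
    """Yield every thread dict in a forum, following pagination cursors.

    `fetch` takes a URL and returns the decoded JSON page; it defaults
    to a plain HTTP GET and can be swapped out for testing.
    """
    if fetch is None:
        fetch = lambda url: json.load(urlopen(url))
    cursor = None
    while True:
        page = fetch(threads_list_url(api_key, forum, cursor))
        yield from page.get("response", [])
        cur = page.get("cursor") or {}
        if not cur.get("hasNext"):
            break
        cursor = cur.get("next")
```

Usage would look like `for thread in iter_threads(API_KEY, "dcppc-internal"): ...`, where `API_KEY` is the key from the registered application.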

ctb commented 6 years ago

On Tue, Aug 21, 2018 at 08:07:22PM +0000, Chaz Reid wrote:

> Hypothesis API is a bit confusing. I would expect that you could pass it a URL and get back the annotations, but it is not that straightforward. There is an API client library (python-hypothesis) that's recently been brought up to date with Python 3, so hopefully that will help.

we also have contact with the hypothesis folk directly, so at the very least we can send them feedback saying "why isn't this more simple?"

charlesreid1 commented 6 years ago

We're successfully scraping all 0 comments on the DCPPC internal site, should have Disqus comments added to the search index soon. Then it's on to Hypothesis annotations.

charlesreid1 commented 6 years ago

Hypothesis API has been figured out - you can't say "here's a URL, give me all the annotations" but you can search across all annotations and pass a URL parameter.
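A minimal sketch of that search-by-URL approach, using only the standard library rather than python-hypothesis. It assumes the public `https://api.hypothes.is/api/search` endpoint with a `uri` query parameter, a bearer token for non-public annotations, and a response containing a `rows` list whose entries have a `text` field. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SEARCH_ENDPOINT = "https://api.hypothes.is/api/search"


def search_request(page_url, token=None, limit=200):
    """Build a search request for annotations on one page URL."""
    query = urlencode({"uri": page_url, "limit": limit})
    # The token is only needed for private or group annotations.
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    return Request(f"{SEARCH_ENDPOINT}?{query}", headers=headers)


def annotation_texts(payload):
    """Pull the annotation body text out of a decoded search response."""
    return [row.get("text", "") for row in payload.get("rows", [])]


def annotations_for(page_url, token=None):
    """Fetch and return the annotation texts for one page URL."""
    with urlopen(search_request(page_url, token)) as resp:
        return annotation_texts(json.load(resp))
```

A single call returns at most `limit` rows, so pages with many annotations would need paging on top of this.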

@ctb are hypothesis annotations on private-www collected under a group, like disqus has dcppc-internal?

ctb commented 6 years ago

On Wed, Aug 22, 2018 at 02:37:11PM -0700, Chaz Reid wrote:

> @ctb are hypothesis annotations on private-www collected under a group, like disqus has dcppc-internal?

no