davidmcclure / open-syllabus-project

What can be learned from 1M+ college course syllabi? (OLD)
Apache License 2.0

Query by URL #12

Open afandian opened 8 years ago

afandian commented 8 years ago

Would you consider an index on url in the API? e.g. http://explorer.opensyllabusproject.org/api/ranks?url=jstor.org. I think it would just be a question of altering materialize_ranking in:

https://github.com/davidmcclure/open-syllabus-project/blob/master/osp/citations/models/text_index.py#L147

Specifically, it would be nice to query by substring, so that I can look up all data for a particular publisher by domain (although I'm aware that not every item has this metadata).

I'm asking because I'm working on Crossref Event Data (http://eventdata.crossref.org), and we'd love to include OSP links for articles.

Joe

davidmcclure commented 8 years ago

Hey @afandian,

If you just want to get results for JSTOR articles, you can do:

ranks?corpus=jstor

We could conceivably make it possible to query against the source URLs for the syllabi, though we'd need to run a bit of new wiring in the ingest flow to get that data into Elasticsearch.
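For anyone following along, here is a minimal sketch of how the two query forms discussed so far could be built as URLs. The base endpoint is taken from the example URL in the first comment, and corpus is the parameter shown above; the url parameter is only the one proposed in this issue and is not implemented, and the helper name ranks_url is made up for illustration.

```python
from urllib.parse import urlencode

# Base endpoint, taken from the example URL in the first comment.
API_BASE = "http://explorer.opensyllabusproject.org/api/ranks"


def ranks_url(**params):
    """Build a ranks query URL from keyword parameters (hypothetical helper)."""
    return f"{API_BASE}?{urlencode(params)}"


# Existing query form suggested above.
print(ranks_url(corpus="jstor"))

# Proposed form from this issue; the `url` parameter does not exist yet.
print(ranks_url(url="jstor.org"))
```

This only constructs the URLs and makes no network request, so it says nothing about what the API actually returns.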

afandian commented 8 years ago

Thanks! JSTOR is only an example (and not a very good one at that). I have a list of about 4,000 domains that correspond to publishers' author pages that I'd like to query, so there isn't a mapping of domain to 'corpus'.

It would be great to have some kind of query like this (or, alternatively, a periodic bulk dump or something similar). If you're interested in Crossref Event Data, you can reach me at jwass@crossref.org
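To make the domain-matching request concrete, here is a rough sketch of the kind of lookup described above, done client-side against a list of publisher domains. It assumes each record exposes a URL string; the function names and the sample domain set are hypothetical and are not part of the OSP codebase. Matching is done on whole host labels rather than raw substrings, so that notjstor.org does not match jstor.org.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Return the lowercased hostname of a URL, tolerating a missing scheme."""
    if "//" not in url:
        url = "//" + url
    return (urlsplit(url).hostname or "").lower()


def matching_domain(url, domains):
    """Return the entry in `domains` that the URL's host equals or is a
    subdomain of, or None if there is no match."""
    parts = host_of(url).split(".")
    # Try the full host first, then each successively shorter suffix.
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in domains:
            return candidate
    return None


# Hypothetical stand-in for the list of roughly 4,000 publisher domains.
publisher_domains = {"jstor.org", "example-publisher.com"}

print(matching_domain("http://www.jstor.org/stable/123", publisher_domains))
print(matching_domain("http://notjstor.org/article", publisher_domains))
```

Keeping the domains in a set makes each lookup a handful of hash checks per URL, which should be cheap enough to run over a bulk dump if one were available.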