g12mcgov opened this issue 5 years ago (status: Open)
The Dataset model is intended to provide an abstraction over multiple data sources, i.e. to allow gs_quant to source data from places other than the Marquee API. For content, we should start with direct access to the underlying Marquee APIs, e.g. through a dedicated API class.
Note that in the Dataset example, the ID refers to a data source, not to an individual row. So in your example, it would probably map to a channel, not to a content item. Let's get a couple more opinions as well.
An API class is definitely the best place to start. Agree on the stream comments regarding datasets, although a Content() item just has to map to an actual piece of content with metadata; perhaps ContentChannel() becomes a first-class object. The question then is where functions like "get many content pieces" should go; I would imagine similar questions arise for many other APIs.
Two key options as I see it:
- Content.get_many(): the typical API model. Convenient, since it lives on the Content piece itself, but it blurs the boundary between data items and querying, similar to datasets.
- ContentQueryEngine: similar to general server-side development, and to assets with SecurityMaster. Clearer item/query distinction, but extra classes and a level of indirection.
Let's discuss.
@francisg77 @andyphillipsgs
I've already added the abstraction layer you mentioned, gs_quant.content, which takes in a provider (in this case GsContentAPI).
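The provider pattern described above can be sketched as follows. This is a minimal illustration, not the code in gs_quant.content: the `ContentProvider` protocol, its `get_contents` method, and the `InMemoryContentProvider` test double are all hypothetical names, and GsContentAPI's real method names may differ.

```python
from typing import Dict, List, Protocol


class ContentProvider(Protocol):
    """Hypothetical interface a provider (e.g. something like GsContentAPI) would satisfy."""

    def get_contents(self, params: Dict[str, str]) -> List[Dict]: ...


class InMemoryContentProvider:
    """Hypothetical stand-in provider, useful for tests without network access."""

    def __init__(self, rows: List[Dict]):
        self._rows = rows

    def get_contents(self, params: Dict[str, str]) -> List[Dict]:
        # Return rows whose fields match every requested filter (no filters -> all rows).
        return [r for r in self._rows if all(r.get(k) == v for k, v in params.items())]


class Content:
    """Front-end object: holds filters and delegates the actual fetch to the provider."""

    def __init__(self, provider: ContentProvider, **filters: str):
        self._provider = provider
        self._filters = filters

    def get_many(self) -> List[Dict]:
        return self._provider.get_contents(dict(self._filters))
```

Because `Content` depends only on the protocol, a non-Marquee source could be plugged in later without changing the public class.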
As for your other points, I prefer the first option, since it's also consistent with how datasets work currently in the API. Something like:
content = Content(channel='<some_channel>')
content = Content(assetId='<some_asset_id>')
...
# With no kwargs, rely on the Content API doing a default lookup based on who you are.
content = Content()
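The kwargs-based constructor could translate directly into request query parameters. A minimal sketch, assuming a hypothetical whitelist of `channel` and `assetId` and that paging is sent as `offset`/`limit`; the real Content API parameter names may differ.

```python
from typing import Dict

# Hypothetical set of filters the constructor accepts; the real API may differ.
ALLOWED_FILTERS = {"channel", "assetId"}


class Content:
    def __init__(self, **filters: str):
        # Fail fast on typos such as Content(chanel=...) instead of sending them upstream.
        unknown = set(filters) - ALLOWED_FILTERS
        if unknown:
            raise TypeError(f"Unsupported filter(s): {sorted(unknown)}")
        self._filters: Dict[str, str] = dict(filters)

    def query_params(self, offset: int = 0, limit: int = 10) -> Dict[str, object]:
        # With no filters, only paging is sent and the server applies its default lookup.
        params: Dict[str, object] = dict(self._filters)
        params.update({"offset": offset, "limit": limit})
        return params
```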
Then expose one or more methods like:
content.get_many(offset=0, limit=10)
This approach is also nice because helper methods could be added, like content.get_text(), to extract the raw text and abstract away the need for Base64 decoding, etc.
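The get_many(offset=0, limit=10) signature implies offset/limit paging, so a caller wanting every item would walk pages until a short one comes back. A minimal sketch, assuming a hypothetical `fetch_page(offset, limit)` callable that wraps the actual request:

```python
from typing import Callable, Dict, Iterator, List

# Hypothetical page fetcher signature: (offset, limit) -> list of content dicts.
PageFetcher = Callable[[int, int], List[Dict]]


def iter_contents(fetch_page: PageFetcher, limit: int = 10) -> Iterator[Dict]:
    """Yield every content item by walking offset/limit pages until a short page."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        yield from page
        if len(page) < limit:  # a short (or empty) page means there are no more results
            return
        offset += limit
```

Stopping on a short page avoids needing a total count from the server, at the cost of one extra empty request when the total is an exact multiple of `limit`.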
The only issue I see with this is that there's no longer a need for a get-single-content method (i.e. content.get('<some_id>')), right? But maybe that's not really an issue.
@francisg77 @andyphillipsgs
I made an MR with the changes described above in the GitLab repo. You can find that MR here: https://gitlab.gs.com/marquee/analytics/gs_quant/merge_requests/254
I had to do it internally since I needed to generate the new Content types.
Describe the problem
Currently there is no programmatic way of accessing Marquee Content through gs_quant. This is a feature proposal to add a content module for interacting with the new Marquee Content API (/v1/content).
Describe the solution you'd like
State of the world:
At the time of writing, there are two primary means of retrieving content via the Marquee Content API:
GET
GET
Eventually, the entire suite of endpoints will be implemented which will allow querying, searching, updating, and creation of content.
Proposed Solution:
gs_quant should expose a Content module supporting the above endpoints:
- Get Many Contents
- Get a Single Content
All returned content will be of the form ContentResponse; a description of this object can be found on the Marquee Developer Site. By default, all content is Base64-encoded and accompanied by its associated MimeType. This allows the content to be transported via JSON, given that we support many different content types (HTML, text, image, PDF, etc.).
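Since every payload arrives Base64-encoded next to its MimeType, a helper like get_text() mostly reduces to decoding and then deciding whether the bytes are text. A minimal sketch, assuming hypothetical field names `data` and `mime_type` (the real ContentResponse schema may name them differently) and a hypothetical rule that `text/*` and `application/json` are returned as `str`:

```python
import base64
from dataclasses import dataclass
from typing import Union

# Hypothetical rule: these MIME types are decoded to str; everything else stays bytes.
TEXT_MIME_PREFIXES = ("text/",)
TEXT_MIME_TYPES = {"application/json"}


@dataclass
class ContentBody:
    data: str       # Base64-encoded payload, as carried in JSON
    mime_type: str  # e.g. "text/html", "application/pdf"


def decode_content(body: ContentBody, encoding: str = "utf-8") -> Union[str, bytes]:
    raw = base64.b64decode(body.data)
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    mime = body.mime_type.split(";")[0].strip().lower()
    if mime.startswith(TEXT_MIME_PREFIXES) or mime in TEXT_MIME_TYPES:
        return raw.decode(encoding)
    return raw  # images, PDFs, etc. are returned as raw bytes
```

Returning bytes for binary types keeps the helper lossless; callers can then write a PDF or image straight to disk.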
A client of gs_quant using that content module might then do:
Describe alternatives you've considered
Currently bouncing between the following two implementation styles:
1) Declare a Content() object (as the examples above show) that creates an instance of the class, for doing things like:
2) Go the route of the Dataset model, where the code would look like this:
Not really a fan of this approach for content, as I think it's a little awkward and doesn't really provide a fluent API for querying/searching.
Are you willing to contribute? Yes!
Additional context
N/A
@andyphillipsgs @francisg77 @bobbyalex83 @ScottWeinstein