getContent() returns the information we index for feature layers

tomwayson commented 4 years ago

Define IHubDataset that extends IHubContent w properties like recordCount, fields, geometryType, layer?, server?, etc, as needed by the Hub app, and have getContent() return that when the content's hubType is dataset.

ajturner commented 4 years ago

Should this method provide an option to include or not include these additional metadata? When these extended properties are not cached via harvesting (e.g. public datasets) then there could be significant XHR to fetch these from different endpoints.

Relatedly, should the extended properties be in a type-specific property to make it clear they're type specific, maybe optional, and keep the primary object readable/coherent. For example dataset.extended.fields

tomwayson commented 4 years ago

Some things we discussed offline:

we won't used a property bag like content.extended - we'll just add the properties directly to content (i.e. IHubDataset extends IHubContent)
we'll use an request option like include (or exclude, depending on what the default behavior is)
should default behavior be to perform whatever additional async fetches are needed to return a complete dataset and users would have to opt-out, or to not perform any (opt-in)?

ajturner commented 4 years ago

I’d like to hear @mikeringrose thoughts on default includes.

I recall seeing an update to Urban GraphQL API only default return an ID unless fields were explicitly include.

However, that would be onerous request to include all common Resources or Content metadata. We don’t want to require knowledge of the data store implementation (what’s in Portal vs Server) - but is it sensible to say “by default the model includes [...] attributes”

tomwayson commented 4 years ago

To me this option is not going to specify which fields are returned, but rather which additional xhrs are going to be made if needed (i.e. the content is coming from the portal API instead of the Hub API). I don't think we should derive the list of xhrs from a list of fields (i.e. "consumer wants advancedQueryCapabilities, therefore I need to fetch the layer info"). In other words getContent() should always return all the properties it has fetched, and what the consumer will have control over is what additional fetches should be made if needed.

To me that implies a better name than include would be ensure. For example, if a consumer specifies ensure: ['service', 'recordCount']:

when fetching from portal (i.e. private item, or in enterprise), additional xhrs would be made to get the service info (fields, advancedQueryCapabilities, maxRecordCount, etc) and query the layer's recordCount, and the corresponding properties would be populated
when fetching from the Hub API, no additional xhrs are made, the above properties would be populated b/c they're returned from the API. Also, properties like statistics and server are populated b/c they are returned by the API too.

tomwayson commented 4 years ago

Also, there are some enrichments like statistics that we do not pro-actively fetch client-side. In other words, when you visit a the dataset route in enterprise or for a private item, we do not go and fetch the stats for every field before rendering the template.

This makes me wonder if ought to just provide additional async functions like getRecordCount(), getLayerInfo() getFieldStatistics()etc, for clients to fetch the additional data as needed, and perhaps some fns to enrich the content object w/ that data (i.e. setContentLayerInfo(content, layerInfo)).

At least we can start w/ those fns, and decide later if getContent() needs to take an option that would allow consumers to indicate that they want those fetched up front.

ajturner commented 4 years ago

@tomwayson I really dislike and want to actively avoid adding "GIS infrastructure implementation" to these methods. What the heck is LayerInfo or service?

Maybe a middle ground is aligning these external metadata sources with their semantic and logical utility. For example, can getCapabilities() wrap up the various layer info, etc (still jargony, but a little more meaningful)?

I agree that we can provide these individual focused request methods for external use; but the primary use case would be the single getContent(..) that would internally also use these methods depending on requested metadata.

For include vs. ensure I think that's just an implementation detail difference. If the database is normalized or denormalized is irrelevant to the interface. include is a more common term in this scenario (and less a digestive health beverage).

For getField Statistics() (or getStatistics() to be more comprehensive that could return record count?) this should similarly allow for an explicit array of which attributes to include. There may be many times the UI only needs a single attribute, or maybe a few but rarely all of them.

bcamper commented 4 years ago

statistics is a good example where it's not always practical to fetch all results at once. Looking through the approach and lessons learned from hub-indexer (relevant files in https://github.com/ArcGIS/hub-indexer/tree/master/packages/duke/enrich): for numeric and date fields we're doing a single stats call that batches all fields + stats of that field type, but then one call per field for string fields (explanation here, and Peter and I ran into the same issue in filter prototyping), e.g.

1 call for all numerics
1 call for all dates
N calls for N strings

Further, the string stats calls are made sequentially (I assume to avoid overwhelming the server, perhaps someone remembers). So whether it's an includes option or @tomwayson's decorator function, there would be cases where this process can take several seconds and it's inadvisable for UI. Product-wise, we can work around this as needed. For example, for the filtering UI, for the list of attributes, we will probably just degrade and show less field metadata when cached stats are not available. But when a specific field is then selected and rendered as a filter UI, we can request the stats for that single field (to render a slider, histogram, checkboxes, etc.).

In any case, as part of the filtering work, it would be great to have @JoshCrozier involved in this, and consider extracting some of the relevant hub-indexer stats code linked above into hub.js methods for use on the client as well.

tomwayson commented 4 years ago

want to actively avoid adding "GIS infrastructure implementation" to these methods

I think that's just an implementation detail difference (naming things)

can getCapabilities() wrap up the various layer info, etc (still jargony, but a little more meaningful)?

or less meaningful depending on the audience, for example myself.

sounds like we're in agreement on these ideas:

if we've got the data, return it (i.e. if we fetched from the Hub API and properties like fields, etc exist)
provide async fns to fetch the additional props as needed
provide an API (via options) to have getContent() proactively fetch those things if they weren't returned (i.e. by the Hub API).

sounds like we're not exactly in agreement in what to name those fns and options but that's software

we can provide these individual focused request methods for external use; but the primary use case would be the single getContent(..)

I agree, but that's the higher-order, and harder use case. I suggest we start by building the stand-alone fns themselves. That will help inform us on what the getContent() option should look like.

Specifically, I suggest we start w/ #353, which is not specific to datasets, but should help illuminate the pattern. Then we can build into getCapabilities(). While I'm working on that @JoshCrozier can start to come at it from the statistics side of things.

tomwayson commented 4 years ago

In #399 I set up the infrastructure to make additional fetches for content properties and gracefully capture any errors. We can use that for the additional properties needed for datasets.

We'll use #401 to flesh out:

making the right additional fetches based on the item type (i.e. web maps fetch the item data, map services fetch the server info, etc)

We'll use #538 to flesh out:

providing an API to opt out of making those fetches

I've scoped this issue down to focus on returning what is needed for a feature layer. We can add additional issues for the other types of content that hub considers a dataset.

tomwayson commented 3 years ago

We should reference the functions outlined https://devtopia.esri.com/dc/hub/issues/1187 to understand what composer is doing server-side.

Esri / hub.js

getContent() returns the information we index for feature layers #322