Open tomwayson opened 4 years ago
Should this method provide an option to include or not include these additional metadata? When these extended properties are not cached via harvesting (e.g. public datasets) then there could be significant XHR to fetch these from different endpoints.
Relatedly, should the extended properties be in a type-specific property to make it clear they're type specific, maybe optional, and keep the primary object readable/coherent. For example dataset.extended.fields
Some things we discussed offline:
content.extended
- we'll just add the properties directly to content
(i.e. IHubDataset extends IHubContent
)include
(or exclude
, depending on what the default behavior is)I’d like to hear @mikeringrose thoughts on default includes.
I recall seeing an update to Urban GraphQL API only default return an ID unless fields were explicitly include.
However, that would be onerous request to include all common Resources or Content metadata. We don’t want to require knowledge of the data store implementation (what’s in Portal vs Server) - but is it sensible to say “by default the model includes [...] attributes”
To me this option is not going to specify which fields are returned, but rather which additional xhrs are going to be made if needed (i.e. the content is coming from the portal API instead of the Hub API). I don't think we should derive the list of xhrs from a list of fields (i.e. "consumer wants advancedQueryCapabilities
, therefore I need to fetch the layer info"). In other words getContent()
should always return all the properties it has fetched, and what the consumer will have control over is what additional fetches should be made if needed.
To me that implies a better name than include
would be ensure
. For example, if a consumer specifies ensure: ['service', 'recordCount']
:
fields
, advancedQueryCapabilities
, maxRecordCount
, etc) and query the layer's recordCount
, and the corresponding properties would be populatedstatistics
and server
are populated b/c they are returned by the API too.Also, there are some enrichments like statistics
that we do not pro-actively fetch client-side. In other words, when you visit a the dataset route in enterprise or for a private item, we do not go and fetch the stats for every field before rendering the template.
This makes me wonder if ought to just provide additional async functions like getRecordCount()
, getLayerInfo()
getFieldStatistics()
etc, for clients to fetch the additional data as needed, and perhaps some fns to enrich the content object w/ that data (i.e. setContentLayerInfo(content, layerInfo)
).
At least we can start w/ those fns, and decide later if getContent()
needs to take an option that would allow consumers to indicate that they want those fetched up front.
@tomwayson I really dislike and want to actively avoid adding "GIS infrastructure implementation" to these methods. What the heck is LayerInfo
or service
?
Maybe a middle ground is aligning these external metadata sources with their semantic and logical utility. For example, can getCapabilities()
wrap up the various layer info, etc (still jargony, but a little more meaningful)?
I agree that we can provide these individual focused request methods for external use; but the primary use case would be the single getContent(..)
that would internally also use these methods depending on requested metadata.
For include
vs. ensure
I think that's just an implementation detail difference. If the database is normalized or denormalized is irrelevant to the interface. include
is a more common term in this scenario (and less a digestive health beverage).
For getField Statistics()
(or getStatistics()
to be more comprehensive that could return record count?) this should similarly allow for an explicit array of which attributes to include. There may be many times the UI only needs a single attribute, or maybe a few but rarely all of them.
statistics
is a good example where it's not always practical to fetch all results at once. Looking through the approach and lessons learned from hub-indexer (relevant files in https://github.com/ArcGIS/hub-indexer/tree/master/packages/duke/enrich): for numeric and date fields we're doing a single stats call that batches all fields + stats of that field type, but then one call per field for string fields (explanation here, and Peter and I ran into the same issue in filter prototyping), e.g.
Further, the string stats calls are made sequentially (I assume to avoid overwhelming the server, perhaps someone remembers). So whether it's an includes
option or @tomwayson's decorator function, there would be cases where this process can take several seconds and it's inadvisable for UI. Product-wise, we can work around this as needed. For example, for the filtering UI, for the list of attributes, we will probably just degrade and show less field metadata when cached stats are not available. But when a specific field is then selected and rendered as a filter UI, we can request the stats for that single field (to render a slider, histogram, checkboxes, etc.).
In any case, as part of the filtering work, it would be great to have @JoshCrozier involved in this, and consider extracting some of the relevant hub-indexer stats code linked above into hub.js methods for use on the client as well.
want to actively avoid adding "GIS infrastructure implementation" to these methods
I think that's just an implementation detail difference (naming things)
can
getCapabilities()
wrap up the various layer info, etc (still jargony, but a little more meaningful)?
or less meaningful depending on the audience, for example myself.
sounds like we're in agreement on these ideas:
fields
, etc exist)options
) to have getContent()
proactively fetch those things if they weren't returned (i.e. by the Hub API).sounds like we're not exactly in agreement in what to name those fns and options but that's software
we can provide these individual focused request methods for external use; but the primary use case would be the single
getContent(..)
I agree, but that's the higher-order, and harder use case. I suggest we start by building the stand-alone fns themselves. That will help inform us on what the getContent()
option should look like.
Specifically, I suggest we start w/ #353, which is not specific to datasets, but should help illuminate the pattern. Then we can build into getCapabilities()
. While I'm working on that @JoshCrozier can start to come at it from the statistics side of things.
In #399 I set up the infrastructure to make additional fetches for content properties and gracefully capture any errors. We can use that for the additional properties needed for datasets.
We'll use #401 to flesh out:
We'll use #538 to flesh out:
I've scoped this issue down to focus on returning what is needed for a feature layer. We can add additional issues for the other types of content that hub considers a dataset.
We should reference the functions outlined https://devtopia.esri.com/dc/hub/issues/1187 to understand what composer is doing server-side.
Define
IHubDataset
that extendsIHubContent
w properties likerecordCount
,fields
,geometryType
,layer
?,server
?, etc, as needed by the Hub app, and havegetContent()
return that when the content'shubType
isdataset
.