Search collections impacting content categorization and UX

brollison commented 4 years ago

It has come to my attention this search collections file, which feeds into this categories file, is now impacting...

What content is supported on what content views
Custom dimensions in telemetry insights
How content may be referred to within the application

I'll break these out into separate points to illustrate the disconnect.

1. Supported Content on Views

It was recently discovered that Microsoft Excel documents are now being classified as a dataset, thus when attempting to visit /documents/:id (example) for a type: Microsoft Excel item the user is redirected to /data/:id/explore (result). While on the surface an Excel document may seem similar to a type: CSV item, the data cannot be extracted (at least to my knowledge) and visualized on a table; thus, as a team we defined Excel items as documents.

In fact, quite a few of the items listed in the collection file do not match the collaborative classification we defined. A few other examples...

type: CityEngine Web Scene being classified as a map when in reality all parties chose apps (if embeddable, which it is)
type: Document Link as a document where we chose content (generic fallback) after quite a bit back-and-forth

I'm generally concerned about forcing a 1-1 match between search collections and in-app categorization of content. Should/could categorization feed into collections? Sure, but it likely isn't always desired.

2. Impact on Telemetry

We have a product priority to better understand how and what content is being interacted with within the application. The purposes of these insights range from functionality efficacy / roadmap to customer-facing value of "how are xyz doing on my site?". The telemetry.js model allows us to specify 3 global fields...

category: Engagement
action: Favorite
label: Content-Hero

...however, we can only get more granular details via custom dimensions. We are targeting the following...

id: <item.id>
type: StoryMap
content: Apps

...but leveraging Hub.Type actually returns a generic collection name. In the case of a Hub having multiple "Feedback" types (i.e. Survey, Quick Capture, Forum, etc.) that would look like the following...

id: <item.id>
type: Form
content: Feedback

id: <item.id>
type: Quick Capture
content: Feedback

id: <item.id>
type: Forum
content: Feedback

...which allows us to understand interactions within a broader content category and localized item types.
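A minimal sketch of how the three custom dimensions above could be derived from an item. The `Item` and `ContentDimensions` shapes, the `toContentDimensions` helper, and the lookup table are hypothetical and are not part of telemetry.js; the "Content" fallback mirrors the generic fallback mentioned in point 1.

```typescript
// Hypothetical shapes for illustration; none of these names come from telemetry.js.
interface Item {
  id: string;
  type: string;
}

interface ContentDimensions {
  id: string;
  type: string;
  content: string;
}

// Illustrative lookup from item type to the broader content category.
const CONTENT_BY_TYPE: Record<string, string> = {
  "Form": "Feedback",
  "Quick Capture": "Feedback",
  "Forum": "Feedback",
  "StoryMap": "Apps",
};

// Keeps the specific item type and adds the broader category alongside it.
function toContentDimensions(item: Item): ContentDimensions {
  return {
    id: item.id,
    type: item.type,
    // Assumption: unmapped types fall back to a generic "Content" bucket.
    content: CONTENT_BY_TYPE[item.type] ?? "Content",
  };
}

const dims = toContentDimensions({ id: "abc123", type: "Quick Capture" });
console.log(dims); // type stays "Quick Capture", content becomes "Feedback"
```

Keeping `type` and `content` as separate dimensions is what lets a report roll up by category while still distinguishing a survey from a forum.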

3. UX with Content in the Application

Carried forward from 2 above, if we use Hub.Type to refer to content within the application, then a survey, forum, and quick capture would all be denoted as Feedback. Not the best, but not a dealbreaker...where it does get odd is around item types whose names are actually more recognizable and descriptive than the broader collection/category. If I'm looking for a PDF of City of Redlands Budget 2020, then using this "source of truth" both the actual PDF and Microsoft Excel budget spreadsheet will be labeled with Document...detracting from the UX and decreasing the usability of the application.

However, we cannot simply refer to the item.type in all cases as that may also detract from the user experience as people wouldn't recognize certain item types...

type: Form >> is that a "survey"?
type: Image Service >> is that a collection of "images"?
type: Web Scene >> ...I have no idea

...where transforming these "jargon" terms to a more approachable format would be more successful...

type: Form >> Survey
type: Image Service >> Map
type: Web Scene >> 3D Map

Wrap-Up

Current "source of truth" is in opposition to what we've already defined as the categories of content which may be displayed in Hub; how do we realign here?
Hub.Type output may oversimplify our ability to understand what is being interacted with in the application, negatively impacting...
1. Hub team's ability to generate insights around the usability of content in the application
2. Customer's ability to generate insights around the efficacy of how content is being used in the application
3. User's ability to confidently use the application
Could we have a Hub.Type and Hub.Content/Hub.Category designation which does the following...
1. Hub.Type will output an approachable string related to a specific item type, such as type: Form becomes Survey
2. Hub.Content will output a higher level categorization of things, similar to what type does today but aligned to mean 1 above

brollison commented 4 years ago

@tomwayson I wanted to follow up on this issue you asked me to open last month. Happy to chat it over or continue an asynchronous conversation :-)

tomwayson commented 4 years ago

pinging myself re: city engine web scene, need to fix that sooner than later.

tomwayson commented 4 years ago

I looked into moving city engine web scenes, but that would break existing tests that expect those to be included in map search results. @drspacemanphd I want to get you looped into this conversation. Is it OK to change the collection types like this? See @brollison's concern above:

I'm generally concerned about forcing a 1-1 match between search collections and in-app categorization of content. Should/could categorization feed into collections? Sure, but it likely isn't always desired.

If not, I will create a getHubType() which calls getCollection() but introduces the changes we need.
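A rough sketch of the wrapper idea: look up the search collection first, then apply only the overrides where in-app categorization differs. The `getCollection()` and `getHubType()` signatures, the lookup tables, and the lowercase category names here are assumptions for illustration, not the actual hub.js implementation.

```typescript
// Stand-in for the existing collection lookup (illustrative entries only;
// the real search collections file is much larger).
const COLLECTION_BY_TYPE: Record<string, string> = {
  "CityEngine Web Scene": "map",
  "Microsoft Excel": "dataset",
  "Web Map": "map",
};

function getCollection(itemType: string): string | undefined {
  return COLLECTION_BY_TYPE[itemType];
}

// Only the types whose in-app categorization differs from their search
// collection, using the two examples raised in this issue.
const HUB_TYPE_OVERRIDES: Record<string, string> = {
  "CityEngine Web Scene": "app",
  "Microsoft Excel": "document",
};

// Overrides win; everything else defers to the search collection.
function getHubType(itemType: string): string | undefined {
  return HUB_TYPE_OVERRIDES[itemType] ?? getCollection(itemType);
}

console.log(getHubType("Microsoft Excel")); // "document"
console.log(getHubType("Web Map")); // "map"
```

With this shape, search results keep their current collections (so existing tests are unaffected) while content views read from `getHubType()`.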

brollison commented 4 years ago

This was the comment @tomwayson referenced in today's weekly content views stand-up. The ask is to review the issue above and decide a path forward for resolution -- this may be asynchronous conversation here on the issue, a high-bandwidth discussion, or a kick-off discussion followed by async conversation. After reading the issue, please comment with your initial thoughts and preference on next steps.

STAKEHOLDERS

TELEMETRY - @cpgruber, @brollison
SEARCH - @thomas-hervey, @drspacemanphd
HUB.JS - @tomwayson

I believe everyone but the search folk have context on this issue.

thomas-hervey commented 4 years ago

@brollison @tomwayson after review I understand the disconnect but I'm not sure what the best approach is. I'm not familiar with all the misalignment examples. For point #1 can we make small reclassifications on a per-item basis where needed?

My recommendation would be that we gather a list of all the misaligned item types and then hold an hour-long meeting to review the list and see how it aligns or differs with our declared classification of supported item types. Then we can see how that affects point #2 and point #3 (which to me are follow-up concerns).

brollison commented 4 years ago

@tomwayson thanks for adding Urban Model item types and changing CityEngine Web Scene to "Apps" from "Maps" after reviewing with @drspacemanphd. Given @thomas-hervey's comment I'll go ahead and schedule a conversation for later this week to talk through the 3 primary concerns I outlined in the issue body.

tomwayson commented 4 years ago

@MarvinPerry - @esri/hub-common@6.11.2 should fix the issues for the app route. Just tag me on a PR to -ui that bumps to that version and removes these lines:

https://github.com/ArcGIS/opendata-ui/blob/6e1360705fb9a02e81b635811fe812e544d07bc6/packages/opendata-ui/app/apps/app/route.js#L17-L22

brollison commented 4 years ago

Earlier this month, the following people met to further discuss the current impact and next steps of content categorization and search collections being tightly coupled in the application:

TELEMETRY - @cpgruber
PRODUCT - @brollison @thomas-hervey
HUB.JS - @tomwayson
SEARCH - @drspacemanphd

An outcome of this meeting was a clearer definition of the content classification domain model. I've taken a stab at that below...

Content Classification https://lucid.app/invitations/accept/cd48dd8f-a746-4389-b275-21c5b1a986d4

COLLECTION - configurable grouping of content categories and/or types meant to facilitate information retrieval via search, example...

Paige is standing up a "Community Stories" site where she is showcasing events, web maps, web experiences, and StoryMaps to her target audience encouraging them to attend an event and create their own StoryMap. She would like 3 search collections...

Events - includes the entire Events content category

Maps - she only wants to show web maps and web experiences shared to her content library, so she only includes...

2D Map type from the Maps category

Experience type from the Apps category

Stories - only includes StoryMap from the Apps category

CATEGORY - non-configurable grouping of content types meant to organize types of content into logical groupings to facilitate manager search, content configuration, and application architecture (used here on "Supported Item" > "Category" tab for content views), example...

Feedback content category may include the following types: Survey, Discussion, etc.

Events content category may only include the Event type

Data content category may include the following types: Dataset, Table, etc.

Documents content category may include the following types: Microsoft Excel, PDF, etc.

CONTENT TYPE - non-configurable translation of item.type into a more approachable and understandable designation if necessary (preference to not translate brand names such as "Microsoft Excel"), examples...

item.type: "Form" becomes Survey

item.type: "Hub Site Application" becomes Website

item.type: "Web Map" becomes 2D Map

item.type: "Web Scene" becomes 3D Map

item.type: "Document Link" becomes Link

item.type: "PDF" stays PDF

item.type: "StoryMap" stays StoryMap

item.type: "Web Mapping Application"; typeKeywords: "StoryMap" becomes StoryMap

PLATFORM TYPE - non-configurable literal item.type value supplied by the platform
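The content type translation above could be sketched as a lookup with a pass-through default. The `PlatformItem` shape and `getContentType()` name are hypothetical; the mappings are the examples listed in this comment.

```typescript
// Hypothetical item shape: only the fields the translation needs.
interface PlatformItem {
  type: string;
  typeKeywords?: string[];
}

// Translations taken from the examples above. Types that are absent
// (for example "PDF" or "StoryMap") keep their platform value.
const CONTENT_TYPE_BY_ITEM_TYPE: Record<string, string> = {
  "Form": "Survey",
  "Hub Site Application": "Website",
  "Web Map": "2D Map",
  "Web Scene": "3D Map",
  "Document Link": "Link",
};

function getContentType(item: PlatformItem): string {
  const keywords = item.typeKeywords ?? [];
  // A Web Mapping Application carrying the StoryMap keyword is shown as a StoryMap.
  if (item.type === "Web Mapping Application" && keywords.some((k) => k === "StoryMap")) {
    return "StoryMap";
  }
  // Fall back to the literal platform type when no translation is defined.
  return CONTENT_TYPE_BY_ITEM_TYPE[item.type] ?? item.type;
}

console.log(getContentType({ type: "Form" })); // "Survey"
console.log(getContentType({ type: "PDF" })); // "PDF"
```

Keeping the translation in one table is one way to reach the "1 change to 1 file" goal for how content is referenced in the app.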

ajturner commented 4 years ago

@brollison providing a comprehensive view of this work is imperative. Thank you for gathering and outlining this approach.

I'm concerned about the specific re-use of Category since that is a reserved term already used in ArcGIS for a taxonomic label applied to Items. Since we are extending our domain model by introducing a new concept for "meta-type" we should try to use a new unique word to describe this concept. This will prevent future confusion in requirements definition, information design, development, and documentation.

Options for this word could play on how scientific names work - where Type == Species so these higher-order sets could be Content Family or Content Classification.

To contextualize in your good scenarios

Events - includes the entire Events content family
Maps - she only wants to show web maps and web experiences shared to her content library, so she only includes...
- 2D Map type from the Maps family
- Experience type from the Apps family
Stories - only includes StoryMap from the Apps family

ajturner commented 4 years ago

Also, in your diagram you state Site has many Collections. Our current domain language here is

Site has a Catalog
Catalog has many Collections

A Catalog is similar to a Collection, in that it defines the Groups (today) or Tags/Types/Orgs/Users (future) for the entire Site catalog (conceptually the All collection). The Collections are then subsets of the catalog.

Perhaps we could say there is a Search definition (technical): Catalogs and Collections both have a Search definition

brollison commented 4 years ago

@ajturner thank you for the feedback, I fully agree and have updated the proposed model.

I've changed Category to Class, truncated "Classification", which I don't believe is used in the platform today; however, it is obviously a regular dev and domain model term which may still present confusion.

In an effort to use entirely unique terms, I decided Type (platform's item.type) and Type (Hub's translation) should be unique; therefore, I've moved Hub's Type to Label which I don't believe is regularly used outside of UI design references.

Obviously the goal with these names is to ensure clarity in design, development, and communication amongst the team; therefore, continual feedback and ideation on a final taxonomy may be warranted.

Content Classification-4

This brings our current definitions to...

CATALOG Configurable (must exist) grouping of collections leveraged in a site's search.

COLLECTION Configurable grouping of content classes and/or labels leveraged by a site's catalog for search.

CLASS Non-configurable grouping of content types meant to organize types of content into logical groupings to facilitate manager search, content configuration, and application architecture.

LABEL Non-configurable translation of item.type into a more approachable and understandable designation if necessary (preference to not translate brand names such as "Microsoft Excel").

NOTE I did not include anything referencing a Search definition as I'm unfamiliar with it in this context; as you would imagine, searching "search definition" on Google wasn't productive.

thomas-hervey commented 4 years ago

@brollison thanks for this. I think your diagram makes sense. I'd like to see an example of each, so I added a few that exist today. Please correct them if they're wrong.

ajturner commented 4 years ago

@brollison I don't know if we should require that a Site has a catalog, or that a catalog has collections.

A Site may have a catalog which defines its Search definition scope. If a catalog is not provided then the Site catalog is implied to be (Site owner's Org's content?)

A Catalog may have collections to provide additional focused Search definition scopes. The catalog definition is used for "All".

Technically this looks like

Site :: Item 
  title: string
  catalog?
    definition: SearchDefinition
      groups?: Array<string>
      tags?: Array<string>
      categories?: Array<string>
      orgs?: Array<string>
      owners?: Array<string>
      query?: Array<string>
    collections?:Array<..>
      title: string
      definition: SearchDefinition...

To be clear on how they are used, searches would result in a unioned set of search definitions:

Site(DC).Collection(Data).Search("Water") => Portal/Hub search?q=water AND (group:<dchealthId> OR group:<dcTranspoId>...) AND (type:"Feature Layer" OR type:"Feature Collection"...)
Site(DC).Search("Water") => Portal/Hub search?q=water AND (group:<dchealthId> OR group:<dcTranspoId>...)

where the Site is configured as:

Site
  title: "DC"
  catalog: {
    definition: {
      groups: [<dchealthId>, <dcTranspoId>]
    },
    collections: [{
      title: "Data", 
      definition: {
        types: ["Feature Layer", "Feature Collection"]
      }
    },{
      title: "Apps", 
      definition: {
        types: ["Web Mapping Application", "StoryMap"]
      }
    },...
  }
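A minimal sketch of how the catalog and collection definitions could combine into the query strings shown above, assuming only `groups` and `types` are supported. The interfaces and the `buildQuery()` helper are hypothetical, and the group ids stand in for the `<dchealthId>` and `<dcTranspoId>` placeholders.

```typescript
// Hypothetical subset of the SearchDefinition outlined above.
interface SearchDefinition {
  groups?: string[];
  types?: string[];
}

interface Collection {
  title: string;
  definition: SearchDefinition;
}

interface Catalog {
  definition: SearchDefinition;
  collections?: Collection[];
}

// OR together the values of one field, e.g. (group:a OR group:b).
function orClause(field: string, values: string[] | undefined, quote = false): string | undefined {
  if (!values || values.length === 0) return undefined;
  const terms = values.map((v) => `${field}:${quote ? `"${v}"` : v}`);
  return `(${terms.join(" OR ")})`;
}

function definitionClauses(def: SearchDefinition): string[] {
  return [orClause("group", def.groups), orClause("type", def.types, true)]
    .filter((c): c is string => c !== undefined);
}

// The catalog definition always applies; a collection only narrows it further.
function buildQuery(term: string, catalog: Catalog, collectionTitle?: string): string {
  const clauses = [term, ...definitionClauses(catalog.definition)];
  if (collectionTitle !== undefined) {
    const collection = (catalog.collections ?? []).find((c) => c.title === collectionTitle);
    // Assumption: an unknown collection title is ignored rather than treated as an error.
    if (collection) clauses.push(...definitionClauses(collection.definition));
  }
  return clauses.join(" AND ");
}

const dcCatalog: Catalog = {
  definition: { groups: ["dchealthId", "dcTranspoId"] },
  collections: [
    { title: "Data", definition: { types: ["Feature Layer", "Feature Collection"] } },
  ],
};

console.log(buildQuery("water", dcCatalog));
// water AND (group:dchealthId OR group:dcTranspoId)
console.log(buildQuery("water", dcCatalog, "Data"));
// water AND (group:dchealthId OR group:dcTranspoId) AND (type:"Feature Layer" OR type:"Feature Collection")
```

Values within a field are OR-ed and separate fields are AND-ed, which matches the two example queries above.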

brollison commented 4 years ago

@ajturner making a catalog optional and falling back to the org's content is interesting, I would presume we already do this via v3API at org-short.hub.arcgis.com/search -- is that true @thomas-hervey ? I believe I misread your earlier statement in that "All" was conceptually a collection, thus incorrectly requiring a "has at least one" relationship.

Would it be correct to say...

Organizations must have a catalog and search definition
Sites may have a catalog and search definition, if not they will use that of the parent organization
Sites may have many collections with their own search definitions, if not then only the catalog is leveraged (i.e. "All" collection) whether site or organization

I can see use-cases for the "implied" catalog being the organization or the user; however, I would lean toward making the organization the default as it seems to fit most current and future expected experiences.

I also more clearly understand Search Definition in our context, thank you for the examples.

thomas-hervey commented 4 years ago

currently on /api/v3/datasets/ there's a collections filter but other than that I don't know of a way to specify a collection when searching. /search does not look for a catalog and fall back if there isn't one. I guess you could say we fall back to the org's content, but that's because catalogs aren't a configuration in our app yet, from what I understand.

I'll take a stab at your statements @brollison, 1. yes but these are implicitly defined 2. yes 3. yes, but we should provide defaults as we do currently

I think Brian's original reason for starting this discussion came from incongruence between the definition of collections used in search vs other parts of the application. I'm a bit worried that catalogs may be adding unnecessary complexity to the current problem. @ajturner we should sync up and discuss searchDefinitions and catalogs in more detail. I'd like to fit searchDefinition into Brian's model.

ajturner commented 4 years ago

The V3 API does not currently limit the Search based on the domain - https://org-short.hub.arcgis.com/api/v3/datasets. The Site UI does, but if you watch the network requests it calls this API with the "definition" of group ID, augmented by collection parameters as selected. I agree that we should consider having the API use the domain and path to automatically apply configuration parameters, which would simplify the developer experience.

To your other statements:

Organizations must have a catalog and search definition

No. The organization does limit the set of content from all of AGO but I wouldn't consider this a catalog or search definition. It's just saying "show all content from this Org (and community org?)". Here is the UI mock from Klara

Sites may have a catalog and search definition, if not they will use that of the parent organization

correct, which is consistent with the above statement.

Here are Klara's mocks for teaching users

Sites may have many collections with their own search definitions, if not then only the catalog is leveraged (i.e. "All" collection) whether site or organization

Yes - but to be clear: All uses the Site catalog (defined or assumed Org). If collections are configured they can optionally be used (either via modal select in UI tabs or facets or other UI experiences like a tiered browse)
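The fallback described in these answers could be sketched as follows. The `Site` shape, the `orgId` field, and `resolveSiteScope()` are hypothetical names; `orgs` and `groups` come from the SearchDefinition outline earlier in the thread.

```typescript
// Hypothetical shapes for illustration.
interface SearchDefinition {
  groups?: string[];
  orgs?: string[];
}

interface Catalog {
  definition: SearchDefinition;
}

interface Site {
  title: string;
  orgId: string; // assumed: the id of the site owner's organization
  catalog?: Catalog;
}

// "All" uses the site's catalog when one is defined; otherwise the scope is
// assumed to be all content from the owning organization.
function resolveSiteScope(site: Site): SearchDefinition {
  return site.catalog ? site.catalog.definition : { orgs: [site.orgId] };
}

const configured: Site = {
  title: "DC",
  orgId: "dcOrgId",
  catalog: { definition: { groups: ["dchealthId", "dcTranspoId"] } },
};
const unconfigured: Site = { title: "New Site", orgId: "dcOrgId" };

console.log(resolveSiteScope(configured)); // scoped to the two groups
console.log(resolveSiteScope(unconfigured)); // scoped to the owning org
```

Under this reading the organization never needs its own catalog; it only acts as the implied scope when a site has none.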

brollison commented 3 years ago

@ajturner thank you for the additional details here - I think there should be a continual focused discussion around search; however, I'd like to cap it for the purposes of moving content classification forward as it is currently impacting content views.

I've proposed the following definitions (including "Collections" to show differentiation)...

COLLECTION Configurable grouping of content classes and/or labels (in the future tags, item categories, etc. as well) to define search.

CLASS Non-configurable grouping of item types meant to organize them into logical groupings to facilitate manager search, content configuration, and application architecture.

LABEL Non-configurable translation of item.type into a more approachable and understandable designation if necessary (preference to not translate brand names such as "Microsoft Excel").

TYPE Non-configurable string value of item.type supplied by the platform.

@tomwayson @ajturner @thomas-hervey are these appropriate definitions to move forward with? If not, please supply alternatives if possible.

@tomwayson I believe there is an outstanding concern that the existing getHubType() function is already defined to target what would be considered a "Class" above.

thomas-hervey commented 3 years ago

@brollison these definitions make sense to me. For clarification, what does "manager search" mean under the class definition?

Also, can you or someone else in this thread double check the examples I've added to the model?

brollison commented 3 years ago

@thomas-hervey good question, I was specifically referencing search for content managers in "Edit Mode" such as item pickers and the Gallery Card as two examples.

Your examples are accurate, but perhaps not the most descriptive of the point as a PDF will likely still be a PDF as a label. Descriptive examples may be...

| CONTENT | EXAMPLE 1 | EXAMPLE 2 | EXAMPLE 3 | EXAMPLE 4 |
| --- | --- | --- | --- | --- |
| Type | `Microsoft Excel` | `StoryMap` | `Web Scene` | `CSV` |
| Label | `Microsoft Excel` | `StoryMap` | `3D Map` | `Table` |
| Class | `Documents` | `Apps` | `Maps` | `Data` |

thomas-hervey commented 3 years ago

@brollison those make sense to me and I'm good with the examples. I really appreciate you taking the lead on this. I'm still a bit nervous that label and class are additional complexity, but I understand why we need them based on our current architecture.

Two questions...

If we consider label and class as non-configurable but collections are, does that mean that a customized collection could hold multiple granularities? For example, a collection called "Transportation" would allow the inclusion of Apps, Maps, and Microsoft Excel?
Can we make sure no labels collide with types? Table in this case would be both a type (because it's an AGO item type) and a label for csv.

I know our time is thin right now, but I think we should have an hour meeting next week to

solidify the definitions in the model, and
expand on the Supported Items sheet (maybe add a new sheet?) to completely map our current "collection" and "category" definitions to your model's classification. I'm going to take a stab at it here.

tomwayson commented 3 years ago

I haven't had a chance to absorb all this yet, but I don't like "Class" as that has special meaning for developers.

brollison commented 3 years ago

@thomas-hervey we somewhat have this "Hub Classification" of item types in the app already; however, it is distributed and updating it would require changes to an unknown number of files. I would like to get to a point where if we wanted to change how something is referenced in app it is 1 change to 1 file.

To your questions...

Yes, you would be able to configure a collection of many class and/or label designations. I believe @ajturner has also suggested a future expansion of this configuration to facilitate collections of configured parameters as well, in this case a future state may be (unknown if these are AND / OR or a mix)...
- class: Apps & class: Maps
- label: Microsoft Excel
- tags: [transportation, transit]
- categories: infrastructure
This is a good example to be mindful of; however, we should likely consider the user perception of these content items...would they consider a type: Table and type: CSV to be disassociated or simply both Table style content?
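A sketch of what membership in such a mixed-granularity collection might look like. Because it is still unknown whether the criteria combine with AND or OR, this assumes OR across all of them; the `ClassifiedItem` and `CollectionCriteria` shapes and the `matchesCollection()` helper are hypothetical.

```typescript
// Hypothetical shapes for illustration.
interface ClassifiedItem {
  contentClass: string;
  label: string;
  tags: string[];
  categories: string[];
}

interface CollectionCriteria {
  classes?: string[];
  labels?: string[];
  tags?: string[];
  categories?: string[];
}

// True when any configured value appears among the item's values.
function overlaps(configured: string[] | undefined, actual: string[]): boolean {
  return (configured ?? []).some((v) => actual.indexOf(v) !== -1);
}

// Assumption: criteria are OR-ed, so matching any single criterion is enough.
function matchesCollection(item: ClassifiedItem, criteria: CollectionCriteria): boolean {
  return (
    overlaps(criteria.classes, [item.contentClass]) ||
    overlaps(criteria.labels, [item.label]) ||
    overlaps(criteria.tags, item.tags) ||
    overlaps(criteria.categories, item.categories)
  );
}

// The "Transportation" collection from the future-state example above.
const transportation: CollectionCriteria = {
  classes: ["Apps", "Maps"],
  labels: ["Microsoft Excel"],
  tags: ["transportation", "transit"],
  categories: ["infrastructure"],
};

const spreadsheet: ClassifiedItem = { contentClass: "Documents", label: "Microsoft Excel", tags: [], categories: [] };
const untaggedPdf: ClassifiedItem = { contentClass: "Documents", label: "PDF", tags: [], categories: [] };

console.log(matchesCollection(spreadsheet, transportation)); // true, matched by label
console.log(matchesCollection(untaggedPdf, transportation)); // false, no criterion matches
```

Switching to AND semantics would only change the `||` chain, so the data shape does not need to be settled before the combination rule is.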

Thanks for creating a spreadsheet to start a collaboration. I'll schedule a meeting for next week.

@tomwayson I thought this may be the case and referenced it as a primary concern at the beginning of this comment above. I would be interested in getting your feedback on a possible option without namespace collision.

tomwayson commented 3 years ago

I suggest "Hub Category" (content.hubCategory); it feels better than what we're currently using: "Hub Type" (content.hubType). It's clear to me the distinction between that and an item's categories (item.categories).

ajturner commented 3 years ago

Quick diagram illustrating how a

Items belong to one and only one Org
Site has a catalog that can cross multiple Orgs
Collections are subsets of the Site catalog
Collections could overlap items (e.g. an item matches criteria of Collection B and Collection C)

thomas-hervey commented 3 years ago

@ajturner @brollison @tomwayson

The following are suggested (possibly redundant) action items from 2020-11-4 meeting:

From Thomas

(Thomas + Andrew) Determine definition for collections and class (is it just a list of query parameters)
- Highlight definition differences between a search definition on a site & one within a collection
(anyone) ER diagram with examples for each
(Brian) Finalize names for "Class", "Label", "Type" in Brian's model
(everyone) individually go over collection mapping and update

From Brian

(Thomas) Someone to enumerate item pickers in the app and what are the catalogs/collections of them
(Brian) Work through if "label" is 1:1 or 1:n relationship (i.e. web scene and cityengine web scene are "3D Maps")
(Brian) Work on words (i.e. "Classification" or otherwise)

brollison commented 3 years ago

Circling back on my items...

Q: Is the item type to label translation (i.e. "Web Map" > "2D Map") a 1:1 or n:1 relationship? A: Item type to label relationship should be n:1
- It is very likely that many relationships are 1:1, such as particular "brand names"...
- type: Dashboard > label: Dashboard
- type: Microsoft Excel > label: Microsoft Excel
- However, this cannot be enforced as some applications have multiple item types which would map to a single label (i.e. n:1 relationship support required)
- type: Web Mapping Application typeKeyword: story map > label: StoryMap
- type: StoryMap > label: StoryMap
- Other cases of n:1 relationships should be identified and evaluated here
What is the final taxonomy of content categorization in Hub? Concerns around "class" causing dev confusion
- TYPE: item.type, such as type: Web Scene
- LABEL: Hub translation of "type", such as label: 3D Map
- SEGMENT: categorization of content types into related categories, such as segment: Map

NOTE Given the item type to label relationship may be n:1, we must use Type to group content into Segments.
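To illustrate the note above, a small sketch keyed by item type. The "3D Map" label for CityEngine Web Scene is an assumption taken from the earlier action item, and the `Classification` shape and helper names are hypothetical. It shows why a label cannot be the key for segments once two types share a label.

```typescript
// Hypothetical shape: the Hub label and segment for one platform item type.
interface Classification {
  label: string;
  segment: string;
}

// Keyed by TYPE, because several types may share one label (n:1).
const CLASSIFICATION_BY_TYPE: Record<string, Classification> = {
  "Web Scene": { label: "3D Map", segment: "Map" },
  // Assumed label; the segment follows the earlier move of this type to Apps.
  "CityEngine Web Scene": { label: "3D Map", segment: "App" },
  "StoryMap": { label: "StoryMap", segment: "App" },
};

function classify(itemType: string): Classification {
  // Assumption: unmapped types keep their platform name and a generic segment.
  return CLASSIFICATION_BY_TYPE[itemType] ?? { label: itemType, segment: "Content" };
}

// Collects every segment reachable from a label, to surface ambiguous labels.
function segmentsForLabel(label: string): string[] {
  const segments: string[] = [];
  for (const type of Object.keys(CLASSIFICATION_BY_TYPE)) {
    const c = CLASSIFICATION_BY_TYPE[type];
    if (c && c.label === label && segments.indexOf(c.segment) === -1) {
      segments.push(c.segment);
    }
  }
  return segments;
}

console.log(classify("Web Scene")); // label "3D Map", segment "Map"
console.log(segmentsForLabel("3D Map")); // two segments, so the label alone is ambiguous
```

A check like `segmentsForLabel()` could also serve as a test that flags the other n:1 cases still to be identified.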

I've updated the domain model here and pasted the new version below.

Content Classification - Copy of Page 1

ajturner commented 3 years ago

@thomas-hervey I deleted your comment because it shared a word document that included auth tokens in a public repository.

I've sent you a private invite to a chart document that might be a better method for documenting these.

ajturner commented 3 years ago

@brollison there are two ways to test the domain language - using the terms in English and also in a functional interface.

Looking at an example of the "App Card" selection

To embed an App in your site, you will find Content of segment 'App' from your Community or other organization

search( query: "", segment: "App", from: "mine", sort: [{property:"title", order: "desc"}, {property: "updated", order: "desc"}])

does that sound right and programmatically make sense?

brollison commented 3 years ago

I would agree it does not. The namespace here is relatively limited given much is already taken (i.e. group, category, etc.) and other options are too clinical for the interface (i.e. classification). Perhaps we fall back to biology as you suggest and pull either order or family, with family likely being the more approachable of the two.

To quickly embed a StoryMap on your site, use the App Card which lists available content from the family of Apps in your Community and other organizations.

search( query: "", family: "App", from: "mine", sort: [{property:"title", order: "desc"}, {property: "updated", order: "desc"}])

Family feels better stated this way, thoughts? @ajturner @thomas-hervey
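To try the term in a functional interface, here is a small in-memory sketch of the `search()` call above with typed options. The `Content` shape and the filtering and sorting behavior are assumptions for illustration, and the `from` parameter is left out because its semantics are not defined in this thread.

```typescript
type SortOrder = "asc" | "desc";

// Hypothetical content record for illustration.
interface Content {
  title: string;
  family: string;
  updated: number;
}

interface SortSpec {
  property: "title" | "updated";
  order: SortOrder;
}

interface SearchOptions {
  query: string;
  family?: string;
  sort?: SortSpec[];
}

// Compares two records on one sort spec, flipping the sign for "desc".
function compareBy(a: Content, b: Content, spec: SortSpec): number {
  const av = a[spec.property];
  const bv = b[spec.property];
  const diff =
    typeof av === "number" && typeof bv === "number"
      ? av - bv
      : String(av).localeCompare(String(bv));
  return spec.order === "desc" ? -diff : diff;
}

function search(contents: Content[], opts: SearchOptions): Content[] {
  const needle = opts.query.toLowerCase();
  const sort = opts.sort ?? [];
  return contents
    .filter((c) => opts.family === undefined || c.family === opts.family)
    .filter((c) => c.title.toLowerCase().indexOf(needle) !== -1)
    .sort((a, b) => {
      // Apply sort specs in order; later specs only break ties.
      for (const spec of sort) {
        const diff = compareBy(a, b, spec);
        if (diff !== 0) return diff;
      }
      return 0;
    });
}

const library: Content[] = [
  { title: "Budget Story", family: "App", updated: 1 },
  { title: "Zoning Story", family: "App", updated: 2 },
  { title: "Parcels", family: "Map", updated: 3 },
];

const apps = search(library, {
  query: "",
  family: "App",
  sort: [{ property: "title", order: "desc" }, { property: "updated", order: "desc" }],
});
console.log(apps.map((c) => c.title)); // Zoning Story, then Budget Story
```

Read aloud, `family: "App"` matches the sentence form "content from the family of Apps", which is the consistency the English and functional tests are meant to check.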
