gchq / event-logging-schema

Event Logging is an XML Schema for describing the auditable events generated by computer systems, hardware devices and access control systems
Apache License 2.0
25 stars 6 forks source link

gh-10 Improve View SearchResult #11

Closed at055612 closed 5 years ago

at055612 commented 7 years ago

Relates to #10

This PR is to discuss options for and implement improvements to the modelling of a user viewing search results where the viewing is done as a separate action to the search itself, e.g. where the search results have been saved for viewing at a later date.

Currently SearchResult and its SearchResultComplexType exists within the BaseMultiObjectCompexType. SearchResultComplexType includes a Relevancy element indicating that SearchResult represents a single item in a search. Possible uses may be: Search/Results/SearchResult(1..*) View/SearchResult View/Criteria/Results/SearchResult(1..*) In all of the above cases, SearchResult could be replace with any of the other MutiObject types, e.g. File, Document etc. if that better describes what is happening.

Option 1 - Change nothing, use View/Criteria If the original search criteria is known then this path could be used as it allows the recording of both the original query and full details of the results.

Option 2 - Change the SearchResultComplexType Add a 0..1 element of type MultiObjectComplexType so that the SearchResult can include detail of the search result item, e.g. file document etc.

Option 3 - Add a new element to the MultiObjectComplexType called something like SearchResults which includes elements like ResultPage, TotalResults, Results as is done in Criteria and Search

Discuss....

quadhat commented 7 years ago

Referring back to #10, this is something we see a lot, and is causing some confusion as where best to record results. Also it gives us two search scenarios to handle in rules, ones that return results immediately and ones that are delayed/asynchronous.

Might it be cleaner to remove the Search/Result elements, and merge these logically into a more descriptive View/SearchResults element? This way there is always one model for all types of Search, even if this does also mean it's also always 2 events (which seems to be the common case these days anyway).

at055612 commented 7 years ago

I am very keen to avoid any breaking changes to the schema (unless totally unavoidable) as this means having to migrate stroom translations and breaking code changes to the event-logging java jaxb api which has knock on effects to those that use that.

burnalting commented 7 years ago

I would echo @at055612's concern about translation migration re the deprecation of Search/Result ... there is a lot of extant capability/translations that process a 'search+resultset' activity into one event. That said, I would like to see the experiment of a single event ingest that generates two (or more) events in a pipeline.

I currently use

My two cents would be option 2 - extend SearchResultComplexType

As @at055612 says, this is 'additive' and is unlikely to break extant capability.

quadhat commented 7 years ago

Makes sense :) Obviously if we can do it cleanly with what's already there than we're happy, and we can reduce confusion by describing it in our guidance to projects.

@burnalting I hadn't spotted the View/Criteria/Results, so we'll have a look and see if we can get a consensus to use it.

@at055612 Clearly non-breaking is good, but am I opening a can-of-worms to ask if there is any appetite to 'clean' the schema in a new major version, maybe after analysing the many repositories in use to see what's most effective? I'm sure that would be an interesting challenge! ;)

stroomdev66 commented 7 years ago

In an ideal world any event that gives you search results should also tell you what the corresponding query/criteria was as the query indicates the users intent.

Currently View/Criteria/Results is probably the best way to indicate viewing stored results. It also gives us a chance of echoing the associated query if they are persisted with the results by the source system.

In the common situation where search requests and search results are delivered as separate events, it would be great if source systems could provide an activity id for each so that we had a chance of associating the two. In practice I don't think this has ever been done but correct me if I'm wrong. Again it is still desirable to get the query repeated back again in search result events to remove the need to marry the request and result in post analysis.

In the above comments it strikes me that perhaps the use of the word Criteria is perhaps problematic and the term Query might be preferable. The word Criteria was used as it is a broader term to encompass a system perhaps filtering data rather than searching an index. Do we care about whether a system filters or searches though if the outcome is similar? This should be discussed further as perhaps all of these sorts of activities should be deemed search + query to simplify.

Systems might provide a user with the means to view stored 'Query' and stored 'Search Results', and although a query might have associated search results and search results have an associated query, I think it might be important to allow for the distinction in the schema, i.e. View/Query or View/SearchResults. If this distinction were allowed we'd need to ensure the types didn't allow for cyclic references, e.g. View/Query/SearchResults/Query...

burnalting commented 7 years ago

I my use cases I make heavy use of EventDetail/TypeId and EventDetail/Description to provide clarity.

So using the two use cases in the last paragraph above I'd have

EventDetail/TypeId = 'View Query Terms' EventDetail/Description = 'The user has viewed a set of search terms' EventDetail/View/Criteria/Name - the name associated with the search terms EventDetail/View/Criteria/{DataSources,Query} - for the search terms

and

EventDetail/TypeId = 'View Search Result Set' EventDetail/Description = 'The user has viewed the result set returned from a search' EventDetail/View/Critera/Id - the unique identifier of the stored result set EventDetail/View/Criteria/{DataSources,Query} - for the search details if we have them EventDetail/View/Critera/{ResultPage,TotalResults,Results} - to show what the user has seen EventDetail/View/Critera/Data[\@Name='QueryName']/\@Value - to hold the query name if we had it ... basically we are doing a link back the the saved query.

I suppose this might mean we may find it useful to add an Id, Name and Description to the QueryComplexType to cut down overloading Data elements.

And of course, if the 'Perform a search in the background' event returns an Id to link to the 'View results sets' action then we would have

EventDetail/Search/Data[@Name='ResultSetId']/@Value - the offered id to match EventDetail/View/Critera/Id

stroomdev66 commented 7 years ago

Yes it makes sense to add Id, Name and Description to QueryComplexType and Id could perhaps be used to link search request and response where possible.

As your last comment points out Query is a sub element of Criteria so ignore my Query vs. Criteria debate.

Do you think SearchResults should be available as a View item in addition to Criteria for the case where a user views results and not a stored Criteria?

burnalting commented 7 years ago

One could create a new SearchResults element and hence not overload the Criteria/Results sub-element so long as there is the ability to link a View/SearchResults to Search element.

stroomdev66 commented 5 years ago

To refresh the debate on this issue, I do think that it makes sense to allow the inclusion of search results in the View action and to deprecate search results being part of the Search action. People get confused with whether or not they need to provide the results during a search and often find it hard to do so. This would remove confusion and provide a single way of registering that a user has viewed some results even if it does mean supplying two events for a search in some cases. I realise that this could lead to a breaking change but I think breaking changes are easier to live with than constant confusion.

burnalting commented 5 years ago

To keep this simple and non breaking we could

  1. Keep Search the same to not break things. I'd imagine there is a lot extant usage of Search/Results. Via documentation we can suggest it's deprecation or have it cater for simple search and immediate view activities (it's current usage).

  2. Modify QueryComplexType to add Id, Name and Description so our view of a result set can reference a query applied in a Search action. Then instead of identifying the viewed result set using View/Criteria/{ResultPage,TotalResults,Results} create a new SearchResultsComplexType based onCriteriaComplexType without the DataSources sub-element and then use View/SearchResults to model the viewing of a result set.

Thus a flow for a search and delayed (or multiple users viewing the same result set) activity could be

Time X: EventDetail/TypeId - 'SearchandStore' EventDetail/Description - 'Perform a search and store the results for later review' EventDetail/Search/DataSources/... EventDetail/Search/Query/Id - the id EventDetail/Search/Query/Name - the query name EventDetail/Search/Query/Description - the query description EventDetail/Search/Query/{Advanced,Simple,Raw} - the terms EventDetail/...

Time X+n: EventDetail/TypeId - 'ViewStoredResultSets' EventDetail/Description - 'View a resultset from a previous search' EventDetail/View/SearchResults/Query/{some or all of the related Search/Query elements} EventDetail/View/SearchResults/{ResultPage,TotalTResults,Results} EventDetail/....

burnalting commented 5 years ago

We have nothing, in terms of a changed xsd file, to review.

Is my last comment appropriate and hence we can author some xsd changes?

at055612 commented 5 years ago

Will craft a concrete change for people to look at.

at055612 commented 5 years ago

Added some changes. Not convinced we need the Id/Name/Description on QueryComplexType as they are already on the SearchComplexType by virtue of the BaseObjectGroup.

burnalting commented 5 years ago

The idea behind {Id,Name,Description} in QueryComplexType:

When viewing a Result Set, the View activity itself may have it's own identifier, name, description, etc. Thus, we need some linkage back to the original Search event.

at055612 commented 5 years ago

Split out some of the xml examples, which I probably should have done down on master, hence the number of changes. Never mind.

at055612 commented 5 years ago

Rewording the annotation on Search/Results to

Describes the results returned by the search if they are know at the time of the Search event (a synchronous search). If the results are not know at execution time (an asynchronous search) and will be viewed as part of a separate event then View/SearchResults can be used to model that event.