gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.
https://gchq.github.io/stroom-docs/
Apache License 2.0

Model stream deletion events as a `Delete` activity #4146

Open p-kimberley opened 4 months ago

p-kimberley commented 4 months ago

Version: 7.3

A stream deletion event is currently output like:

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Modify the status of selected streams to Deleted</Description>
  <Search>
    <Query>
      <Advanced>
        <And>
          <Term>
            <Name>Id</Name>
            <Condition>Equals</Condition>
            <Value>2991067</Value>
          </Term>
        </And>
      </Advanced>
    </Query>
  </Search>
</EventDetail>

This is missing two key pieces of information:

Additionally, a deletion event should be modelled as a Delete activity, for ease of discovery/search.

at055612 commented 4 months ago

Results of changes are as follows:

Delete single stream

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete stream with ID 920</Description>
  <Delete>
    <Object>
      <Id>920</Id>
      <State>Unlocked</State>
    </Object>
    <Data Name="Feed" Value="DATA_SPLITTER-EVENTS"/>
    <Data Name="Type" Value="Error"/>
  </Delete>
</EventDetail>
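
To illustrate the "ease of discovery/search" point, here is a minimal sketch of how a reviewer's script might pick out deletion events by structure rather than by matching the `Description` text. It assumes the event is available as a standalone, un-namespaced `EventDetail` fragment exactly as pasted above (real events may carry an XML namespace), and `summarise_delete` is a hypothetical helper name, not part of Stroom.

```python
import xml.etree.ElementTree as ET

# Sample copied from the single-stream delete event above.
EVENT_XML = """
<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete stream with ID 920</Description>
  <Delete>
    <Object>
      <Id>920</Id>
      <State>Unlocked</State>
    </Object>
    <Data Name="Feed" Value="DATA_SPLITTER-EVENTS"/>
    <Data Name="Type" Value="Error"/>
  </Delete>
</EventDetail>
"""


def summarise_delete(event_xml):
    """Return id/state/feed/type for a Delete event, or None if it is not one.

    Hypothetical helper: assumes no XML namespace on the elements.
    """
    root = ET.fromstring(event_xml)
    delete = root.find("Delete")
    if delete is None:
        # Not modelled as a Delete activity (e.g. the old Search-style event).
        return None
    # Collect the direct <Data Name=".." Value=".."/> children into a dict.
    data = {d.get("Name"): d.get("Value") for d in delete.findall("Data")}
    return {
        "id": delete.findtext("Object/Id"),
        "state": delete.findtext("Object/State"),
        "feed": data.get("Feed"),
        "type": data.get("Type"),
    }


print(summarise_delete(EVENT_XML))
```

The old `Search`-style event shown at the top of this issue would return `None` here, which is the discovery problem the `Delete` activity is meant to solve.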

Restore single stream

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Update the status of stream with ID 1999 from Deleted to Unlocked</Description>
  <Update>
    <Before>
      <Object>
        <Id>1999</Id>
        <State>Deleted</State>
      </Object>
    </Before>
    <After>
      <Object>
        <Id>1999</Id>
        <State>Unlocked</State>
      </Object>
    </After>
    <Data Name="Feed" Value="DATA_SPLITTER-EVENTS"/>
    <Data Name="Type" Value="Error"/>
  </Update>
</EventDetail>
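
As a rough sketch of how the `Before`/`After` pair could be consumed, the snippet below reads the status transition out of a restore event. It assumes an un-namespaced fragment like the one above; `status_change` is a hypothetical name used only for illustration.

```python
import xml.etree.ElementTree as ET

# Sample copied from the single-stream restore event above.
UPDATE_XML = """
<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Update the status of stream with ID 1999 from Deleted to Unlocked</Description>
  <Update>
    <Before>
      <Object>
        <Id>1999</Id>
        <State>Deleted</State>
      </Object>
    </Before>
    <After>
      <Object>
        <Id>1999</Id>
        <State>Unlocked</State>
      </Object>
    </After>
    <Data Name="Feed" Value="DATA_SPLITTER-EVENTS"/>
    <Data Name="Type" Value="Error"/>
  </Update>
</EventDetail>
"""


def status_change(event_xml):
    """Return (stream_id, old_state, new_state) for an Update event, else None.

    Hypothetical helper: assumes no XML namespace on the elements.
    """
    update = ET.fromstring(event_xml).find("Update")
    if update is None:
        return None
    return (
        update.findtext("Before/Object/Id"),
        update.findtext("Before/Object/State"),
        update.findtext("After/Object/State"),
    )


print(status_change(UPDATE_XML))
```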

Delete N checked streams

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete 3 streams</Description>
  <Delete>
    <Criteria>
      <Query>
        <Simple>
          <Include>1985,1976,1975</Include>
        </Simple>
      </Query>
    </Criteria>
    <Data Name="Count" Value="3"/>
    <Data Name="FeedCount" Value="1"/>
    <Data Name="TypeCount" Value="2"/>
    <Data Name="ProcessorCount" Value="1"/>
    <Data Name="PipelineCount" Value="1"/>
    <Data Name="StatusCount" Value="1"/>
    <Data Name="MinCreateTime" Value="2024-02-08T15:17:25.788Z"/>
    <Data Name="MaxCreateTime" Value="2024-02-08T15:17:26.605Z"/>
  </Delete>
</EventDetail>
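
For the checkbox-selection case, a reviewer might want to cross-check the `Include` ID list against the `Count` data item. The following is only a sketch under the assumption that `Include` holds a comma-separated list of stream IDs, as in the sample above; the XML is a trimmed copy of that event and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the "Delete 3 streams" event above (only Count kept).
MULTI_DELETE_XML = """
<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete 3 streams</Description>
  <Delete>
    <Criteria>
      <Query>
        <Simple>
          <Include>1985,1976,1975</Include>
        </Simple>
      </Query>
    </Criteria>
    <Data Name="Count" Value="3"/>
  </Delete>
</EventDetail>
"""


def included_ids(event_xml):
    """Parse the comma-separated stream IDs from the Simple/Include element."""
    root = ET.fromstring(event_xml)
    text = root.findtext("Delete/Criteria/Query/Simple/Include") or ""
    return [int(part) for part in text.split(",") if part.strip()]


def declared_count(event_xml):
    """Return the integer value of the Count data item, or None if absent."""
    root = ET.fromstring(event_xml)
    for data in root.iterfind("Delete/Data"):
        if data.get("Name") == "Count":
            return int(data.get("Value"))
    return None


ids = included_ids(MULTI_DELETE_XML)
print(ids, declared_count(MULTI_DELETE_XML) == len(ids))
```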

Delete using selectAll & filter

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete streams matching a criteria</Description>
  <Delete>
    <Criteria>
      <Query>
        <Advanced>
          <And>
            <Term>
              <Name>Status</Name>
              <Condition>Equals</Condition>
              <Value>Unlocked</Value>
            </Term>
            <Term>
              <Name>Type</Name>
              <Condition>Equals</Condition>
              <Value>Error</Value>
            </Term>
            <Term>
              <Name>Create Time</Name>
              <Condition>GreaterThan</Condition>
              <Value>2024-02-22T00:00:00.000Z</Value>
            </Term>
          </And>
        </Advanced>
      </Query>
    </Criteria>
    <Data Name="Count" Value="5"/>
    <Data Name="FeedCount" Value="1"/>
    <Data Name="TypeCount" Value="1"/>
    <Data Name="ProcessorCount" Value="1"/>
    <Data Name="PipelineCount" Value="1"/>
    <Data Name="StatusCount" Value="1"/>
    <Data Name="MinCreateTime" Value="2024-02-29T14:06:21.338Z"/>
    <Data Name="MaxCreateTime" Value="2024-03-01T12:10:26.397Z"/>
  </Delete>
</EventDetail>

Restore using selectAll & filter

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Update the status of streams matching a criteria from Deleted to Unlocked</Description>
  <Update>
    <After>
      <Criteria>
        <Query>
          <Advanced>
            <And>
              <Term>
                <Name>Status</Name>
                <Condition>Equals</Condition>
                <Value>Deleted</Value>
              </Term>
              <Term>
                <Name>Create Time</Name>
                <Condition>GreaterThan</Condition>
                <Value>2024-02-22T00:00:00.000Z</Value>
              </Term>
            </And>
          </Advanced>
        </Query>
      </Criteria>
    </After>
    <Data Name="Count" Value="6"/>
    <Data Name="FeedCount" Value="2"/>
    <Data Name="TypeCount" Value="2"/>
    <Data Name="ProcessorCount" Value="2"/>
    <Data Name="PipelineCount" Value="2"/>
    <Data Name="StatusCount" Value="1"/>
    <Data Name="MinCreateTime" Value="2024-02-28T09:34:36.040Z"/>
    <Data Name="MaxCreateTime" Value="2024-03-01T12:10:26.397Z"/>
  </Update>
</EventDetail>

at055612 commented 3 months ago

Waiting on @p-kimberley to confirm he is happy with the event structure generated.

at055612 commented 3 months ago

From @p-kimberley:

I still think it's important to include information about the feeds affected when multiple streams are selected. Perhaps make Data[@Name='Feed'] one-to-many, listing the unique feeds affected by the deletion. This would be in lieu of including details of every single stream in the event itself, which could be numerous. I view the deletion of multiple streams via checkbox selection as a reasonably likely scenario, and once a stream is deleted and later purged, it's gone. Adding the distinct feed names and types would at least enable a reviewer to gain a rough idea of the type of activity performed. For instance, was it the deletion of a bunch of Error streams to clean up after a bad processing run, or the deletion of Raw Event streams for an important feed?

at055612 commented 3 months ago

Changed the data items to include the distinct feeds/types/statuses.

<EventDetail>
  <TypeId>MetaResourceImpl.updateStatus</TypeId>
  <Description>Delete 2 streams</Description>
  <Delete>
    <Criteria>
      <Query>
        <Simple>
          <Include>2029,1923</Include>
        </Simple>
      </Query>
    </Criteria>
    <Data Name="Count" Value="2"/>
    <Data Name="FeedCount" Value="2"/>
    <Data Name="TypeCount" Value="1"/>
    <Data Name="ProcessorCount" Value="2"/>
    <Data Name="PipelineCount" Value="2"/>
    <Data Name="StatusCount" Value="1"/>
    <Data Name="Feeds">
      <Data Name="Feed" Value="TEST_REFERENCE_DATA-EVENTS"/>
      <Data Name="Feed" Value="ZIP_TEST-DATA_SPLITTER-EVENTS"/>
    </Data>
    <Data Name="Types">
      <Data Name="Type" Value="Error"/>
    </Data>
    <Data Name="Statuses">
      <Data Name="Status" Value="Unlocked"/>
    </Data>
    <Data Name="MinCreateTime" Value="2023-12-29T11:09:09.485Z"/>
    <Data Name="MaxCreateTime" Value="2024-02-29T17:48:51.692Z"/>
  </Delete>
</EventDetail>

Only up to 20 distinct values are shown. If the list is truncated, you will see:

    <Data Name="FeedCount" Value="55"/>
    <!-- ... -->
    <Data Name="Feeds">
      <Data Name="Feed" Value="TEST_REFERENCE_DATA-EVENTS"/>
      <!-- ...18 data items... -->
      <Data Name="Feed" Value="ZIP_TEST-DATA_SPLITTER-EVENTS"/>
      <Data Name="IsListTruncated" Value="true"/>
    </Data>
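
The truncation behaviour described above can be sketched roughly as follows. This is not Stroom's implementation, just an illustration of the output shape: the function name is hypothetical, and the alphabetical ordering of values is an assumption (the thread does not say how the values are ordered or which ones are kept when the list is truncated).

```python
import xml.etree.ElementTree as ET

MAX_DISTINCT = 20  # limit stated in the comment above


def distinct_values_element(group_name, item_name, values, limit=MAX_DISTINCT):
    """Build <Data Name=group_name> holding up to `limit` distinct values.

    Hypothetical sketch: sorting is an assumption, not confirmed behaviour.
    """
    distinct = sorted(set(values))
    group = ET.Element("Data", {"Name": group_name})
    for value in distinct[:limit]:
        ET.SubElement(group, "Data", {"Name": item_name, "Value": value})
    if len(distinct) > limit:
        # Flag that some distinct values were left out of the list.
        ET.SubElement(group, "Data", {"Name": "IsListTruncated", "Value": "true"})
    return group


# 55 made-up feed names, matching the FeedCount of 55 in the example above.
feeds = [f"FEED_{i:02d}" for i in range(55)]
element = distinct_values_element("Feeds", "Feed", feeds)
print(ET.tostring(element, encoding="unicode"))
```

With 55 distinct feeds this yields 20 `Feed` items followed by the `IsListTruncated` marker, while the separate `FeedCount` item would still carry the full count.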

I have also changed the selection summary popup to show the distinct types
