epam / cloud-pipeline

Cloud agnostic genomics analysis, scientific computation and storage platform
https://cloud-pipeline.com
Apache License 2.0

Collecting event logs #684

Open NewOlya opened 4 years ago

NewOlya commented 4 years ago

Background

The Cloud Pipeline platform shall collect logs for all events that occur in the system, so that actions can be traced back when fixing bugs or identifying incorrect user actions.

Approach

We need to check which logs are currently collected and implement the gathering of the "missing" event logs that are not collected yet. All logs shall be collected automatically.

Events

Events that shall be logged:

  1. Create/update/delete operations (including permissions operations) over the following objects:
    • library folder (including clone/lock operations)
    • pipeline (including unregister/"register existing" operations)
    • detached configuration
    • object storage (including unregister/"register existing" operations)
    • FS storage (including unregister/"register existing" operations)
    • metadata
    • project
    • tool (including instance management)
    • tool group
    • registry
    • object issues
    • object attributes (tags)
  2. Create/update/delete operations over the following system objects:
    • user
    • user group
    • role
    • system events (notifications)
    • email notifications
    • system-level preferences
    • Cloud Regions preferences
  3. Full run history regardless of where the run was launched (pipeline, detached configuration, metadata, tool), including the following operations:
    • lifecycle management (launch/rerun, pause, resume, stop, terminate operations, etc. including failures)
    • user's activity within an interactive session (console/SSH session/FS browser)
    • "sharing run" operation
  4. Cluster state changes (nodes lifecycle)
  5. Global search query history
  6. System authorization (login/logout operations)
  7. System exceptions, warnings, errors, failures
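
To make the event list above more concrete, here is a minimal sketch of how the seven categories could be represented as structured log records. It is an illustration only, not part of the Cloud Pipeline codebase: the names `EventCategory` and `EventRecord` and the fields `action`, `user`, `target` and `timestamp` are all hypothetical, since this issue does not specify the record format.

```python
import json
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from enum import Enum


class EventCategory(Enum):
    """Hypothetical categories, one per numbered item in the Events list."""
    OBJECT_CHANGE = "object_change"                # 1. CRUD over user objects
    SYSTEM_OBJECT_CHANGE = "system_object_change"  # 2. CRUD over system objects
    RUN_HISTORY = "run_history"                    # 3. run lifecycle and activity
    CLUSTER_STATE = "cluster_state"                # 4. nodes lifecycle
    SEARCH_QUERY = "search_query"                  # 5. global search queries
    AUTHORIZATION = "authorization"                # 6. login/logout
    SYSTEM_ERROR = "system_error"                  # 7. exceptions, warnings, failures


def _utc_now() -> str:
    # ISO 8601 timestamp in UTC, so records from different nodes are comparable.
    return datetime.now(timezone.utc).isoformat()


@dataclass
class EventRecord:
    """Assumed record shape; the real field set is not defined in this issue."""
    category: EventCategory
    action: str   # e.g. "create", "delete", "login"
    user: str     # who triggered the event
    target: str   # the object the event applies to
    timestamp: str = field(default_factory=_utc_now)

    def to_json(self) -> str:
        # Serialize as one JSON object per event (JSON Lines friendly).
        payload = asdict(self)
        payload["category"] = self.category.value
        return json.dumps(payload, sort_keys=True)


if __name__ == "__main__":
    record = EventRecord(EventCategory.AUTHORIZATION, "login", "some_user", "system")
    print(record.to_json())
```

Emitting one JSON object per event keeps the records machine-readable, which would help when filtering by category or user during troubleshooting. Any other structured format would serve the same purpose.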

Log information

Each event record shall contain at least the following info:

NShaforostov commented 4 years ago

@sidoruka, review the issue, please.

sidoruka commented 4 years ago

This issue is mostly considered a way to simplify support/troubleshooting. One of the key problems we are facing here is investigating a user's operations within a specific job, e.g. which commands were executed and what log output they produced. I'd consider the following tasks to have the highest priority:

It shall be designed/proposed/implemented: