eu-nebulous / monitoring

EMS not reporting metrics if they are not used in a SLO constraint #11

Open robert-sanfeliu opened 1 week ago

robert-sanfeliu commented 1 week ago

Assume this partial metric model:

{
  "name": "NumWorkers",
  "type": "raw",
  "sensor": {
    "type": "-",
    "config": {}
  },
  "output": "all 15 sec"
},
{
  "name": "AccumulatedSecondsPendingRequests",
  "type": "raw",
  "sensor": {
    "type": "-",
    "config": {}
  },
  "output": "all 15 sec"
},

The model defines two metrics, NumWorkers and AccumulatedSecondsPendingRequests. AccumulatedSecondsPendingRequests is used only in the utility function, not in an SLO constraint:

{
  "name": "f",
  "type": "minimize",
  "expression": {
    "formula": "100 /exp( 10 * ( (AccumulatedSecondsPendingRequests / spec_components_0_traits_0_properties_replicas) - 30)^2 )",
    "variables": [
      {
        "name": "AccumulatedSecondsPendingRequests",
        "value": "AccumulatedSecondsPendingRequests"
      },
      {
        "name": "spec_components_0_traits_0_properties_replicas",
        "value": "spec_components_0_traits_0_properties_replicas"
      }
    ]
  }
}
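
To illustrate why the solver needs a value for this metric, the formula can be evaluated directly. This is a minimal Python sketch, assuming `^` in the formula denotes exponentiation; the function name `utility` and the explicit `None` check are illustrative and not part of the optimiser.

```python
import math


def utility(pending_seconds, replicas):
    """Evaluate 100 / exp(10 * ((pending_seconds / replicas) - 30)^2).

    Assumption: '^' in the formula means exponentiation.
    """
    if pending_seconds is None:
        # Without a metric value there is nothing to evaluate.
        raise ValueError("AccumulatedSecondsPendingRequests has no value")
    deviation = pending_seconds / replicas - 30
    # 100 / exp(x) is rewritten as 100 * exp(-x) so that large deviations
    # underflow to 0.0 instead of overflowing math.exp.
    return 100 * math.exp(-10 * deviation ** 2)


print(utility(60, 2))  # 30 pending seconds per replica
print(utility(0, 1))   # far from 30 pending seconds per replica
```

Under that assumption the expression evaluates to 100 at exactly 30 pending seconds per replica and is effectively 0 far from that point, so a missing metric value leaves the solver with nothing to optimise.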

The optimiser controller announces this metric in the metric list message:

topic://eu.nebulouscloud.optimiser.controller.metric_list
subject:1414020207rest-processor-app1719922442226
properties:{application=1414020207rest-processor-app1719922442226}
correlationId:null
payload:{"metrics":["AccumulatedSecondsPendingRequests"]}

The app component reports the metric, as seen in the EMS server logs:

2024-07-02 12:53:25,818 - controller - INFO - AccumulatedSecondsPendingRequests_SENSOR: {"metricValue": 0, "level": 1, "timestamp": 1719924805}
on_send SEND {'type': 'textMessage', 'amq-msg-type': 'text', 'destination': '/topic/AccumulatedSecondsPendingRequests_SENSOR', 'content-length': 55} {"metricValue": 0, "level": 1, "timestamp": 1719924805}
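
The sensor sample above can be decoded to confirm that the component sends a numeric value with a recent timestamp. This is a small sketch, assuming `timestamp` is in Unix epoch seconds; the helper name `decode_sample` is hypothetical.

```python
import json
from datetime import datetime, timezone


def decode_sample(raw):
    """Parse a sensor payload and return (value, UTC datetime)."""
    sample = json.loads(raw)
    # Assumption: 'timestamp' is in Unix epoch seconds.
    when = datetime.fromtimestamp(sample["timestamp"], tz=timezone.utc)
    return sample["metricValue"], when


value, when = decode_sample('{"metricValue": 0, "level": 1, "timestamp": 1719924805}')
print(value, when.isoformat())
```

If the assumption holds, the sample decodes to a value of 0 at 12:53:25 UTC on 2024-07-02, which lines up with the log line's time if the EMS server logs in UTC. The component is therefore emitting a number, not null.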

However, EMS does not publish the metric on the NebulOuS message broker, so the solver receives no value for it and fails to find a new deployment topology:

... failed to forward the application execution context (size: 1, Unset: 1)
AccumulatedSecondsPendingRequests with value null end
Metric Updater: SLO violation received
{
  "predictionTime": 1719924732277,
  "probability": 1.0,
  "severity": 1.0,
  "when": "2024-07-02T12:52:12.277368834Z"
}
... failed to forward the application execution context (size: 1, Unset: 1)
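
One way to confirm the gap is to compare the metrics announced in the metric list message with the topics actually seen on the broker. The sketch below assumes each metric is published on `eu.nebulouscloud.monitoring.realtime.<metric name>`, the pattern visible for NumWorkers in the broker log in this report; the function `missing_metrics` and the way the observed topics are collected are hypothetical.

```python
import json

# Assumption: prefix taken from the NumWorkers topic in the broker log.
REALTIME_PREFIX = "eu.nebulouscloud.monitoring.realtime."


def missing_metrics(metric_list_payload, observed_topics):
    """Return announced metrics that have no matching realtime topic."""
    announced = json.loads(metric_list_payload)["metrics"]
    observed = set(observed_topics)
    return [m for m in announced if REALTIME_PREFIX + m not in observed]


payload = '{"metrics":["AccumulatedSecondsPendingRequests"]}'
seen = ["eu.nebulouscloud.monitoring.realtime.NumWorkers"]
print(missing_metrics(payload, seen))
```

With the payload and topic from this report, the check returns AccumulatedSecondsPendingRequests as announced but never published.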

In contrast, metrics that have an SLO defined on them, such as NumWorkers, are published correctly by EMS to the NebulOuS message broker:

[INFO ] 2024-07-02 14:49:26.008 [pool-2-thread-2] NebulOuSMessageBrokerListener - 
topic://eu.nebulouscloud.monitoring.realtime.NumWorkers
subject:1414020207rest-processor-app1719922442226
properties:{application=1414020207rest-processor-app1719922442226}
correlationId:null

ipatini commented 5 days ago

@robert-sanfeliu I have pushed an updated EMS image, so please check again when convenient.