elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ML] Datafeeds don't return deprecation warning in deprecation info API and preview #82938

Open hendrikmuhs opened 2 years ago

hendrikmuhs commented 2 years ago

Related to: #82935, #82936

When a datafeed uses Painless scripts that call deprecated methods, no deprecation warnings are created or returned. These deprecations should be reported by the deprecation info API, especially for 7.17 (the fix should nevertheless also go into main to cover future deprecations).

This also affects datafeed_config when it is used with _ml/datafeeds/_preview.

Reproduction steps (first create the total-requests job; see the docs):

PUT _ml/datafeeds/datafeed-2
{
  "job_id": "total-requests",
  "indices": [
    "server-metrics"
  ],
  "runtime_mappings": {
    "timestamp_in_millis": {
      "type": "long",
      "script": """
      emit (doc['timestamp'].value.millis);
      """
    }
  },
  "query": {
    "range": {
      "timestamp_in_millis": {
        "gte": 10
      }
    }
  }
}

Decision: returning deprecation warnings as part of GET is out of scope. Target the deprecation info API first; second, return warnings from the preview API, at least when the datafeed is supplied inline.

elasticmachine commented 2 years ago

Pinging @elastic/ml-core (Team:ML)

edsavage commented 2 years ago

Checking for deprecation warnings that stem from Painless scripts requires performing a search request. Depending on the query and the datafeed, this can take a non-trivial amount of time. For this reason, it has been decided not to add these checks to the deprecation info API, as they could cause the upgrade process to fail. This issue will therefore focus on surfacing Painless script deprecation warning headers in the datafeed preview API.

edsavage commented 2 years ago

A solution to the problem of surfacing Painless script deprecation warnings has not yet been found. The following notes describe my understanding of the issue so far and may help clarify the nature of the problem. (All code and line numbers refer to 7.17.4-SNAPSHOT.)

The following API calls reproduce the issue.

POST /_license/start_trial?acknowledge=true

PUT /my-index-000001
{
  "mappings":{
    "properties": {
      "@timestamp": { "type": "date" },
      "aborted_count": { "type": "long" },
      "another_field": { "type": "keyword" }, 
      "clientip": { "type": "keyword" },
      "coords": {
        "properties": {
          "lat": { "type": "keyword" },
          "lon": { "type": "keyword" }
        }
      },
      "error_count": { "type": "long" },
      "query": { "type": "keyword" },
      "some_field": { "type": "keyword" },
      "tokenstring1":{ "type":"keyword" },
      "tokenstring2":{ "type":"keyword" },
      "tokenstring3":{ "type":"keyword" }
    }
  }
}

PUT /my-index-000001/_doc/1
{
  "@timestamp":"2017-03-23T13:00:00",
  "error_count":36320,
  "aborted_count":4156,
  "some_field":"JOE",
  "another_field":"SMITH  ",
  "tokenstring1":"foo-bar-baz",
  "tokenstring2":"foo bar baz",
  "tokenstring3":"foo-bar-19",
  "query":"www.ml.elastic.co",
  "clientip":"123.456.78.900",
  "coords": {
    "lat" : 41.44,
    "lon":90.5
  }
}

PUT _ml/anomaly_detectors/test
{
  "analysis_config": {
    "bucket_span": "10m",
    "detectors": [
      {
        "function": "mean",
        "field_name": "hour"
      }
    ],
    "summary_count_field_name": "doc_count"
  },
  "data_description": {
    "time_field": "@timestamp"
  },
  "datafeed_config": {
    "datafeed_id": "datafeed-test",
    "indices": [
      "my-index-000001"
    ],
    "runtime_mappings": {
      "hour": {
        "type": "long",
        "script": {
          "source": "emit(doc['@timestamp'].value.hourOfDay)"
        }
      }
    },
    "aggregations": {
      "buckets": {
        "date_histogram": {
          "field": "@timestamp",
          "interval": "600s"
        },
        "aggregations": {
          "@timestamp": {
            "max": {
              "field": "@timestamp"
            }
          },
          "some_field": {
            "terms": {
              "field": "some_field",
              "size": 100
            },
            "aggregations": {
              "responsetime": {
                "avg": {
                  "field": "hour"
                }
              }
            }
          }
        }
      }
    }
  }
}

GET _ml/datafeeds/datafeed-test/_preview

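The datafeed config contains two deprecated usages: the interval parameter of the date_histogram aggregation (superseded by fixed_interval and calendar_interval) and the Joda-style hourOfDay getter in the Painless runtime field script.

Both deprecations are recorded in the deprecation log file, but only the interval deprecation warning is present in the HTTP response headers; the Painless (Joda) warning is missing.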
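To check which deprecations actually reach the client, the Warning header values of the preview response can be filtered for warn-code 299. A minimal sketch, assuming the values follow the RFC 7234 warn-value layout (warn-code, warn-agent, quoted warn-text, optional quoted warn-date); WarningHeaders and deprecationMessages are hypothetical names, and any further encoding Elasticsearch may apply to the text is not undone here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts warn-text from HTTP Warning header values (RFC 7234, section 5.5). */
final class WarningHeaders {
    // warn-code SP warn-agent SP "warn-text" [SP "warn-date"]
    private static final Pattern WARNING = Pattern.compile(
        "^(\\d{3}) (\\S+) \"((?:[^\"\\\\]|\\\\.)*)\"(?: \"[^\"]*\")?$");

    private WarningHeaders() {}

    /** Returns the unescaped warn-text of every value with warn-code 299; unparseable values are skipped. */
    static List<String> deprecationMessages(List<String> headerValues) {
        List<String> messages = new ArrayList<>();
        for (String value : headerValues) {
            Matcher m = WARNING.matcher(value);
            if (m.matches() && m.group(1).equals("299")) {
                // Undo quoted-string escaping: \" becomes " and \\ becomes \
                messages.add(m.group(3).replaceAll("\\\\(.)", "$1"));
            }
        }
        return messages;
    }
}
```

Applied to the Warning values of the preview response, this makes it easy to see that only the interval message is present.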

The ML datafeed extractor invoked by the preview API performs its searches using ClientHelper.executeWithHeaders (ClientHelper.java, ~line 174):

public static <T extends ActionResponse> T executeWithHeaders(
  Map<String, String> headers,
  String origin,
  Client client,
  Supplier<T> supplier
)

This may be relevant, since executeWithHeaders is called primarily from ML datafeed and data frame code.
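The signature alone does not show how the headers are applied. Purely as an assumption to check against ClientHelper.java: if the helper runs the supplier inside a stashed thread context and then does a plain restore of the caller's context, any response headers recorded during the search would be dropped. A toy model of that pattern follows; ToyThreadContext, stash and runStashed are hypothetical names, and this is not Elasticsearch's ThreadContext.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical toy model: each thread's context is an immutable list of response warnings. */
final class ToyThreadContext {
    /** A saved context; close() switches back to it (no checked exception). */
    interface Stored extends AutoCloseable {
        @Override
        void close();
    }

    private final ThreadLocal<List<String>> warnings =
        ThreadLocal.withInitial(() -> List.<String>of());

    /** Records a response warning in the current context. */
    void addWarning(String warning) {
        List<String> next = new ArrayList<>(warnings.get());
        next.add(warning);
        warnings.set(List.copyOf(next));
    }

    List<String> warnings() {
        return warnings.get();
    }

    /** Switches to an empty context; closing the result does a plain restore of the saved one. */
    Stored stash() {
        List<String> saved = warnings.get();
        warnings.set(List.of());
        // Plain restore: warnings recorded while stashed are dropped.
        return () -> warnings.set(saved);
    }

    /** Runs the supplier inside a stashed context, as the helper is assumed to do. */
    <T> T runStashed(Supplier<T> supplier) {
        try (Stored ignore = stash()) {
            return supplier.get();
        }
    }
}
```

With this model, a warning added inside the supplier is no longer visible once runStashed returns.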

The Joda deprecation warnings can be seen being added to a thread context created in TransportService.java (~line 952):

Supplier<ThreadContext.StoredContext> storedContextSupplier = threadPool.getThreadContext().newRestorableContext(true); // "true" should result in the headers being preserved
ContextRestoreResponseHandler<T> responseHandler = new ContextRestoreResponseHandler<>(storedContextSupplier, handler);
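The comment above says the true argument should preserve the response headers. A toy model of what a preserving restore can look like: the saved context comes back with the response headers recorded in the meantime merged in, while a non-preserving restore discards them. PreservingContextModel, capture and reset are hypothetical stand-ins, loosely named after the concepts involved, not Elasticsearch's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of restoring a saved context with or without preserving response warnings. */
final class PreservingContextModel {
    interface Stored extends AutoCloseable {
        @Override
        void close();
    }

    private final ThreadLocal<List<String>> warnings =
        ThreadLocal.withInitial(() -> List.<String>of());

    void addWarning(String warning) {
        List<String> next = new ArrayList<>(warnings.get());
        next.add(warning);
        warnings.set(List.copyOf(next));
    }

    /** Simulates replacing the current context with an empty one, as when a different context becomes active. */
    void reset() {
        warnings.set(List.of());
    }

    List<String> warnings() {
        return warnings.get();
    }

    /** Saves the current context; closing the result restores it. */
    Stored capture(boolean preserveResponseHeaders) {
        List<String> saved = warnings.get();
        return () -> {
            List<String> now = warnings.get();
            if (preserveResponseHeaders && now != saved) {
                // Merge warnings recorded since capture into the saved context, skipping duplicates.
                List<String> merged = new ArrayList<>(saved);
                for (String w : now) {
                    if (!merged.contains(w)) {
                        merged.add(w);
                    }
                }
                warnings.set(List.copyOf(merged));
            } else {
                // Plain restore: whatever was recorded since capture is discarded.
                warnings.set(saved);
            }
        };
    }
}
```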

However, the thread context containing the response headers is accessed in only one place, ContextRestoreResponseHandler.handleResponse (TransportService.java, ~line 1468), and there the return value is allowed to go out of scope:

// TransportService.ContextRestoreResponseHandler#handleResponse
@Override
public void handleResponse(T response) {
  if (handler != null) {
    handler.cancel();
  }
  // NOTE: 'ignore' holds the thread context containing the needed response headers
  try (ThreadContext.StoredContext ignore = contextSupplier.get()) {
    delegate.handleResponse(response);
  }
}
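In this handler, the context supplied by contextSupplier.get() is current only inside the try-with-resources block, so its response headers are readable only while delegate.handleResponse(response) runs. A toy model of that scoping, with hypothetical names (ScopedHeadersModel, restore) and not Elasticsearch code; it illustrates the scope only and does not settle whether the real delegate chain can make use of it.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical toy model: response headers are visible only while a restored context is active. */
final class ScopedHeadersModel {
    interface Stored extends AutoCloseable {
        @Override
        void close();
    }

    private final ThreadLocal<List<String>> warnings =
        ThreadLocal.withInitial(() -> List.<String>of());

    List<String> warnings() {
        return warnings.get();
    }

    /** Makes the given headers current; closing the result switches back to the previous context. */
    Stored restore(List<String> headers) {
        List<String> previous = warnings.get();
        warnings.set(List.copyOf(headers));
        return () -> warnings.set(previous);
    }

    /** Same shape as handleResponse: the delegate runs while the restored context is current. */
    <T> void handleResponse(List<String> restoredHeaders, T response, Consumer<T> delegate) {
        try (Stored ignore = restore(restoredHeaders)) {
            delegate.accept(response);
        }
    }
}
```

Here a delegate that copies warnings() into its own list keeps the headers, whereas calling warnings() after handleResponse returns gives an empty list.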

So the outstanding question is: how to retrieve the desired headers from the thread context?