elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[ES|QL] date_parse seems to ignore timezone #117680

Open kiju98 opened 3 days ago

kiju98 commented 3 days ago

Elasticsearch Version

8.16.0

Installed Plugins

No response

Java Version

bundled

OS Version

Linux amd64 5.15.0-1032-gcp

Problem Description

POST /_query
{
    "query": """
row message = "192.168.1.199 - - [12/Jul/2022:10:24:10 +0900] \"GET /cgi-bin/try/ HTTP/1.0\" 200 3005"
| grok message "%{COMMONAPACHELOG}"
| keep timestamp
| eval @timestamp = date_parse("dd/MMM/yyyy:HH:mm:ss Z", timestamp)
"""
}

produces

{
  "took": 15,
  "columns": [
    {
      "name": "timestamp",
      "type": "keyword"
    },
    {
      "name": "@timestamp",
      "type": "date"
    }
  ],
  "values": [
    [
      "12/Jul/2022:10:24:10 +0900",
      "2022-07-12T10:24:10.000Z"
    ]
  ]
}

but

POST /_ingest/pipeline/_simulate
{
  "pipeline": {
    "processors": [
      {
        "date": {
          "field": "timestamp",
          "formats": [
            "dd/MMM/yyyy:HH:mm:ss Z"
          ]
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "id",
      "_source": {
        "timestamp": "12/Jul/2022:10:24:10 +0900"
      }
    }
  ]
}

produces

{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "id",
        "_source": {
          "timestamp": "12/Jul/2022:10:24:10 +0900",
          "@timestamp": "2022-07-12T01:24:10.000Z"
        },
        "_ingest": {
          "timestamp": "2024-11-28T07:11:48.58392818Z"
        }
      }
    }
  ]
}

I suspect date_parse in ES|QL ignores the timezone offset. Will you look into it, please?

Steps to Reproduce

You can run

POST /_query
{
    "query": """
row message = "192.168.1.199 - - [12/Jul/2022:10:24:10 +0900] \"GET /cgi-bin/try/ HTTP/1.0\" 200 3005"
| grok message "%{COMMONAPACHELOG}"
| keep timestamp
| eval @timestamp = date_parse("dd/MMM/yyyy:HH:mm:ss Z", timestamp)
"""
}

which produces

{
  "took": 15,
  "columns": [
    {
      "name": "timestamp",
      "type": "keyword"
    },
    {
      "name": "@timestamp",
      "type": "date"
    }
  ],
  "values": [
    [
      "12/Jul/2022:10:24:10 +0900",
      "2022-07-12T10:24:10.000Z"
    ]
  ]
}

I think @timestamp should be "2022-07-12T01:24:10.000Z".

Logs (if relevant)

No response

elasticsearchmachine commented 2 days ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

iverase commented 1 day ago

Per the documentation, ES|QL only supports UTC at the moment (i.e., it does not support time zones): https://www.elastic.co/guide/en/elasticsearch/reference/current/esql-limitations.html#esql-limitations-timezone

kiju98 commented 1 day ago

Thank you for the answer. That's sad though :( I hope the timezone support will be added soon.

bpintea commented 1 day ago

I'm not sure the lack of support for time zones other than UTC explains this. I think parsing should read the time zone offset as written and then only output in UTC, because we support no other time zones. I'll have a look.
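
To make the two behaviours under discussion concrete, here is a minimal sketch in Python. It is not Elasticsearch code and says nothing about how date_parse is implemented; the function names are hypothetical, and the strptime pattern is only assumed to be a reasonable stand-in for "dd/MMM/yyyy:HH:mm:ss Z". It parses the timestamp from the report once with the +0900 offset applied and once with the offset discarded.

```python
from datetime import datetime, timezone

# Assumed strptime equivalent of the pattern "dd/MMM/yyyy:HH:mm:ss Z".
# %b matches English month abbreviations in the default C locale.
APACHE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def format_utc_millis(moment: datetime) -> str:
    """Render an aware UTC datetime as yyyy-MM-ddTHH:mm:ss.SSSZ."""
    millis = moment.microsecond // 1000
    return moment.strftime("%Y-%m-%dT%H:%M:%S") + f".{millis:03d}Z"


def parse_honoring_offset(text: str) -> str:
    """Hypothetical helper: read the offset, then convert the instant to UTC."""
    parsed = datetime.strptime(text, APACHE_FORMAT)
    return format_utc_millis(parsed.astimezone(timezone.utc))


def parse_ignoring_offset(text: str) -> str:
    """Hypothetical helper: drop the offset and treat the wall-clock time as UTC."""
    parsed = datetime.strptime(text, APACHE_FORMAT)
    return format_utc_millis(parsed.replace(tzinfo=timezone.utc))


if __name__ == "__main__":
    sample = "12/Jul/2022:10:24:10 +0900"
    print(parse_honoring_offset(sample))  # 2022-07-12T01:24:10.000Z
    print(parse_ignoring_offset(sample))  # 2022-07-12T10:24:10.000Z
```

The first result matches what the ingest date processor returned above, and the second matches the ES|QL output, which is consistent with the suspicion that the offset is being dropped rather than applied.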