elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
68.7k stars 24.4k forks source link

Boolean scripted field agg's are coming back as strings #20941

Closed stacey-gammon closed 7 years ago

stacey-gammon commented 7 years ago

This is showing up as a 5.0 blocker for kibana in https://github.com/elastic/kibana/issues/8677

I am requesting a scripted, painless field aggregation with value_type boolean, but it is being returned as a string. value_type double works as expected.

Request:

POST logstash-0/_search
{  
   "size":1,
   "query":{
   },
   "aggs":{  
      "2":{  
        "terms": {
          "script": {
            "inline" : "true",
            "lang": "painless"
          },
          "value_type": "boolean"
        }
      }
   },
   "stored_fields":[  
      "*"
   ],
   "_source":true,
   "script_fields":{  
      "inverse bytes":{  
         "script":{  
            "inline":"true",
            "lang":"painless"
         }
      }
   },
   "docvalue_fields":[  
      "utc_time",
      "relatedContent.article:modified_time",
      "@timestamp",
      "relatedContent.article:published_time"
   ]
}

Response:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 14005,
    "max_score": 1,
    "hits": [
      {
        "_index": "logstash-0",
        "_type": "apache",
        "_id": "AVe1doKCJGOhgatAr1_j",
        "_score": 1,
        "_source": {
          "index": "logstash-0",
          "@timestamp": "2016-10-10T17:32:10.221Z",
          "ip": "42.251.131.74",
          "extension": "jpg",
          "response": "200",
          "geo": {
            "coordinates": {
              "lat": 36.74125,
              "lon": -108.2299444
            },
            "src": "IN",
            "dest": "ID",
            "srcdest": "IN:ID"
          },
          "@tags": [
            "success",
            "security"
          ],
          "utc_time": "2016-10-10T17:32:10.221Z",
          "referer": "http://twitter.com/success/umberto-guidoni",
          "agent": "Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24",
          "clientip": "42.251.131.74",
          "bytes": 1973,
          "host": "media-for-the-masses.theacademyofperformingartsandscience.org",
          "request": "/uploads/donald-deke-slayton.jpg",
          "url": "https://media-for-the-masses.theacademyofperformingartsandscience.org/uploads/donald-deke-slayton.jpg",
          "@message": "42.251.131.74 - - [2016-10-10T17:32:10.221Z] \"GET /uploads/donald-deke-slayton.jpg HTTP/1.1\" 200 1973 \"-\" \"Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Chrome/11.0.696.50 Safari/534.24\"",
          "spaces": "this   is   a   thing    with lots of     spaces       wwwwoooooo",
          "xss": "<script>console.log(\"xss\")</script>",
          "headings": [
            "<h3>leonid-popov</h5>",
            "http://www.slate.com/success/james-dutton"
          ],
          "links": [
            "brian-duffy@twitter.com",
            "http://www.slate.com/security/shannon-walker",
            "www.twitter.com"
          ],
          "relatedContent": [
            {
              "url": "http://www.laweekly.com/news/nomadaz-street-art-from-spain-and-italy-at-scion-2368037",
              "og:type": "article",
              "og:title": "Nomadaz: Street Art from Spain and Italy at Scion",
              "og:description": "Pablo Aravena curated the new show at Scion Space in Culver City. Nomadaz brings together eight street artists from Italy and Spain, including Microbo, ...",
              "og:url": "http://www.laweekly.com/news/nomadaz-street-art-from-spain-and-italy-at-scion-2368037",
              "article:published_time": "2008-03-11T13:09:36-07:00",
              "article:modified_time": "2014-10-28T14:59:53-07:00",
              "article:section": "News",
              "og:image": "http://IMAGES1.laweekly.com/imager/nomadaz-street-art-from-spain-and-italy-a/u/original/2430333/img_5631.jpg",
              "og:image:height": "640",
              "og:image:width": "480",
              "og:site_name": "LA Weekly",
              "twitter:title": "Nomadaz: Street Art from Spain and Italy at Scion",
              "twitter:description": "Pablo Aravena curated the new show at Scion Space in Culver City. Nomadaz brings together eight street artists from Italy and Spain, including Microbo, ...",
              "twitter:card": "summary",
              "twitter:image": "http://IMAGES1.laweekly.com/imager/nomadaz-street-art-from-spain-and-italy-a/u/original/2430333/img_5631.jpg",
              "twitter:site": "@laweekly"
            },
            {
              "url": "http://www.laweekly.com/music/the-police-at-the-hollywood-bowl-5-27-once-more-gentlemen-please-but-this-time-with-bad-feeling-2412476",
              "og:type": "article",
              "og:title": "The Police at the Hollywood Bowl 5/27: Once More Gentlemen Please, But This Time With Bad Feeling",
              "og:description": "The Police at Hollywood Bowl, May 27, 2008 By Paul Rogers Sting&rsquo;s ego alone could fill the Hollywood Bowl, but on Tuesday night his band&rsquo;s p...",
              "og:url": "http://www.laweekly.com/music/the-police-at-the-hollywood-bowl-5-27-once-more-gentlemen-please-but-this-time-with-bad-feeling-2412476",
              "article:published_time": "2008-05-28T08:26:44-07:00",
              "article:modified_time": "2014-11-27T07:38:12-08:00",
              "article:section": "Music",
              "article:tag": "Concerts and Tours",
              "og:site_name": "LA Weekly",
              "twitter:title": "The Police at the Hollywood Bowl 5/27: Once More Gentlemen Please, But This Time With Bad Feeling",
              "twitter:description": "The Police at Hollywood Bowl, May 27, 2008 By Paul Rogers Sting&rsquo;s ego alone could fill the Hollywood Bowl, but on Tuesday night his band&rsquo;s p...",
              "twitter:card": "summary",
              "twitter:site": "@laweekly"
            },
            {
              "url": "http://www.laweekly.com/arts/linda-immediato-2371473",
              "og:type": "article",
              "og:title": "Linda Immediato",
              "og:description": "Linda Immediato recently lost her blogging virginity and is making up for lost time. She&#039;s a born and bred New Yorker who was an on-air news reporter fo...",
              "og:url": "http://www.laweekly.com/arts/linda-immediato-2371473",
              "article:published_time": "2005-10-14T12:10:19-07:00",
              "article:modified_time": "2014-11-25T17:14:16-08:00",
              "article:section": "Arts",
              "og:image": "http://images1.laweekly.com/imager/linda-immediato/u/original/2435953/dscn0901_1_2.jpg",
              "og:image:height": "211",
              "og:image:width": "200",
              "og:site_name": "LA Weekly",
              "twitter:title": "Linda Immediato",
              "twitter:description": "Linda Immediato recently lost her blogging virginity and is making up for lost time. She&#039;s a born and bred New Yorker who was an on-air news reporter fo...",
              "twitter:card": "summary",
              "twitter:image": "http://images1.laweekly.com/imager/linda-immediato/u/original/2435953/dscn0901_1_2.jpg",
              "twitter:site": "@laweekly"
            },
            {
              "url": "http://www.laweekly.com/arts/no-no-no-2372392",
              "og:type": "article",
              "og:title": "No No No...",
              "og:description": "Maybe it was payback from the hater gods for being too happy and too much in love on Valentines Day, but the romantic dinner I&nbsp; bragged about havin...",
              "og:url": "http://www.laweekly.com/arts/no-no-no-2372392",
              "article:published_time": "2006-02-18T19:02:20-08:00",
              "article:modified_time": "2014-11-25T20:09:29-08:00",
              "article:section": "Arts",
              "article:tag": "Eating Out",
              "og:image": "http://images1.laweekly.com/imager/no-no-no/u/original/2438935/v245146.jpg",
              "og:image:height": "404",
              "og:image:width": "300",
              "og:site_name": "LA Weekly",
              "twitter:title": "No No No...",
              "twitter:description": "Maybe it was payback from the hater gods for being too happy and too much in love on Valentines Day, but the romantic dinner I&nbsp; bragged about havin...",
              "twitter:card": "summary",
              "twitter:image": "http://images1.laweekly.com/imager/no-no-no/u/original/2438935/v245146.jpg",
              "twitter:site": "@laweekly"
            }
          ],
          "machine": {
            "ram": 12884901888
          }
        },
        "fields": {
          "relatedContent.article:modified_time": [
            1414533593000,
            1416964456000,
            1416974969000,
            1417102692000
          ],
          "@timestamp": [
            1476120730221
          ],
          "utc_time": [
            1476120730221
          ],
          "relatedContent.article:published_time": [
            1129317019000,
            1140318140000,
            1205266176000,
            1211988404000
          ],
          "inverse bytes": [
            true
          ]
        }
      }
    ]
  },
  "aggregations": {
    "2": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "true",
          "doc_count": 14005
        }
      ]
    }
  }
}

fwiw, I am capturing the request in slow log and when I pass "value_type": "boolean", that line is getting stripped out. When I pass "value_type: "double" it's being preserved in the logged request. I admit I am not sure if ES is stripping value_type or Kibana is.

jimczi commented 7 years ago

The value_type does not handle boolean. The parser is too lenient and accepts invalid values (values that are not recognized as valid types) without throwing an error. You can try with value_type=foo and you'll have the same output. It's too late for 5.0.0 but we can remove this leniency in 5.x. I'll not remove the blocker label since others may want to comment as well.

stacey-gammon commented 7 years ago

I've discovered a couple additional issues with boolean fields. It appears that using a painless script field of doc['alive'].value == false fails with

 "caused_by": {
            "type": "class_cast_exception",
            "reason": "Cannot cast from: boolean to boolean",
            "caused_by": {
              "type": "wrong_method_type_exception",
              "reason": "cannot convert MethodHandle(long,long)boolean to (Object,boolean)boolean"
            }

Along the same vein, using a script query like the following does not return any results (it has to be 0 or 1 to work as expected).

     "script" : {
              "inline" : "(doc['alive'].value) == params.value",
              "lang" : "painless",
              "params" : {
                "value" : false
              }
            },
jimczi commented 7 years ago

@Stacey-Gammon this is a known issue reported by ... you. The fix should come in 5.1.0 as stated here: https://github.com/elastic/elasticsearch/issues/20949

clintongormley commented 7 years ago

@jimferenczi would be good to document value_type as well.

jpountz commented 7 years ago

Let's check whether we can get rid of it first? If I read the code correctly, this information is not propagated to painless but only used by aggregations.

jpountz commented 7 years ago

Addressed by #22135.

Bargs commented 7 years ago

@jpountz I saw this was closed so I did some testing to see if we could remove our workaround in Kibana and I found that the terms agg no longer accepts boolean scripts. If I use a simplified version of the query Stacey provided above, I get an error:

Request:

{  
   "aggs":{  
      "1":{  
        "terms": {
          "script": {
            "inline" : "return true;",
            "lang": "painless"
          },
          "value_type": "boolean"
        }
      }
   }
}

Response

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 4,
    "failed": 1,
    "failures": [
      {
        "shard": 4,
        "index": "foo",
        "node": "bAWZQ2m9SGutXsY_I_WLfg",
        "reason": {
          "type": "aggregation_execution_exception",
          "reason": "Unsupported script value [true]"
        }
      }
    ]
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  },
  "aggregations": {
    "1": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": []
    }
  }
}

Is this an unintentional side effect of the PR that closed this?

jpountz commented 7 years ago

Let me dig into that, the issue reported an issue with the fact that the boolean value type was not supported, so I closed it since it was addressed, but it looks like there is something else that prevents terms aggs on boolean scripts.

jpountz commented 7 years ago

OK, this is because aggregations expect that scripts return numbers as a representation for boolean values, like boolean fields do, so for instance that query works:

GET index/_search
{  
   "aggs":{  
      "1":{  
        "terms": {
          "script": {
            "inline" : "return 1;",
            "lang": "painless"
          },
          "value_type": "boolean"
        }
      }
   }
}

It should be fixable though.

jpountz commented 7 years ago

I opened a PR https://github.com/elastic/elasticsearch/pull/22201