elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Invalid JSON Being Returned #18076

Closed shawn-digitalpoint closed 8 years ago

shawn-digitalpoint commented 8 years ago

Elasticsearch version: 2.3.2

JVM version: 1.8.0_60 OpenJDK 64-Bit Server VM 25.60-b23 Oracle Corporation

OS version: OpenSUSE Leap 42.1

Description of the problem including expected versus actual behavior: Elasticsearch returns invalid JSON in its results. Expected: results from Elasticsearch are always valid JSON.

Steps to reproduce: We have an aggregation whose bucket keys contain latitude, longitude and location name separated by pipes. ES isn't escaping the double quotes within one of the keys. ES returns this response:

{"took":59,"aggregations":{"time_group":{"buckets":[{"key":78469,"d":{"buckets":[{"d":{"buckets":[{"key":"34.774663|-112.467699|Chino Valley Fire Department"}]}}]},"w":{"value":1.0}},{"key":78388,"d":{"buckets":[{"d":{"buckets":[{"key":"34.775424|-112.452459|Windmill 7"}]}}]},"w":{"value":1.0}},{"key":78333,"d":{"buckets":[{"d":{"buckets":[{"key":"34.773574|-112.465953|"19 Remembered" Hotshot Honors"}]}}]},"w":{"value":1.0}},{"key":78309,"d":{"buckets":[{"d":{"buckets":[{"key":"34.773311|-112.46529|Chino Valley Saluted America's Heros"}]}}]},"w":{"value":1.0}},{"key":78305,"d":{"buckets":[{"d":{"buckets":[{"key":"34.77383|-112.465432|Chino Valley Public Library"}]}}]},"w":{"value":1.0}},{"key":78112,"d":{"buckets":[{"d":{"buckets":[{"key":"34.769398|-112.447186|Chino Valley Community Center Park"}]}}]},"w":{"value":1.0}},{"key":77976,"d":{"buckets":[{"d":{"buckets":[{"key":"34.777816|-112.447498|Stoned Security Donkey"}]}}]},"w":{"value":1.0}},{"key":77830,"d":{"buckets":[{"d":{"buckets":[{"key":"34.772045|-112.427787|Peavine Trails"}]}}]},"w":{"value":1.0}},{"key":77585,"d":{"buckets":[{"d":{"buckets":[{"key":"34.644372|-112.432358|Prescott Municipal Airport"}]}}]},"w":{"value":1.0}},{"key":77371,"d":{"buckets":[{"d":{"buckets":[{"key":"34.760367|-112.447588|Hope Lutheran Church"}]}}]},"w":{"value":1.0}},{"key":56892,"d":{"buckets":[{"d":{"buckets":[{"key":"35.483665|-111.556402|Coconino National Forest"}]}}]},"w":{"value":0.89}},{"key":51987,"d":{"buckets":[{"d":{"buckets":[{"key":"35.313604|-112.852838|Seligman, Arizona"}]}}]},"w":{"value":0.84}}]}}}

Specifically, the double quotes inside this bucket key aren't escaped:

{"key":"34.773574|-112.465953|"19 Remembered" Hotshot Honors"}
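For anyone who wants to check this quickly, here is a minimal sketch in Python (standard library only) that feeds the fragment above to a JSON parser and compares it with what a correctly escaping serializer would produce. The helper name `is_valid_json` is only illustrative; it is not part of Elasticsearch or any client.

```python
import json

# The bucket fragment exactly as it appears in the response above.
broken = '{"key":"34.773574|-112.465953|"19 Remembered" Hotshot Honors"}'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# What a correctly escaping serializer emits for the same key:
# the inner double quotes become \" inside the JSON string.
expected = json.dumps(
    {"key": '34.773574|-112.465953|"19 Remembered" Hotshot Honors'},
    separators=(",", ":"),
)

print(is_valid_json(broken))    # the fragment from the response
print(is_valid_json(expected))  # the escaped form
print(expected)
```

The parser stops at the first inner quote because it treats it as the end of the string value, which is why the whole response becomes unparseable and not just this one bucket.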

shawn-digitalpoint commented 8 years ago

Found a workaround... if you don't use filter_path, the results are properly escaped.
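If the smaller response is still needed, one way to apply this workaround is to request the full response without filter_path and trim it on the client. The sketch below is a rough illustration in Python, not Elasticsearch's own filtering logic: `prune` is a hypothetical helper that only handles plain dotted paths (no wildcards) and walks through arrays transparently.

```python
import json


def prune(node, paths):
    """Keep only the parts of node matched by paths.

    paths is a list of paths, each a list of key segments.
    Lists are traversed transparently, as in a dotted filter path.
    """
    if isinstance(node, list):
        items = [prune(item, paths) for item in node
                 if isinstance(item, (dict, list))]
        return [item for item in items if item]
    out = {}
    for key, value in node.items():
        # Paths that start at this key, with the leading segment removed.
        rest = [p[1:] for p in paths if p and p[0] == key]
        if not rest:
            continue
        if any(len(r) == 0 for r in rest):
            out[key] = value  # a path ends here: keep the whole subtree
        elif isinstance(value, (dict, list)):
            sub = prune(value, rest)
            if sub:
                out[key] = sub
    return out


def prune_by_filter(response, filter_path):
    """Apply a comma-separated list of dotted paths to a parsed response."""
    paths = [p.split(".") for p in filter_path.split(",")]
    return prune(response, paths)


# Made-up sample shaped like an unfiltered aggregation response.
full = {
    "took": 59,
    "timed_out": False,
    "aggregations": {"time_group": {"buckets": [
        {"key": 78333, "doc_count": 1, "w": {"value": 1.0}},
    ]}},
}
small = prune_by_filter(
    full, "took,aggregations.time_group.buckets.key,aggregations.time_group.buckets.w"
)
print(json.dumps(small))
```

This costs extra bandwidth compared with server-side filtering, so it only makes sense as a stopgap until the escaping bug is fixed.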

jpountz commented 8 years ago

@shawn-digitalpoint then this looks like a bug in how filter_path works. Can you share your request (including the filter_path)?

shawn-digitalpoint commented 8 years ago

Using PHP to build the JSON, but this is the PHP array (not sure why github's code block isn't working, sorry...):

$params = array(
  'size' => 0,
  'sort' => array(
    'date' => array(
      'order' => 'desc'
    )
  ),
  'query' => array(
    'bool' => array(
      'must' => array(
        array(
          'match' => array(
            'attacking_agent_id' => 100,
          ),
        ),
        array(
          'range' => array(
            'date' => array(
              'gte' => 1430606758
            )
          ),
        ),
      )
    )
  ),
  'aggs' => array(
    'time_group' => array(
      'terms' => array(
        'field' => 'location_id',
        'size' => 100000,
        'order'=> array(
          '_term' => 'desc'
        ),
      ),

      'aggs' => array(
        'd' => array(
          'terms' => array(
            'field' => 'timegroup_agent',
            'size' => 1
          ),
          'aggs' => array(
            'd' => array(
              'terms' => array(
                'field' => 'raw_geo_name',
                'size' => 1
              )
            ),
          )
        ),
        'total_date' => array(
          'sum' => array(
            'field' => 'date'
          )
        ),
        'unique_attacks' => array(
          'cardinality' => array(
            "field" => "timegroup_agent"
          )
        ),

        "w" => array (
          'bucket_script' => array(
            'buckets_path' => array(
              'totalDate' => 'total_date',
              'unique' => 'unique_attacks',
              'docCount' => '_count',
            ),
            "script" => 'min(' . $maxWeight . ', max(' . $minWeight . ', round(((' . ($daysBack * 86400) . ' * docCount) - ((' . (XenForo_Application::$time) . ' * docCount) - totalDate)) / docCount / ' . ($daysBack * 86400) . ' * unique * 100) / 100))'  // round(((' . ($daysBack * 86400)  . ' * docCount) - ((' . XenForo_Application::$time . ' * docCount) - totalDate)) / ' . ($daysBack * 86400) . ' * 100) / 100
          )
        ),
      )
    ),
  )
);

shawn-digitalpoint commented 8 years ago

Oh, and the filter_path used when getting invalid JSON is:

took,aggregations.time_group.buckets.key,aggregations.time_group.buckets.d.buckets.d.buckets.key,aggregations.time_group.buckets.w
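For reference, filter_path is passed as a URL query parameter on the search request, not in the request body. Here is a small sketch in Python of how the two variants (with and without filtering) could be built; the host `http://localhost:9200` and the index name `attacks` are placeholders, since the thread doesn't state them.

```python
from urllib.parse import urlencode

# The filter_path value quoted above, split up for readability.
FILTER_PATH = ",".join([
    "took",
    "aggregations.time_group.buckets.key",
    "aggregations.time_group.buckets.d.buckets.d.buckets.key",
    "aggregations.time_group.buckets.w",
])


def search_url(base, index, filter_path=None):
    """Build a _search URL, optionally with a filter_path query parameter."""
    url = f"{base}/{index}/_search"
    if filter_path:
        # Keep commas readable; everything else is percent-encoded as needed.
        url += "?" + urlencode({"filter_path": filter_path}, safe=",")
    return url


# Placeholder host and index name (assumptions, not from the report).
filtered = search_url("http://localhost:9200", "attacks", FILTER_PATH)
unfiltered = search_url("http://localhost:9200", "attacks")
print(filtered)
print(unfiltered)
```

Sending the same request body to both URLs and comparing the responses should show whether the unescaped quotes only appear on the filtered one.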

tlrx commented 8 years ago

Thanks for reporting! It's probably a Jackson bug, so I created FasterXML/jackson-core/pull/280 to discuss the issue with the Jackson team. I'll update this issue once I have some feedback.

tlrx commented 8 years ago

The Jackson issue has been confirmed and the fix has been merged. This issue will be resolved once we move to a new release of jackson-core that includes the fix.

s1monw commented 8 years ago

Awesome, @tlrx. Great to fix this kind of stuff directly upstream!