apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[VL] get_json_object generate incorrect result #8102

Open zhli1142015 opened 6 days ago

zhli1142015 commented 6 days ago

Backend

VL (Velox)

Bug description

https://github.com/apache/incubator-gluten/pull/8099#issuecomment-2507271371

'$.store.book' offload to gluten
- $.store.book *** FAILED ***
  Incorrect evaluation: get_json_object(
  {"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
  "basket":[[1,2,{"b":"y","a":"x"}],[3,4],[5,6]],"book":[{"author":"Nigel Rees",
  "title":"Sayings of the Century","category":"reference","price":8.95},
  {"author":"Herman Melville","title":"Moby Dick","category":"fiction","price":8.99,
  "isbn":"0-553-21311-3"},{"author":"J. R. R. Tolkien","title":"The Lord of the Rings",
  "category":"fiction","reader":[{"age":25,"name":"bob"},{"age":26,"name":"jack"}],
  "price":22.99,"isbn":"0-395-19395-8"}],"bicycle":{"price":19.95,"color":"red"}},
  "email":"amy@only_for_json_udf_test.net","owner":"amy","zip code":"94025",
  "fb:testid":"1234"}
  , $.store.book), actual: [{"author":"Nigel Rees",
  "title":"Sayings of the Century","category":"reference","price":8.95},
  {"author":"Herman Melville","title":"Moby Dick","category":"fiction","price":8.99,
  "isbn":"0-553-21311-3"},{"author":"J. R. R. Tolkien","title":"The Lord of the Rings",
  "category":"fiction","reader":[{"age":25,"name":"bob"},{"age":26,"name":"jack"}],
  "price":22.99,"isbn":"0-395-19395-8"}], expected: [{"author":"Nigel Rees","title":"Sayings of the Century","category":"reference","price":8.95},{"author":"Herman Melville","title":"Moby Dick","category":"fiction","price":8.99,"isbn":"0-553-21311-3"},{"author":"J. R. R. Tolkien","title":"The Lord of the Rings","category":"fiction","reader":[{"age":25,"name":"bob"},{"age":26,"name":"jack"}],"price":22.99,"isbn":"0-395-19395-8"}] (GlutenTestsTrait.scala:286)
'$.store.book[0]' offload to gluten
- $.store.book[0] *** FAILED ***
  Incorrect evaluation: get_json_object(
  {"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
  "basket":[[1,2,{"b":"y","a":"x"}],[3,4],[5,6]],"book":[{"author":"Nigel Rees",
  "title":"Sayings of the Century","category":"reference","price":8.95},
  {"author":"Herman Melville","title":"Moby Dick","category":"fiction","price":8.99,
  "isbn":"0-553-21311-3"},{"author":"J. R. R. Tolkien","title":"The Lord of the Rings",
  "category":"fiction","reader":[{"age":25,"name":"bob"},{"age":26,"name":"jack"}],
  "price":22.99,"isbn":"0-395-19395-8"}],"bicycle":{"price":19.95,"color":"red"}},
  "email":"amy@only_for_json_udf_test.net","owner":"amy","zip code":"94025",
  "fb:testid":"1234"}
  , $.store.book[0]), actual: {"author":"Nigel Rees",
  "title":"Sayings of the Century","category":"reference","price":8.95}, expected: {"author":"Nigel Rees","title":"Sayings of the Century","category":"reference","price":8.95} (GlutenTestsTrait.scala:286)

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

PHILO-HE commented 3 days ago

@zhli1142015, the difference is that Gluten's result contains newline characters, whereas vanilla Spark's result doesn't, right?
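To make that difference concrete, here is a minimal sketch using the `$.store.book[0]` values from the failure log above. It assumes the line break shown in the log is a literal newline followed by two spaces inside the returned string; the exact whitespace is an assumption, since only the log rendering is available. The variable names are illustrative.

```python
import json

# Values copied from the "$.store.book[0]" failure in the log above.
# Assumption: the wrapped line in the log is a literal "\n" plus two spaces.
actual = (
    '{"author":"Nigel Rees",\n'
    '  "title":"Sayings of the Century","category":"reference","price":8.95}'
)
expected = (
    '{"author":"Nigel Rees","title":"Sayings of the Century",'
    '"category":"reference","price":8.95}'
)

# A plain string comparison fails because of the extra whitespace.
raw_equal = actual == expected

# Parsing both sides shows they describe the same JSON value.
semantic_equal = json.loads(actual) == json.loads(expected)

print("raw_equal:", raw_equal)
print("semantic_equal:", semantic_equal)
```

Under that assumption the two strings differ byte for byte but parse to the same JSON object.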

zhli1142015 commented 3 days ago

It looks so. I think the difference is acceptable, but we need to rewrite the Spark UTs so that they pass.
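One possible way to compare such results in a rewritten test is to drop insignificant whitespace before comparing. The sketch below is only an illustration, not Gluten's or Spark's actual test code: `strip_json_whitespace` is a hypothetical helper that removes whitespace outside string literals and leaves everything inside strings (such as `"Nigel Rees"`) untouched.

```python
def strip_json_whitespace(text: str) -> str:
    """Remove whitespace that sits outside JSON string literals.

    Hypothetical helper for illustration; it does not validate the JSON.
    """
    out = []
    in_string = False  # True while scanning inside a "..." literal
    escaped = False    # True right after a backslash inside a string
    for ch in text:
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif ch not in " \t\r\n":
            out.append(ch)
    return "".join(out)


# Small usage example with made-up input.
sample = '{"a": "x y",\n  "b" : [1, 2]}'
normalized = strip_json_whitespace(sample)
print(normalized)
```

Normalizing the text rather than re-serializing a parsed value keeps number formatting and key order exactly as the engine produced them, so only the whitespace difference is tolerated.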

PHILO-HE commented 3 days ago

@zhli1142015, I will update our implementation and revise the Spark UTs for which Gluten produces acceptably different results.