Open wForget opened 3 weeks ago
cc: @PHILO-HE
@wForget, that's strange. I applied the patch below to test your case on the Velox side (1.2.0 velox branch), and the test passed.
diff --git a/velox/functions/sparksql/tests/JsonFunctionsTest.cpp b/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
index c0c8ecc90..f9448733a 100644
--- a/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
+++ b/velox/functions/sparksql/tests/JsonFunctionsTest.cpp
@@ -119,5 +119,9 @@ TEST_F(GetJsonObjectTest, nullResult) {
std::nullopt);
}
+TEST_F(GetJsonObjectTest, escaped) {
+ EXPECT_EQ(getJsonObject(R"({"c1":"test\ntest"})", "$.c1"), "test\ntest");
+}
+
} // namespace
} // namespace facebook::velox::functions::sparksql::test
R"({"c1":"test\ntest"})"
Does this mean that \n is not escaped?
@wForget, no, it's escaped. Just verified by printing getJsonObject(R"({"c1":"test\ntest"})", "$.c1")
Could you try:
const std::string json= R"(
{
"c1":"test
test"
}
)";
getJsonObject(json, "$.c1")
I guess this may be due to spark using some non-standard json parsing behavior.
It seems that single quotes are also not allowed.
select get_json_object('{\'c1\':\'test test\'}', '$.c1');
gluten disabled:
+--------------------------------------------+--+
| get_json_object({'c1':'test test'}, $.c1) |
+--------------------------------------------+--+
| test test |
+--------------------------------------------+--+
gluten enabled:
+--------------------------------------------+--+
| get_json_object({'c1':'test test'}, $.c1) |
+--------------------------------------------+--+
| NULL |
+--------------------------------------------+--+
@wForget, this is a known incompatibility when using single quotes. See doc link.
As far as I know, the JSON standard does not allow single quotes to enclose JSON strings. I'm not sure why Spark accepts them in place of double quotes. We have no plan to support this.
Using a regular string instead of a raw string reproduces this issue, because the \n then becomes an actual control character inside the JSON string. It also occurs on the main branch. I found that Presto also allows control characters, as Spark does. We may have to change simdjson's code to fix this, but I'm not sure whether that is acceptable. See Velox PR: https://github.com/facebookincubator/velox/pull/11433
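To make the raw-string versus regular-string difference concrete, here is a minimal sketch in plain C++. It does not call Velox or simdjson; hasRawControlCharInString is a hypothetical helper written only for illustration, and kEscaped / kControl are assumed names. It relies on the RFC 8259 rule that control characters (U+0000 to U+001F) must be escaped inside JSON strings, which is presumably why a strict parser rejects the second input.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Velox or simdjson): returns true if `json`
// contains a raw, unescaped control character (below 0x20) inside a
// double-quoted string. RFC 8259 requires such characters to be escaped.
bool hasRawControlCharInString(const std::string& json) {
  bool inString = false;
  for (std::size_t i = 0; i < json.size(); ++i) {
    const unsigned char c = static_cast<unsigned char>(json[i]);
    if (!inString) {
      if (c == '"') {
        inString = true;
      }
      continue;
    }
    if (c == '\\') {
      ++i;  // Skip the character that follows a backslash escape.
      continue;
    }
    if (c == '"') {
      inString = false;
      continue;
    }
    if (c < 0x20) {
      return true;
    }
  }
  return false;
}

// Raw string literal: the backslash and the 'n' stay as two characters,
// so the JSON text carries a valid \n escape sequence.
const std::string kEscaped = R"({"c1":"test\ntest"})";

// Regular string literal: the compiler turns \n into a real line feed,
// so the JSON text carries an unescaped control character.
const std::string kControl = "{\"c1\":\"test\ntest\"}";

// Self-check that summarizes the difference between the two inputs.
void checkLiterals() {
  assert(!hasRawControlCharInString(kEscaped));
  assert(hasRawControlCharInString(kControl));
  assert(kEscaped.size() == kControl.size() + 1);
}
```

The test added in the patch above uses the raw-string form, which is valid JSON, so it passes. The failing case corresponds to kControl, where the line feed sits unescaped inside the string value.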
Backend
VL (Velox)
Bug description
sql:
result of gluten 1.2.0 with velox:
result of vanilla spark:
Spark version
None
Spark configurations
No response
System information
No response
Relevant logs
No response