Open danlevy1 opened 6 years ago
I am running across a few test cases where the following error is not making sense to me.
If I understand this correctly, it is the lack of error in the second case that doesn't make sense to you, correct ? Dots are allowed in field names since 2.4, why would you think the second example should throw an error ?
@jkakavas I am basing this off of https://github.com/elastic/elasticsearch/issues/15951, where it states in the first line, "we had to reject field names containing dots."
@danlevy1 You are correct, this is a bug. The problem is in DocumentParser.splitAndValidatePath
. Here we call String.split
, and assume that if any element is an empty string, then there was a double dot. However, it does not actually work for trailing dot, due to oddities in the implementation of String.split
. From the javadocs there:
Trailing empty strings are therefore not included in the resulting array.
So the algorithm in splitAndValidatePath
needs to be amended. Additionally, it seems we are not actually calling that method with the full path all the time, as your example error only contained .foo
when it should have been .foo.foo.bar
.
Pinging @elastic/es-search-aggs
#15951 is 2 years and explicitly discusses options for re enabling support for dots in field names. It was closed when this was introduced in 2.4 as I linked above. I'm going to close this issue for now as this is expected behavior.
Jumped the gun on this one, thanks @rjernst for chiming in.
@rjernst I will take look more into this. I was actually fixing issue https://github.com/elastic/elasticsearch/issues/21862 when I came across this one. I will update this as soon as I take a look at the splitAndValidatePath
method. I will hold off on posting the fix on issue https://github.com/elastic/elasticsearch/issues/21862 until I can fix this bug. It's possible the two are intertwined.
@rjernst Just to confirm, the second example I provided in the bug report should throw the same error that the first example throws. Is this correct?
Yes, that is correct.
cc @elastic/es-search-aggs
@rjernst I have been playing around with different test cases and the method splitAndValidatePath
seems to take in the string in parts. In other words, it never takes in the full field path (.foo.foo.bar
) in my first example. Do you have a test case where splitAndValidatePath
is called with the full field path? It seems to me like this is more of a systemic problem.
At this point, I can treat the splitAndValidatePath
method as a method that only takes in one part at a time if that's what you feel is best. Do you have any thoughts?
splitAndValidatePath
takes in each element within the json structure. For example, consider following json document:
{
"foo.bar" : {
"baz" : 1
}
}
In this example, we would call splitAndValidatePath("foo.bar")
, followed by splitAndValidatePath("baz")
. The first call would produce 2 elements, which would cause us to lookup (and possibly dynamically create as an object field) a field in the mappings called foo
. The intent of the method is to split these field names in json that may contain path separators (ie dot), so that we can mimic the above as if it was specified as:
{
"foo" : {
"bar" : {
"baz" : 1
}
}
}
After running ./gradlew check on my fixed code, I am getting two errors.
org.elasticsearch.index.mapper.DocumentParserTests.testDynamicFieldsEmptyName
org.elasticsearch.index.mapper.DocumentParserTests.testDynamicFieldsStartingAndEndingWithDot
For the first error, I ran these two tests to see why the error is occurring:
Test 1 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
" " : {
"bar" : {
"baz" : 123
}
}
}
'
Test 1 Output:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "object field cannot contain only whitespace: [' .bar']"
}
],
"type" : "illegal_argument_exception",
"reason" : "object field cannot contain only whitespace: [' .bar']"
},
"status" : 400
}
Test 2 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
"foo" : {
"bar" : {
" " : 123
}
}
}
'
Test 2 Output:
{
"error" : {
"root_cause" : [
{
"type" : "illegal_argument_exception",
"reason" : "object field cannot contain only whitespace: ['foo.bar. ']"
}
],
"type" : "illegal_argument_exception",
"reason" : "object field cannot contain only whitespace: ['foo.bar. ']"
},
"status" : 400
}
When I looked at DocumentParserTests.testDynamicFieldsEmptyName
the error looks the same as mine. I am not sure why this test is failing.
For the second error, I believe the test is failing because my fix changes the way [.]'s are handled. Here are four test cases:
Test 1 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
".foo" : {
"bar" : {
"baz" : 123
}
}
}
'
Test 1 Output:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "object field starting or ending with a [.] makes object resolution ambiguous: ['.foo']"
}
},
"status" : 400
}
Test 2 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
"foo." : {
"bar" : {
"baz" : 123
}
}
}
'
Test 2 Output:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "object field starting or ending with a [.] makes object resolution ambiguous: ['foo.']"
}
},
"status" : 400
}
Test 3 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
"fooOne..fooTwo" : {
"bar" : {
"baz" : 123
}
}
}
'
Test 3 Output:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "object field starting or ending with a [.] makes object resolution ambiguous: [fooOne..fooTwo]"
}
},
"status" : 400
}
Test 4 Input:
curl -XPUT 'localhost:9200/test/test/1?pretty' -H 'Content-Type: application/json' -d' {
"fooOne.fooTwo." : {
"bar" : {
"baz" : 123
}
}
}
'
Test 4 Output:
{
"error" : {
"root_cause" : [
{
"type" : "mapper_parsing_exception",
"reason" : "failed to parse"
}
],
"type" : "mapper_parsing_exception",
"reason" : "failed to parse",
"caused_by" : {
"type" : "illegal_argument_exception",
"reason" : "object field starting or ending with a [.] makes object resolution ambiguous: ['fooOne.fooTwo.']"
}
},
"status" : 400
}
When I looked at DocumentParserTests.testDynamicFieldsStartingAndEndingWithDot
the error appears to create two [.]'s ('top..foo..bar
'). My fix doesn't do this and I don't think it should be doing that.
Can an Elastic team member please advise me on how to move forward? I don't want to submit a PR until all of these tests pass. @rjernst
@danlevy1 I'm sorry that it has taken us so long to get back to you. Are you still interested in working on a fix for this issue (and #21862)?
If so, it would be great to open a PR with your current approach. You can make a note on the PR that certain tests are failing, and we can discuss how to debug them there. If you'd like, you can open a 'Draft' PR to make it clear it's a work in progress.
@danlevy1 another friendly ping to see if you'd like to pick this up, otherwise I'll plan to submit a PR. As a heads up, I opened #49946 to address the problem for direct 'put mapping' calls.
Pinging @elastic/es-search (Team:Search)
Pinging @elastic/es-search-foundations (Team:Search Foundations)
Elasticsearch version: 6.2.1
Plugins installed: None
JVM version: 1.8.0_131
OS version: MacOS 10.13
I am running across a few test cases where the following error is not making sense to me.
The first example includes a [.] in front of
foo
. In this case, there is an error that states that there cannot be a [.] at the beginning of the object field:Input:
Output (Error):
This makes sense according to the error.
The second example includes a [.] after
foo
. In this case, there is no error.Input:
Output (Success):
Get (The [.] is included in the field name):
I was looking at the following GitHub / Elasticsearch Discuss Posts 1) https://discuss.elastic.co/t/name-cannot-be-empty-string-error-if-field-name-starts-with/66808 2) https://github.com/elastic/elasticsearch/issues/15951
Based on these, it seems like the second example should be throwing the same error as the first example.