Closed saurabhjaluka closed 7 years ago
I see the problem and agree with your analysis. One problem is that _parent is not configurable in 6.0 any more, which is (probably) the first version that'd get this fix. For indexes created after 6.0, _parent has been replaced by join fields, which don't have this problem because they require explicitly setting the routing everywhere. So you aren't super likely to be able to use this fix by the time it is ready.
I wonder if a workaround is good enough. Something like always changing the routing in a consistent way so that the condition triggers. It isn't clean, but it'd work. Another option is to manually perform this reindex and/or do it with one of the reindex helpers like the one in the Python or Perl client. They likely don't have this issue.
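A minimal sketch of the "reindex from a client" option, in Python. It assumes the elasticsearch-py helpers scan and bulk, and it assumes that each scroll hit exposes _routing as a top-level key, as 5.x search hits do. The names to_bulk_action and migrate are hypothetical and are not part of any client library.

```python
def to_bulk_action(hit, target_index):
    """Turn one scroll hit into a bulk index action that has no _parent."""
    action = {
        "_op_type": "index",
        "_index": target_index,
        "_type": hit["_type"],
        "_id": hit["_id"],
        "_source": hit["_source"],
    }
    # Carry the routing over explicitly; _parent is deliberately not copied.
    if hit.get("_routing") is not None:
        action["_routing"] = hit["_routing"]
    return action


def migrate(client, source_index, target_index):
    """Scroll the source index and bulk-index into the target (needs a live cluster)."""
    # Imported lazily so to_bulk_action can be used without the client installed.
    from elasticsearch.helpers import bulk, scan

    hits = scan(client, index=source_index, scroll="5m")
    return bulk(client, (to_bulk_action(hit, target_index) for hit in hits))
```

Because the routing is copied per document on the client side, this path never goes through the server-side script handling that drops it.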
Thanks @nik9000 . Yeah, I might go for an approach for reindexing using logstash for now.
Workaround in case anyone needs it:
migration.sh
pathToLogstash="<path-to-logstash>"
sourceHost="localhost:9200"
targetHost="localhost:9200"
sourceIndex="source-index"
targetIndex="dest-index"
input="input { elasticsearch { hosts => [\"${sourceHost}\"] index => \"$sourceIndex\" size => 5000 scroll => \"5m\" docinfo => true } }"
filter="filter { json { source => \"message\" } mutate { remove_field => [ \"@version\", \"@timestamp\", \"_parent\" ] } }"
output="output { elasticsearch { index => \"$targetIndex\" hosts => [\"${targetHost}\"] document_type => \"%{[@metadata][_type]}\" document_id => \"%{[@metadata][_id]}\" routing => \"%{engineKey}\" manage_template => false } }"
${pathToLogstash} -e "${input} ${filter} ${output}"
Thanks for understanding!
I'm going to close this issue as "wontfix". Sorry!
Why didn't we fix this? I am having issues with this today.
@nik9000 I have found a better workaround for this issue that works without having to use a different client or having to modify document ids.
In a Painless script, if you want to change/remove _parent but not change _routing:
ctx._parent = null;
ctx._routing = new StringBuffer(ctx._routing);
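For context, here is a rough sketch of how that script might be placed in a _reindex request body. It assumes the 5.x request shape, where the script text goes under the "inline" key. The index names are the placeholders from the Logstash script above, and build_reindex_body is a hypothetical helper.

```python
import json

# The two statements from the workaround, joined into one script string.
PAINLESS_SCRIPT = (
    "ctx._parent = null; "
    "ctx._routing = new StringBuffer(ctx._routing);"
)


def build_reindex_body(source_index, dest_index):
    """Build the JSON body for POST _reindex (assumed 5.x request shape)."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {"lang": "painless", "inline": PAINLESS_SCRIPT},
    }


if __name__ == "__main__":
    print(json.dumps(build_reindex_body("source-index", "dest-index"), indent=2))
```

The printed JSON can then be sent to the cluster's _reindex endpoint with any HTTP client.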
@benbenwilde that's an easy solution, I don't know why I did not think about it. Glad to know you found the solution.
@benbenwilde Hey, bro, you solved my problem. I'd give you 100 likes.
Elasticsearch version: 5.5.1
Plugins installed: none
JVM version: 1.8
OS version: Ubuntu
Description of the problem including expected versus actual behavior: The source index has a parent field in its documents; the destination index has no parent field.
When I use the reindex API with a Painless script to migrate data from source to destination and set parent to null in the script, it also sets routing to null. My requirement is just to remove the parent field and keep the routing field in the destination.
Expected behavior: Routing should not be set to null; only parent should be set to null.
Steps to reproduce:
Source index:
Destination index:
Sample Document:
Fetch Document to verify routing and parent field:
Reindex api:
Verify document at destination index:
Parent is not present, which is correct, but routing is not present either.
Helpful information:
I debugged the ES code and found the cause in the apply function in https://github.com/elastic/elasticsearch/blob/v5.5.1/modules/reindex/src/main/java/org/elasticsearch/index/reindex/AbstractAsyncBulkByScrollAction.java. When parent is set to newValue, the function scriptChangedParent (defined in https://github.com/elastic/elasticsearch/blob/v5.5.1/modules/reindex/src/main/java/org/elasticsearch/index/reindex/TransportReindexAction.java) also sets routing to the new value of parent.
The next if check, the one for routing, leaves the routing field as it is, because newValue equals oldValue (in my case 12345). But routing was already set to null in the previous step.
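The sequence described above can be modelled in a few lines of Python. This is a simplified sketch of the reported behaviour, not the actual Elasticsearch code: Request, Wrapped and apply_script_changes are hypothetical names, and Wrapped stands in for Java's StringBuffer, which holds the same text but does not compare equal to a String.

```python
class Request:
    """Hypothetical stand-in for the index request being built."""
    def __init__(self, parent, routing):
        self.parent = parent
        self.routing = routing


class Wrapped:
    """Stand-in for StringBuffer: same text, but never equal to a plain str."""
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text


def to_str(value):
    return None if value is None else str(value)


def apply_script_changes(request, new_parent, new_routing):
    """Apply the values a script left in ctx, following the reported logic."""
    old_parent, old_routing = request.parent, request.routing
    if new_parent != old_parent:
        request.parent = to_str(new_parent)
        # Side effect described in the report: routing follows the new parent.
        request.routing = to_str(new_parent)
    if new_routing != old_routing:
        # Only runs when the script's routing differs from the original one.
        request.routing = to_str(new_routing)
    return request
```

With parent set to None and routing left at "12345", the second check sees no change and routing stays None. Passing Wrapped("12345") instead makes the second check fire, which is why the StringBuffer workaround posted earlier in the thread restores the routing.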
Let me know if extra info is required. Also, I would love to contribute a fix.