elastic / elasticsearch-rails

Elasticsearch integrations for ActiveModel/Record and Ruby on Rails
Apache License 2.0
3.07k stars 797 forks source link

update_document not selective enough #1061

Open PussInDasBoot opened 1 year ago

PussInDasBoot commented 1 year ago

Hi, I think there is a bug in the update_document code. When changes to fields are saved in the @__changed_model_attributes attribute in the before_save callback, these are never cleared and can just grow until they contain all fields and the desired effect of using update_document is lost as it will no longer be performing partial updates.

A replication of the bug:

pry(main)> record.__elasticsearch__.instance_variable_set(:@__changed_model_attributes, nil)
=> nil
pry(main)> record.__elasticsearch__.instance_variable_get(:@__changed_model_attributes)
=> nil
pry(main)> record.__elasticsearch__.index_document
   PUT http://search:9200/index/_doc/14339 [status:200, request:0.173s, query:n/a]
   > {"field_one":"some-string","title":"Old title","description":"Old description","created_at":"2023-09-08T15:38:53.000Z","updated_at":"2023-09-08T16:05:14.000Z"}
   < {"_index":"index","_type":"_doc","_id":"14339","_version":7,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":821,"_primary_term":6}
curl -X PUT 'http://localhost:9200/index/_doc/14339?pretty' -d '{"field_one":"some-string","title":"Old title","description":"Old description","created_at":"2023-09-08T15:38:53.000Z","updated_at":"2023-09-08T16:05:14.000Z"}'

# 2023-09-08T16:05:29+00:00 [200] (0.173s)
#
# {"_index":"index","_type":"_doc","_id":"14339","_version":7,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":821,"_primary_term":6
# }

=> {"_index"=>"index", "_type"=>"_doc", "_id"=>"14339", "_version"=>7, "result"=>"updated", "_shards"=>{"total"=>6, "successful"=>1, "failed"=>0}, "_seq_no"=>821, "_primary_term"=>6}
pry(main)> record.update(title: "New title")
  SQL
=> true
pry(main)> record.__elasticsearch__.instance_variable_get(:@__changed_model_attributes)
=> {"title"=>"New title"}
pry(main)> record.__elasticsearch__.update_document
2023-09-08 16:05:47 +0000: POST http://search:9200/index/_doc/14339/_update [status:200, request:0.169s, query:n/a]
2023-09-08 16:05:47 +0000: > {"doc":{"title":"New title"}}
2023-09-08 16:05:47 +0000: < {"_index":"index","_type":"_doc","_id":"14339","_version":8,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":822,"_primary_term":6}
curl -X POST 'http://localhost:9200/index/_doc/14339/_update?pretty' -d '{"doc":{"title":"New title"}}'

# 2023-09-08T16:05:47+00:00 [200] (0.169s)
#
# {"_index":"index","_type":"_doc","_id":"14339","_version":8,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":822,"_primary_term":6
# }
=> {"_index"=>"index", "_type"=>"_doc", "_id"=>"14339", "_version"=>8, "result"=>"updated", "_shards"=>{"total"=>6, "successful"=>1, "failed"=>0}, "_seq_no"=>822, "_primary_term"=>6}

This is all good, desired behaviour. The title is updated and only that field is indexed. However when we then go on to update more fields we find this:

pry(main)> record.update(description: "New description")
  SQL
=> true
pry(main)> record.__elasticsearch__.instance_variable_get(:@__changed_model_attributes)
=> {"title"=>"New title", "description"=>"New description"}
pry(main)> record.__elasticsearch__.update_document
2023-09-08 16:06:12 +0000: POST http://search:9200/index/_doc/14339/_update [status:200, request:0.079s, query:n/a]
2023-09-08 16:06:12 +0000: > {"doc":{"title":"New title","description":"New description"}}
2023-09-08 16:06:12 +0000: < {"_index":"index","_type":"_doc","_id":"14339","_version":9,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":823,"_primary_term":6}
curl -X POST 'http://localhost:9200/index/_doc/14339/_update?pretty' -d '{"doc":{"title":"New title","description":"New description"}}'

# 2023-09-08T16:06:12+00:00 [200] (0.079s)
#
# {"_index":"index","_type":"_doc","_id":"14339","_version":9,"result":"updated","_shards":{"total":6,"successful":1,"failed":0},"_seq_no":823,"_primary_term":6
# }

=> {"_index"=>"index", "_type"=>"_doc", "_id"=>"14339", "_version"=>9, "result"=>"updated", "_shards"=>{"total"=>6, "successful"=>1, "failed"=>0}, "_seq_no"=>823, "_primary_term"=>6}

We can see this retains the title field which has already been updated in the index. Should there be a way of clearing the @__changed_model_attributes attribute whenever documents are reindexed in any way?