Veraticus / Dynamoid

Ruby ORM for Amazon's DynamoDB
http://joshsymonds.com/Dynamoid/
247 stars, 83 forks

Duplicate entries in DynamoDB when updating an existing object #162

Status: open. wagaboy opened this issue 10 years ago.

wagaboy commented 10 years ago

I have a model (shown below) that gets duplicated when I update it (observed from the DynamoDB console). My sequence of operations, starting with an empty table:

User.create(...) # creates one item in the table
User.count       # => 1
u = User.first
u.first_name = "Bar"
u.save           # now there are two items in the table
User.count       # => 2

Simply calling u.save repeatedly on the same object creates a new entry each time. Am I missing something? I'm fairly new to DynamoDB and Dynamoid. Or is this a bug or a known issue?

My model:

class User
  include Dynamoid::Document

  field :email
  field :provider
  field :uid

  # Do not allow modification of first and last name.
  field :first_name
  field :last_name
  field :bio
  field :roles, :set

  index [:uid, :provider]
  ...
wagaboy commented 10 years ago

Details about my setup: aws-sdk 1.24.0, dynamoid 0.7.1, rails 3.2.14, ruby 1.9.3p392.

jasoncox commented 10 years ago

I've noticed this too. No duplicates are created if you set the table key to something else, like:

table :key => :user_id

Also, on the duplicates the id hash stays the same but has a changing number appended (e.g. .123). Is this some kind of versioning? Any ideas?
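Why an explicit key avoids the duplicates can be illustrated without DynamoDB at all. DynamoDB's PutItem replaces any existing item with the same primary key, so repeated saves collapse into one item only when the key value is the same each time. The sketch below is a hypothetical model, not Dynamoid or aws-sdk code: `FakeTable` and `put` are made-up names, and a plain Ruby Hash stands in for the table.

```ruby
require "securerandom"

# Hypothetical in-memory stand-in for a DynamoDB table: items are stored
# by primary key, and writing an existing key replaces that item
# (mirroring PutItem's replace-on-same-key behavior).
class FakeTable
  attr_reader :items

  def initialize(key)
    @key = key      # name of the primary-key attribute, e.g. :id or :user_id
    @items = {}
  end

  # Writes the attributes; if the key attribute is missing, generates a
  # random one, roughly the way an ORM might auto-assign an id.
  def put(attrs)
    attrs = attrs.dup
    attrs[@key] ||= SecureRandom.uuid
    @items[attrs[@key]] = attrs
    attrs
  end

  def count
    @items.size
  end
end

# Auto-generated key: if the key is lost between saves, each save gets a new one.
auto = FakeTable.new(:id)
auto.put(first_name: "Foo")
auto.put(first_name: "Bar")   # no :id supplied, so a second item is created
puts auto.count               # => 2

# Caller-supplied key (the idea behind `table :key => :user_id`): same key, same item.
explicit = FakeTable.new(:user_id)
explicit.put(user_id: "u-1", first_name: "Foo")
explicit.put(user_id: "u-1", first_name: "Bar")  # overwrites the first item
puts explicit.count                              # => 1
```

The key value `"u-1"` is illustrative; in a real model it would have to be a value your application assigns consistently to the same user.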

jasoncox commented 10 years ago

This is a feature of Dynamoid; see the Partitioning section of the README. :+1:

From the README: Dynamoid attempts to obviate this problem transparently by employing a partitioning strategy to divide up keys randomly across DynamoDB's servers. Each ID is assigned an additional number (by default 0 to 199, but you can increase the partition size in Dynamoid's configuration) upon save; when read, all 200 hashes are retrieved simultaneously and the most recently updated one is returned to the application. This results in a significant net performance increase, and is usually invisible to the application itself. It does, however, bring up the important issue of provisioning your DynamoDB tables correctly.
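The partitioning scheme described in that README passage can be sketched in plain Ruby. This is a simplified model under stated assumptions, not Dynamoid's implementation: the `PartitionedStore` class, the `"<id>.<n>"` key format (suggested by the `.123` suffixes mentioned earlier in the thread), and the integer `updated_at` counter are all hypothetical. Each save writes under a randomly chosen suffix in 0..199, and a read fetches every possible suffix and keeps the most recently updated item.

```ruby
# Hypothetical model of key partitioning: each save lands on "<id>.<n>" for a
# random n in 0...partitions, and reads consult all partitions.
class PartitionedStore
  def initialize(partitions: 200, rng: Random.new)
    @partitions = partitions
    @rng = rng
    @items = {}   # raw storage: partitioned key => attributes
    @clock = 0    # monotonically increasing stand-in for updated_at
  end

  def save(id, attrs)
    @clock += 1
    key = "#{id}.#{@rng.rand(@partitions)}"   # rand(200) yields 0..199
    @items[key] = attrs.merge(id: id, updated_at: @clock)
    key
  end

  # Fetch every possible partition for this id and return the newest item.
  def find(id)
    candidates = (0...@partitions).map { |n| @items["#{id}.#{n}"] }.compact
    candidates.max_by { |item| item[:updated_at] }
  end

  # Raw item count, as a full table scan would see it.
  def raw_count
    @items.size
  end

  # Number of distinct logical ids.
  def logical_count
    @items.values.map { |item| item[:id] }.uniq.size
  end
end

store = PartitionedStore.new(rng: Random.new(1))
store.save("abc", first_name: "Foo")
store.save("abc", first_name: "Bar")
puts store.find("abc")[:first_name]   # => Bar (newest write wins)
puts store.raw_count                  # 1 or 2, depending on whether the suffixes collide
puts store.logical_count              # => 1
```

In this model a raw item count can exceed the number of distinct ids, which is why the sketch keeps the two counts separate.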

So, with partitioning enabled, I suppose .count does not return the expected result?

ngordon17 commented 9 years ago

@jasoncox, the issue has nothing to do with partitioning.

@wagaboy, you are getting this behavior because you haven't specified what the table key should be, so when you save the object Dynamoid automatically fills in the 'id' field with a randomly generated string (which is why @jasoncox's solution works). However, when you do User.find, Dynamoid queries using the index you defined and therefore pulls in only the fields from the index, not the saved 'id'. When you then save the object again, its 'id' field is blank, so Dynamoid generates another random id and saves the object under that new 'id'. The result looks like two separate objects, because User.count queries the table with the primary key, not the table with your index. Note that the index table will actually contain only one user.

This should probably be fixed so that querying on an index also loads the primary key. In the meantime, I would suggest either specifying your own primary key so that you don't run into the issue, or making sure the id field is loaded before saving.
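The sequence explained in that comment can be reproduced with a small in-memory model. Everything here is hypothetical (the `FakeUser` class, the two Hash-based tables `MAIN` and `INDEX`, and `find_by_index`); it mirrors the comment's explanation rather than Dynamoid's actual code. The main table is keyed by a random `id`, the index table by `[uid, provider]`, and loading through the index returns an object without its `id`.

```ruby
require "securerandom"

# Hypothetical model of the bug: a main table keyed by an auto-generated id,
# plus an index table keyed by [uid, provider] that does not store the id.
class FakeUser
  MAIN = {}    # id => attributes
  INDEX = {}   # [uid, provider] => attributes without :id

  attr_accessor :id, :uid, :provider, :first_name

  def initialize(id: nil, uid:, provider:, first_name:)
    @id, @uid, @provider, @first_name = id, uid, provider, first_name
  end

  def attributes
    { id: id, uid: uid, provider: provider, first_name: first_name }
  end

  def save
    self.id ||= SecureRandom.uuid          # a blank id gets a fresh random one
    MAIN[id] = attributes
    INDEX[[uid, provider]] = attributes.reject { |k, _| k == :id }
    self
  end

  # Loads through the index only, so :id comes back nil.
  def self.find_by_index(uid, provider)
    new(**INDEX.fetch([uid, provider]))
  end

  def self.count
    MAIN.size                              # counts primary-key items only
  end
end

FakeUser.new(uid: "42", provider: "github", first_name: "Foo").save
u = FakeUser.find_by_index("42", "github")
u.first_name = "Bar"
u.save
puts FakeUser.count          # => 2 (a second id was generated)
puts FakeUser::INDEX.size    # => 1

# Workaround sketch: restore the primary key before saving.
original_id = FakeUser::MAIN.keys.first   # the id from the very first save
v = FakeUser.find_by_index("42", "github")
v.id = original_id
v.first_name = "Baz"
v.save
puts FakeUser.count          # => 2 (no third item was created)
```

The last block corresponds to "make sure to load the id field in before saving"; declaring your own table key, as suggested earlier in the thread, avoids relying on the auto-generated id in the first place.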